princevil / google-refine

Automatically exported from code.google.com/p/google-refine

inconsistencies in encoding guessing during load #172

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
I have attached two short files: plain ASCII (not Unicode), only two rows each.
They differ by a single character, but one is correctly identified as
ISO-8859-1 while the other is guessed as UTF-32LE, which produces question marks in the result.

Console output for the bad one:
19:57:12.877 [   create-project_command] Importing 'bad guess.tsv' (7428ms)
19:57:12.879 [   create-project_command] Best encoding guess: UTF-32LE [confidence: 25] (2ms)

Console output for the good one:
19:57:28.071 [   create-project_command] Importing 'good guess.tsv' (15192ms)
19:57:28.074 [   create-project_command] Best encoding guess: ISO-8859-1 [confidence: 30] (3ms)
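
For illustration only, here is a minimal Python sketch of one way a low-confidence guess like the UTF-32LE one above could be sanity-checked before it is used. It is not how Refine detects encodings, and this thread does not say how the problem was fixed. The function names `is_plausible` and `choose_encoding`, the ISO-8859-1 fallback, and the sample bytes are all assumptions made for the example; the sample is not the content of the attached files. The idea is that valid UTF-32 never encodes a code point above U+10FFFF, so a strict decode of short plain-ASCII bytes as UTF-32LE fails and the guess can be rejected.

```python
def is_plausible(data: bytes, guessed: str) -> bool:
    """Return True if `data` decodes strictly under the guessed encoding."""
    try:
        data.decode(guessed, errors="strict")
    except (UnicodeDecodeError, LookupError):
        # Invalid byte sequence for this encoding, or unknown encoding name.
        return False
    return True


def choose_encoding(data: bytes, guessed: str, fallback: str = "ISO-8859-1") -> str:
    """Keep the detector's guess only if the bytes really decode with it.

    The fallback is an assumption for this sketch, not a Refine setting.
    """
    return guessed if is_plausible(data, guessed) else fallback


# Hypothetical sample: four plain ASCII bytes, as a tiny TSV row might contain.
sample = b"a\tb\n"

# Read as one UTF-32LE unit these bytes give a value above U+10FFFF,
# so the strict decode fails and the fallback is chosen.
print(choose_encoding(sample, "UTF-32LE"))

# Genuine UTF-32LE data still passes the check and keeps its guess.
print(choose_encoding("a\tb\n".encode("utf-32-le"), "UTF-32LE"))
```

A check like this only rejects guesses that cannot be right. It does not pick the best encoding among several that all decode cleanly, since ISO-8859-1 accepts any byte sequence.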

What version of the product are you using? On what operating system?
Refine-1.1-r878

Please provide any additional information below.
Windows 7/764 - Italian

Original issue reported on code.google.com by runuppat...@gmail.com on 29 Oct 2010 at 6:15


GoogleCodeExporter commented 8 years ago
This was a known issue. I have verified that it is now fixed for the attached files.
Please download and use version 2.0. Closing as Verified.

Original comment by thadguidry on 19 Nov 2010 at 8:35

GoogleCodeExporter commented 8 years ago

Original comment by tfmorris on 18 Sep 2012 at 3:01