Closed GoogleCodeExporter closed 9 years ago
Screenshot (http://awesomescreenshot.com/0b22ju666) of what step 4 looks like
on my Windows PC with the defaults selected when creating a new project from
the attached .TSV file.
Original comment by thadguidry
on 19 Oct 2010 at 2:21
If, however, I change the encoding of the attached file to UTF-8 using
Notepad++ or a similar text tool, save it, and then create a new project from
that newly encoded file, Refine does detect UTF-8 and correctly displays the
diacritic characters for row 33.
Hmm, perhaps the fault lies at the beginning, with the Export function for TSV
in Refine? Does the export use UTF-8, or does it default to ASCII instead? In
either case, should it ask you which encoding you want to export as?
Original comment by thadguidry
on 19 Oct 2010 at 2:29
Note that you can use `return value.decode('utf-8')` in Jython instead of
GEL, and the values will be processed correctly.
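
For readers outside Refine, here is a rough Python 3 sketch of the same idea.
It assumes the garbled values came from UTF-8 bytes that were decoded as
ISO-8859-1 (latin-1). That is a guess about the wrong encoding, not something
confirmed in this thread. The helper name `repair_mojibake` is made up for
illustration and is not part of Refine.

```python
def repair_mojibake(value, wrong_encoding="latin-1"):
    """Undo a wrong decode: recover the raw bytes, then decode them as UTF-8.

    wrong_encoding is an assumption about how the file was misread.
    If the round trip fails, the value is returned unchanged.
    """
    try:
        return value.encode(wrong_encoding).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return value


# Simulate the bug: UTF-8 bytes for "caf\u00e9" misread as latin-1.
garbled = "caf\u00e9".encode("utf-8").decode("latin-1")
repaired = repair_mojibake(garbled)
```

Values that were already read correctly pass through untouched, because their
latin-1 bytes do not form valid UTF-8 and the helper falls back to the input.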
Original comment by jayl...@gmail.com
on 11 Nov 2010 at 1:26
This is a duplicate of issue 237. The character encoding guesser was leaving
the project encoding unset if it got a confidence value below the threshold.
The reason it's guessing wrong is that it only looks at the first 4k (approx.)
of the file and your first non-ASCII characters are beyond that boundary. I
investigated increasing the lookahead, but it appears that only the first 3881
bytes are available at the time the guessing is done. Changing this would
require restructuring things.
Original comment by tfmorris
on 27 Nov 2010 at 12:38
Original issue reported on code.google.com by
thadguidry
on 19 Oct 2010 at 2:17