klmr92 / uguu

Automatically exported from code.google.com/p/uguu

Conversion of non-Unicode characters breaks Unicode characters. #62

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
unicodize broke non-ASCII (but valid UTF-8) characters at the beginning of
the path while it was trying to convert only the last path item.

Original issue reported on code.google.com by ruslan.savchenko on 19 May 2010 at 6:36

GoogleCodeExporter commented 9 years ago
A simple solution is to replace
  line = unicodize_line(line)
with
  line = unicode(line, scanners_locale, errors='replace')

But the latter replaces each invalid character with �, so this fix depends
on issue 61.

Original comment by ruslan.savchenko on 19 May 2010 at 7:03
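
To illustrate what the replacement suggested above does, here is a minimal sketch in Python 3, where `bytes.decode(..., errors='replace')` plays the role of the Python 2 call `unicode(line, scanners_locale, errors='replace')`. The sample path, the helper name `decode_with_replacement`, and the choice of UTF-8 for `scanners_locale` are assumptions made for illustration only; they are not taken from the uguu sources.

```python
# Assumption: scanners_locale is UTF-8; the real value comes from the scanner's settings.
scanners_locale = "utf-8"


def decode_with_replacement(line: bytes) -> str:
    """Decode the whole line, substituting U+FFFD for every undecodable byte."""
    return line.decode(scanners_locale, errors="replace")


# Hypothetical sample: a valid UTF-8 directory name followed by a file name
# that contains one byte (0xFF) which can never occur in UTF-8.
sample = "путь".encode("utf-8") + b"/file\xff.txt"
decoded = decode_with_replacement(sample)

print(decoded.startswith("путь/"))  # checks that the valid prefix is left intact
print("\ufffd" in decoded)           # checks that the bad byte became U+FFFD
```

The valid leading part of the path is decoded unchanged, while the invalid byte is turned into the replacement character, which is why the result still depends on how issue 61 treats that character.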

GoogleCodeExporter commented 9 years ago
> line = unicode(line, scanners_locale, errors='replace')
This will not help: slashes could be masked by invalid UTF-8 sequences. It's
better to recode invalid UTF-8 chars in dt.

Original comment by radist...@gmail.com on 20 May 2010 at 8:25
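
One way to keep path separators safe, whatever the decoder does with an invalid sequence, is to split the raw bytes on `/` first and decode each path item on its own. This is a different technique from the recoding suggested in the comment above and is not the fix adopted in uguu; it is only a sketch in Python 3, and `decode_path_items` and its UTF-8 default are hypothetical names and values.

```python
def decode_path_items(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode each path item separately so a bad byte cannot affect a separator."""
    # Assumption: byte 0x2F always means "/" in `encoding`, which holds for UTF-8.
    items = raw.split(b"/")
    return "/".join(item.decode(encoding, errors="replace") for item in items)


# Hypothetical sample: the middle item ends with a truncated multi-byte sequence.
raw = b"/" + "каталог".encode("utf-8") + b"/bad\xe9/name"
decoded = decode_path_items(raw)

# The number of separators is the same before and after decoding.
print(decoded.count("/") == raw.count(b"/"))
```

Because every item is decoded in isolation, an invalid sequence at the end of one item is replaced within that item only, and the valid items around it are left untouched.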

GoogleCodeExporter commented 9 years ago
disowned

Original comment by ruslan.savchenko on 25 May 2010 at 6:08