Joao, will your fix for this issue in Python 3 (r384, line 2182) also work in
Python 2 on Windows?
Original comment by aureliojargas@gmail.com
on 9 Nov 2010 at 8:36
People say "lightning doesn’t strike twice in the same place"...
I don't think **this** error was caused by what I said it was. It can only be
reproduced by using the --gui option and selecting the file by clicking the
'select' button.
My Windows username is 'João Bernardo' (yes, it has a space and a tilde, and it's
only good for triggering errors). Seriously! I've sent lots of bug reports to
different programs because of that.
So the error happens JUST because the script prints the command line it was run with!!
<!-- cmdline: txt2tags -t html C:/Users/João Bernardo/Desktop/a/sample.t2t -->
By doing
contents = [i.encode('utf8') for i in contents]  # this line is better than the previously written one
we encode my username with UTF-8 and everything works (since ***ALL*** text is
encoded in UTF-8).
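A minimal Python 3 sketch of the same idea, assuming `contents` is a hypothetical list of output lines where one line carries the accented username: every item is encoded with the same codec before the file is written in binary mode, so the file receives consistent UTF-8 bytes no matter which line held the non-ASCII characters.

```python
import os
import tempfile

# Hypothetical output lines; the cmdline comment carries a non-ASCII path.
contents = [
    "<html>",
    "<!-- cmdline: txt2tags -t html C:/Users/João Bernardo/Desktop/a/sample.t2t -->",
    "</html>",
]

# Encode every item with the same codec so the list holds only bytes.
encoded = [line.encode("utf-8") for line in contents]

# Binary mode accepts only bytes, so writing the encoded lines is safe.
path = os.path.join(tempfile.mkdtemp(), "sample.html")
with open(path, "wb") as f:
    f.write(b"\n".join(encoded) + b"\n")

# Read the raw bytes back to check what landed on disk.
with open(path, "rb") as f:
    data = f.read()
```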
------------------
The greatest feature of Python 3 is that strings are Unicode, so it's not
affected by this problem. But...
Using the 'encode' method, we lose information and accents are no longer
possible. And, at the same time, you can't write Unicode strings to binary
files.
------------------
So what should we do??
Python 2 -> the proposed patch will only work if the t2t file is encoded in
UTF-8... But, since we know the problem is with Tkinter, doing
newfile = askopenfilename(filetypes=ftypes).encode('utf-8')  # AT LINE ~5658
might solve the problem... (haven't tested it yet!!)
Python 3 -> Just doing "f = open(file_path, 'w')" is ok
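In Python 3, the default encoding of `open()` in text mode depends on the platform locale (on Windows it is often a legacy code page rather than UTF-8), so passing `encoding` explicitly is a safer variant of that suggestion. A minimal sketch; the folder name and file name are hypothetical:

```python
import os
import tempfile

text = "<p>Acentuação: ãáàä</p>\n"

# Hypothetical output path inside a directory with an accented name.
folder = os.path.join(tempfile.mkdtemp(), "joão")
os.makedirs(folder)
file_path = os.path.join(folder, "sample.html")

# Text mode with an explicit codec: str goes in, UTF-8 bytes go to disk.
with open(file_path, "w", encoding="utf-8") as f:
    f.write(text)

# Read it back with the same codec.
with open(file_path, encoding="utf-8") as f:
    round_trip = f.read()
```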
Original comment by jbv...@gmail.com
on 9 Nov 2010 at 10:29
Oops... askopenfilename() returns a file and not its name. :(
So the idea is to find where the file name is and encode it.
Original comment by jbv...@gmail.com
on 9 Nov 2010 at 10:48
Could this be fixed with Python's ``str.decode``? With it we could take the
declared encoding (or maybe even a separate ``--src-encoding``) and decode it
into Unicode without losing accent information. Then afterward we could
``encode`` the text back into the user's declared encoding before writing the
file. Would that work without breaking anything?
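A minimal Python 3 sketch of that decode-early, encode-late approach. `convert`, `src_encoding` and `dst_encoding` are hypothetical names (the `--src-encoding` option is only a proposal here), and the processing step is a stand-in that just wraps each line in a paragraph tag:

```python
def convert(raw: bytes, src_encoding: str, dst_encoding: str) -> bytes:
    """Decode input bytes, work on str internally, encode on output."""
    # 1. Decode once at the input boundary, using the declared encoding.
    text = raw.decode(src_encoding)

    # 2. All internal processing uses str, so accents are preserved.
    body = "\n".join("<p>%s</p>" % line for line in text.splitlines())

    # 3. Encode once at the output boundary into the target encoding.
    return body.encode(dst_encoding)


# A Latin-1 source file converted to UTF-8 output.
source = "joão\nação".encode("latin-1")
output = convert(source, "latin-1", "utf-8")
print(output)
```

Keeping the decode and encode at the two edges means nothing in the middle ever has to care which encoding the user's file was in.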
Original comment by jamisee...@gmail.com
on 17 Nov 2010 at 7:59
I don't have much time right now, but it seems easy to fix.
"str.decode" probably won't work.
Doing -> u'ãáàä'.decode() <- raises an exception
After all file handling is done (things that may access dirs with accented
names), the text should be encoded to {whatever txt2tags will be using} before
being appended to the list used in SaveFile().
The generated list (e.g. for HTML) looks something like this:
[ '<html>', '<head>', ... , u'<!-- cmdline: txt2tags -t html /folder/with/unicode/aãáä/file -->', '</body></html>' ]
>> The second-to-last item is a Unicode string!
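One way to avoid a list that mixes byte strings and Unicode strings is to normalize every item to text first and encode only once, when saving. A minimal Python 3 sketch; `to_text` and `save_lines` are hypothetical helpers, not txt2tags' actual `SaveFile()`, and byte items are assumed to be UTF-8:

```python
import os
import tempfile
from typing import Iterable, List, Union

Item = Union[bytes, str]


def to_text(items: Iterable[Item], encoding: str = "utf-8") -> List[str]:
    """Return every item as str, decoding any bytes with `encoding`."""
    return [i.decode(encoding) if isinstance(i, bytes) else i for i in items]


def save_lines(path: str, items: Iterable[Item], encoding: str = "utf-8") -> None:
    """Normalize all items to str, then encode once while writing."""
    lines = to_text(items, encoding)
    with open(path, "w", encoding=encoding, newline="\n") as f:
        f.write("\n".join(lines) + "\n")


# Mixed list like the one above: only the cmdline comment is text.
mixed = [
    b"<html>",
    "<!-- cmdline: txt2tags -t html /folder/with/unicode/aãáä/file -->",
    b"</body></html>",
]
path = os.path.join(tempfile.mkdtemp(), "out.html")
save_lines(path, mixed)
```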
This is **not** a Windows-only problem. I tried Debian (w/ Python 2.5)
and got the same message using a file in my folder "/home/jb/joão/" (attached
image).
That means Tkinter also returns file names as Unicode on Linux (and probably on
other *nix platforms too).
Original comment by jbv...@gmail.com
on 17 Nov 2010 at 11:58
Comment from Jason Seeley in txt2tags-dev
http://groups.google.com/group/txt2tags-dev/msg/9ad4b233d0061671
> "str.decode" probably won't work.
> Doing -> u'ãáàä'.decode() <- raises an exception
No, decode is used on an encoded string and (if passed the correct
encoding as an argument) returns a unicode string. It's the opposite
direction from encode.
So (assuming the text is UTF-8 encoded):
>>> 'ãáàä'.decode('utf-8') == u'ãáàä'
True
>>> u'ãáàä'.encode('utf-8') == 'ãáàä'
True
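In Python 3 the two directions are split across types: `bytes.decode` returns `str`, `str.encode` returns `bytes`, and `str` has no `decode` method at all. (In Python 2, calling `.decode()` on a `unicode` object first encodes it implicitly with the default ASCII codec, which is why `u'ãáàä'.decode()` raised an exception.) The session above, redone as a small Python 3 script:

```python
text = "ãáàä"
data = text.encode("utf-8")  # str -> bytes
back = data.decode("utf-8")  # bytes -> str

# The round trip is lossless for UTF-8.
assert back == text
assert data == b"\xc3\xa3\xc3\xa1\xc3\xa0\xc3\xa4"

# The "wrong direction" does not exist on str in Python 3.
assert not hasattr(str, "decode")
```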
The thought being you could write your document in your normal
encoding (which is most likely utf-8, but could just as easily be
latin1 or Shift-JIS or something else entirely). Txt2tags could use
decode as above to convert that into unicode strings, so that all
internal operations work as expected, then re-encode to the proper
encoding afterward without losing special characters or accents as
long as the target encoding supports them.
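The caveat "as long as the target encoding supports them" matters: encoding to a codec that cannot represent a character raises `UnicodeEncodeError` by default. A small Python 3 sketch of the options; for HTML output, the standard `xmlcharrefreplace` error handler keeps the character as a numeric character reference:

```python
word = "joão"

# Latin-1 can represent every character in this word.
latin1 = word.encode("latin-1")

# ASCII cannot represent "ã", so strict encoding raises.
try:
    word.encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# For HTML/XML targets, replace unencodable chars with &#NNN; references.
html_safe = word.encode("ascii", errors="xmlcharrefreplace")
print(html_safe)  # b'jo&#227;o'
```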
Original comment by aureliojargas@gmail.com
on 18 Nov 2010 at 2:05
Original issue reported on code.google.com by
jbv...@gmail.com
on 5 Nov 2010 at 4:32