vinitkumar / googlecl

GoogleCL rewrite in __progress__

UnicodeDecodeError: 'utf8' codec / 'ascii' codec can't decode byte(s) #203

Closed: vinitkumar closed this issue 10 years ago

vinitkumar commented 10 years ago

From spv60582 on June 30, 2010 11:49:08

What steps will reproduce the problem?

  1. Python.exe google youtube post -n "текст заголовка" -s "текст описания" -t "tag1" -c News file.mp4
     (The title and description are Russian for "title text" and "description text".)

What is the expected output? What do you see instead?

I tried to use non-Latin characters with the YouTube post task on Windows XP.

Instead of a successful upload, I get this error:

Loading file.mp4
Traceback (most recent call last):
  File "google", line 463, in <module>
    main()
  File "google", line 457, in main
    run_once(options, args)
  File "google", line 356, in run_once
    task.run(client, options, args)
  File "C:\Python26\lib\site-packages\googlecl\youtube\service.py", line 217, in _run_post
    tags=options.tags, category=options.category)
  File "C:\Python26\lib\site-packages\googlecl\youtube\service.py", line 129, in post_videos
    self.InsertVideoEntry(video_entry, path)
  File "C:\Python26\lib\site-packages\gdata\youtube\service.py", line 654, in InsertVideoEntry
    converter=gdata.youtube.YouTubeVideoEntryFromString)
  File "C:\Python26\lib\site-packages\gdata\service.py", line 1236, in Post
    media_source=media_source, converter=converter)
  File "C:\Python26\lib\site-packages\gdata\service.py", line 1286, in PostOrPut
    data_str = str(data)
  File "C:\Python26\lib\site-packages\atom\__init__.py", line 377, in __str__
    return self.ToString()
  File "C:\Python26\lib\site-packages\atom\__init__.py", line 374, in ToString
    return ElementTree.tostring(self._ToElementTree(), encoding=string_encoding)
  File "C:\Python26\lib\site-packages\atom\__init__.py", line 369, in _ToElementTree
    self._AddMembersToElementTree(new_tree)
  File "C:\Python26\lib\site-packages\atom\__init__.py", line 331, in _AddMembersToElementTree
    member._BecomeChildElement(tree)
  File "C:\Python26\lib\site-packages\atom\__init__.py", line 357, in _BecomeChildElement
    self._AddMembersToElementTree(new_child)
  File "C:\Python26\lib\site-packages\atom\__init__.py", line 331, in _AddMembersToElementTree
    member._BecomeChildElement(tree)
  File "C:\Python26\lib\site-packages\atom\__init__.py", line 357, in _BecomeChildElement
    self._AddMembersToElementTree(new_child)
  File "C:\Python26\lib\site-packages\atom\__init__.py", line 342, in _AddMembersToElementTree
    ExtensionContainer._AddMembersToElementTree(self, tree)
  File "C:\Python26\lib\site-packages\atom\__init__.py", line 224, in _AddMembersToElementTree
    tree.text = self.text.decode(MEMBER_STRING_ENCODING)
  File "C:\Python26\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-3: invalid data

What version of the product are you using? On what operating system? What version of gdata-python-client (aka python-gdata)?

googlecl-0.9.7.tar.gz, python-2.6.5.msi, gdata-2.0.10.zip, Windows XP SP3

Original issue: http://code.google.com/p/googlecl/issues/detail?id=195

vinitkumar commented 10 years ago

From tom.h.mi...@gmail.com on July 23, 2010 19:06:42

Issue 188 has been merged into this issue.

vinitkumar commented 10 years ago

From tom.h.mi...@gmail.com on July 23, 2010 19:11:11

You mentioned in Issue 188 that this worked in Cygwin, so the issue seems to be tied to the shell. I'm not sure how to solve this, but I don't think additional code in GoogleCL can help.

I'll leave this issue open in case someone has the same problem and figures out how to fix it.
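A minimal sketch of why the shell matters, written in Python 3 (GoogleCL itself ran on Python 2.6, so this only illustrates the mechanism). It assumes that cmd.exe handed the Russian title to Python as cp1251 bytes while a UTF-8 terminal such as Cygwin handed over UTF-8 bytes; neither encoding is confirmed in the thread. `atom_style_decode` is a hypothetical stand-in for the `self.text.decode(MEMBER_STRING_ENCODING)` call in the first traceback, assuming that constant is UTF-8 as the `'utf8' codec` message suggests.

```python
TITLE = "текст заголовка"  # the title from the original report


def atom_style_decode(raw: bytes) -> str:
    """Hypothetical stand-in for the atom library's UTF-8 decode of member text."""
    return raw.decode("utf-8")


def try_decode(raw: bytes) -> bool:
    """Return True if the bytes survive a strict UTF-8 decode."""
    try:
        atom_style_decode(raw)
        return True
    except UnicodeDecodeError:
        return False


# Bytes as a UTF-8 terminal would pass them (assumption).
from_utf8_shell = TITLE.encode("utf-8")
# Bytes as a legacy Windows code page would pass them (cp1251 assumed).
from_cp1251_shell = TITLE.encode("cp1251")

print(try_decode(from_utf8_shell))    # True
print(try_decode(from_cp1251_shell))  # False
```

The characters are identical in both cases; only the bytes the shell produces differ, which is consistent with the report that the same command worked under Cygwin.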

vinitkumar commented 10 years ago

From sascha.g...@gmail.com on July 26, 2010 06:00:33

Same problem with my configuration:

C:\Dokumente und Einstellungen\Sascha.Gibson\Eigene Dateien\Eigene Bilder>google picasa post -n "testest" Lesekoenig.jpg
Loading file Lesekoenig.jpg to album testest

C:\Dokumente und Einstellungen\Sascha.Gibson\Eigene Dateien\Eigene Bilder>google picasa post -n "testest" Lesekönig.jpg
Loading file Lesek÷nig.jpg to album testest
Traceback (most recent call last):
  File "google", line 536, in <module>
  File "google", line 530, in main
  File "google", line 408, in run_once
  File "googlecl\picasa\service.pyo", line 333, in _run_post
  File "googlecl\picasa\service.pyo", line 206, in insert_photo_list
  File "gdata\photos\service.pyo", line 469, in InsertPhotoSimple
  File "gdata\photos\service.pyo", line 425, in InsertPhoto
  File "gdata\service.pyo", line 1236, in Post
  File "gdata\service.pyo", line 1286, in PostOrPut
  File "atom\__init__.pyo", line 377, in __str__
  File "atom\__init__.pyo", line 374, in ToString
  File "atom\__init__.pyo", line 369, in _ToElementTree
  File "atom\__init__.pyo", line 331, in _AddMembersToElementTree
  File "atom\__init__.pyo", line 357, in _BecomeChildElement
  File "atom\__init__.pyo", line 342, in _AddMembersToElementTree
  File "atom\__init__.pyo", line 224, in _AddMembersToElementTree
  File "encodings\utf_8.pyo", line 16, in decode
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 5-8: invalid data

C:\Dokumente und Einstellungen\Sascha.Gibson\Eigene Dateien\Eigene Bilder>
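The `Lesek÷nig.jpg` in the output above is itself a clue. A plausible explanation, assumed here rather than confirmed in the thread: the file name reached Python as cp1252 bytes (the Windows ANSI code page for German), while the console rendered them using cp850 (the German OEM code page). A Python 3 sketch of that mismatch follows; the byte cp1252 uses for "ö" is not a valid UTF-8 start byte, which fits the decode error starting at position 5.

```python
name = "Lesekönig.jpg"

ansi_bytes = name.encode("cp1252")             # what argv is assumed to contain
shown_in_console = ansi_bytes.decode("cp850")  # how a cp850 console renders those bytes
print(shown_in_console)  # Lesek÷nig.jpg

# The byte for "ö" sits at index 5 and cannot start a UTF-8 sequence.
print(hex(ansi_bytes[5]))  # 0xf6
try:
    ansi_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("UTF-8 decode fails at position", exc.start)  # 5
```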

vinitkumar commented 10 years ago

From tom.h.mi...@gmail.com on September 01, 2010 12:16:37

Issue 272 has been merged into this issue.

vinitkumar commented 10 years ago

From tom.h.mi...@gmail.com on September 01, 2010 13:01:03

I have a hunch. Try applying the attached patch, and let me know if the error disappears/changes.

The patch (theoretically) decodes everything you enter, which should allow the atom module to decode it with UTF-8. I don't know if this is safe, so I'm not sure whether this patch will make it into the trunk.

I am 99.99% sure that if you use a Unicode-friendly shell or terminal program (try Cygwin on Windows), this problem will go away without the patch.
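The idea of decoding everything the user enters can be sketched in Python 3 as below. This is not the contents of unicode_input.patch, which the thread does not reproduce. `decode_args` is a hypothetical helper, and it assumes the raw arguments arrive as bytes in a known terminal encoding (cp1252 here); a real program might obtain that encoding from something like `locale.getpreferredencoding()`. In Python 3, `sys.argv` is already text, so this step matters mainly for Python 2.

```python
from typing import List


def decode_args(raw_args: List[bytes], terminal_encoding: str) -> List[str]:
    """Hypothetical helper: turn raw argv bytes into text using the terminal's encoding."""
    return [arg.decode(terminal_encoding) for arg in raw_args]


# Raw arguments as a German Windows shell might deliver them (cp1252 assumed).
raw = [b"picasa", b"post", b"-n", b"testest", "Lesekönig.jpg".encode("cp1252")]
args = decode_args(raw, "cp1252")

# Once the arguments are text, they can be passed on as UTF-8,
# which a strict UTF-8 decode further down accepts.
utf8_name = args[-1].encode("utf-8")
print(utf8_name.decode("utf-8"))  # Lesekönig.jpg
```

The key design point is that decoding happens once, at the boundary where bytes enter the program, using the encoding of whatever produced those bytes.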

Please report back here with details on how the patch or another shell worked. Or, for mega brownie points, try both. Thanks a lot!

Status: Feedback

Attachment: unicode_input.patch

vinitkumar commented 10 years ago

From tom.h.mi...@gmail.com on September 01, 2010 13:01:56

FYI, this patch should apply cleanly to 0.9.9 and to the version in the trunk. I'm not sure about 0.9.8 and earlier.

vinitkumar commented 10 years ago

From neur...@gmail.com on September 01, 2010 14:40:30

I applied the patch, and now it accepts Unicode input in urxvt.

A slightly off-topic question: with my calendar set to Italian, phrases like "tomorrow at noon" or "monday at 9 pm" don't work (the event is created at the current time), and neither do the Italian equivalents "domani a mezzogiorno" ("tomorrow at noon") and "lunedì alle 21" ("Monday at 21:00"). Where can I find a reference for the recognized words?

vinitkumar commented 10 years ago

From neur...@gmail.com on September 02, 2010 01:09:58

Now that user input is decoded, you should encode the output too...

On my system, sys.stdout.encoding is 'ISO-8859-15'.

Thank you
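Encoding output for the terminal can be sketched the same way in Python 3. Assuming the terminal reports ISO-8859-15 as in the comment above, the hypothetical `to_terminal_bytes` helper encodes text with `errors="replace"`, so characters the terminal cannot show become `?` instead of raising UnicodeEncodeError. Which error handler GoogleCL actually chose is not stated in the thread.

```python
def to_terminal_bytes(text: str, encoding: str = "ISO-8859-15") -> bytes:
    """Hypothetical helper: encode text for the terminal, replacing unencodable characters."""
    return text.encode(encoding, errors="replace")


print(to_terminal_bytes("lunedì"))  # b'luned\xec'
print(to_terminal_bytes("текст"))   # b'?????' (ISO-8859-15 has no Cyrillic)
```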

vinitkumar commented 10 years ago

From tom.h.mi...@gmail.com on September 02, 2010 06:28:50

The problem with adding events in Italian seems to be rooted in the Calendar service itself; see Issue 211. I'm not sure there are any resources on quick-add (which is what GoogleCL uses) in languages other than English.

I've asked on the Python mailing list whether it's safe to blanket-decode command-line arguments. The more I think about it, the safer it seems. But yes, encoding output is the next step for this issue.

Thanks for reporting back!

vinitkumar commented 10 years ago

From neur...@gmail.com on September 02, 2010 06:40:25

OK, I've read Issue 211, and it really seems quick add is disabled when the Calendar language is not English. I switched the language back to English, and now, for example, "lunch with tony saturday at 1 pm" (or "at 13") works.

Thank you and keep up the good work!

vinitkumar commented 10 years ago

From tom.h.mi...@gmail.com on September 03, 2010 14:33:42

Non-latin input and output should be working in 0.9.10. Report back on this thread if you find any UnicodeEncodeErrors or UnicodeDecodeErrors.

Status: Fixed

vinitkumar commented 10 years ago

From robin.ru...@gmail.com on September 08, 2010 06:07:31

I still have problems with v0.9.10. Today I had an appointment containing the non-Latin character ß, and because of it "$ google calendar today" fails with this output:

Traceback (most recent call last):
  File "/usr/bin/google", line 676, in <module>
    main()
  File "/usr/bin/google", line 662, in main
    run_once(options, args)
  File "/usr/bin/google", line 504, in run_once
    task.run(client, options, args)
  File "/usr/lib/pymodules/python2.6/googlecl/calendar/service.py", line 495, in _run_list_today
    _list(client, options, args, date)
  File "/usr/lib/pymodules/python2.6/googlecl/calendar/service.py", line 470, in _list
    delimiter=options.delimiter)
  File "/usr/lib/pymodules/python2.6/googlecl/base.py", line 603, in compile_entry_string
    return_string += val.replace(delimiter, ' ') + delimiter
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2: ordinal not in range(128)
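In Python 2, concatenating a byte string with a unicode string makes Python decode the byte string with the default ascii codec, which is why this error names 'ascii' and byte 0xc3 (the first byte of "ß" in UTF-8). Python 3 refuses to mix the two types at all, so the sketch below makes that implicit step explicit and then shows one way around it: decode every byte-string field as UTF-8 before joining. `compile_fields` is a hypothetical stand-in for the `compile_entry_string` loop in the traceback, not GoogleCL's code; the UTF-8 assumption for the fetched data and the sample value "Straße" are ours.

```python
from typing import List, Union

Field = Union[bytes, str]


def compile_fields(values: List[Field], delimiter: str = ",") -> str:
    """Hypothetical sketch: join entry fields, decoding byte strings as UTF-8 first."""
    result = ""
    for val in values:
        if isinstance(val, bytes):
            val = val.decode("utf-8")  # explicit, instead of Python 2's implicit ascii decode
        result += val.replace(delimiter, " ") + delimiter
    return result


title = "Straße".encode("utf-8")  # b'Stra\xc3\x9fe'

# What Python 2 effectively attempted: an ascii decode, which fails on 0xc3.
try:
    title.decode("ascii")
except UnicodeDecodeError as exc:
    print(hex(title[exc.start]))  # 0xc3

print(compile_fields([title, "10:00"]))  # Straße,10:00,
```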

vinitkumar commented 10 years ago

From tom.h.mi...@gmail.com on September 08, 2010 06:20:23

Can you try the patch in Issue 279? That should clear up the problem.

Summary: UnicodeDecodeError: 'utf8' codec / 'ascii' codec can't decode byte(s)
Status: Accepted

vinitkumar commented 10 years ago

From tom.h.mi...@gmail.com on September 08, 2010 06:20:50

Issue 279 has been merged into this issue.

vinitkumar commented 10 years ago

From tom.h.mi...@gmail.com on September 08, 2010 06:33:36

Everyone: the patch in Issue 279, applied to 0.9.10, should solve most (hopefully all) of these decode/encode errors. If it doesn't, let me know in this issue.

vinitkumar commented 10 years ago

From jeremy.c...@gmail.com on September 11, 2010 07:32:03

I've installed the 0.9.10 release on Ubuntu 10.04, and I'm still having encoding/decoding problems. Note: my contacts' names contain accented characters.

Traceback (most recent call last):
  File "/usr/bin/google", line 681, in <module>
    main()
  File "/usr/bin/google", line 667, in main
    run_once(options, args)
  File "/usr/bin/google", line 509, in run_once
    task.run(client, options, args)
  File "/usr/lib/pymodules/python2.6/googlecl/contacts/base.py", line 203, in _run_list
    delimiter=options.delimiter)
  File "/usr/lib/pymodules/python2.6/googlecl/base.py", line 603, in compile_entry_string
    return_string += val.replace(delimiter, ' ') + delimiter
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)

I've applied the provided patch but without success.

vinitkumar commented 10 years ago

From ludovic....@gmail.com on September 12, 2010 11:30:20

I also have the same problem.

$ google calendar list

[ludovic.rousseau@gmail.com]
Traceback (most recent call last):
  File "/usr/local/bin/google", line 5, in <module>
    pkg_resources.run_script('googlecl==0.9.10', 'google')
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/pkg_resources.py", line 442, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/Extras/lib/python/pkg_resources.py", line 1167, in run_script
    exec script_code in namespace, namespace
  File "/Library/Python/2.6/site-packages/googlecl-0.9.10-py2.6.egg/EGG-INFO/scripts/google", line 676, in <module>
  File "/Library/Python/2.6/site-packages/googlecl-0.9.10-py2.6.egg/EGG-INFO/scripts/google", line 662, in main
  File "/Library/Python/2.6/site-packages/googlecl-0.9.10-py2.6.egg/EGG-INFO/scripts/google", line 504, in run_once
  File "build/bdist.macosx-10.6-universal/egg/googlecl/calendar/service.py", line 490, in _run_list
  File "build/bdist.macosx-10.6-universal/egg/googlecl/calendar/service.py", line 470, in _list
  File "build/bdist.macosx-10.6-universal/egg/googlecl/base.py", line 603, in compile_entry_string
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)

I do not have the problem with googlecl-0.9.9. So it is a regression in 0.9.10 for me.

The bug is triggered by an event named "Férié" so using non-ASCII characters.

vinitkumar commented 10 years ago

From tom.h.mi...@gmail.com on September 12, 2010 12:11:48

Jeremy and Ludovic, are you sure you successfully applied the patch in Issue 279? There should be absolutely no mention of the ascii codec if the patch was applied.

vinitkumar commented 10 years ago

From tom.h.mi...@gmail.com on September 28, 2010 13:34:08

Issue 298 has been merged into this issue.

vinitkumar commented 10 years ago

From thmil...@google.com on October 04, 2010 12:46:25

Issue 305 has been merged into this issue.

vinitkumar commented 10 years ago

From thmil...@google.com on October 09, 2010 13:18:00

Alright, 0.9.11 should fix this issue for good (meaning, only a few strange edge cases left to cover). Let me know if problems keep cropping up even after the upgrade.

Status: Fixed

vinitkumar commented 10 years ago

From tais.hansen on October 19, 2010 05:01:15

I'm getting a UnicodeDecodeError with 0.9.11 on Ubuntu 10.04.1 LTS attempting to upload a file to Docs:

$ google docs upload --no-convert --folder "Netværk" pdf/contex-allerød.pdf
No supported filetype found for extension pdf
Uploading as text/plain
Loading pdf/contex-allerød.pdf
Traceback (most recent call last):
  File "/usr/bin/google", line 812, in <module>
    main()
  File "/usr/bin/google", line 798, in main
    run_once(options, args)
  File "/usr/bin/google", line 577, in run_once
    task.run(client, options, args)
  File "/usr/lib/pymodules/python2.6/googlecl/docs/base.py", line 494, in _run_upload
    file_ext=options.format, convert=options.convert)
  File "/usr/lib/pymodules/python2.6/googlecl/docs/base.py", line 299, in upload_docs
    **kwargs)
  File "/usr/lib/pymodules/python2.6/googlecl/docs/service.py", line 345, in upload_single_doc
    converter=gdata.docs.DocumentListEntryFromString)
  File "/usr/lib/pymodules/python2.6/gdata/service.py", line 1146, in Post
    media_source=media_source, converter=converter)
  File "/usr/lib/pymodules/python2.6/gdata/service.py", line 1214, in PostOrPut
    multipart[2]], headers=extra_headers)
  File "/usr/lib/pymodules/python2.6/atom/service.py", line 175, in request
    data=data, headers=all_headers)
  File "/usr/lib/pymodules/python2.6/gdata/auth.py", line 845, in perform_request
    return http_client.request(operation, url, data=data, headers=headers)
  File "/usr/lib/pymodules/python2.6/atom/http.py", line 135, in request
    connection.endheaders()
  File "/usr/lib/python2.6/httplib.py", line 904, in endheaders
    self._send_output()
  File "/usr/lib/python2.6/httplib.py", line 776, in _send_output
    self.send(msg)
  File "/usr/lib/python2.6/httplib.py", line 755, in send
    self.sock.sendall(str)
  File "/usr/lib/python2.6/ssl.py", line 203, in sendall
    v = self.send(data[count:])
  File "/usr/lib/python2.6/ssl.py", line 94, in <lambda>
    self.send = lambda data, flags=0: SSLSocket.send(self, data, flags)
  File "/usr/lib/python2.6/ssl.py", line 174, in send
    v = self._sslobj.write(data)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf8' in position 302: ordinal not in range(128)

$ google --version
google 0.9.11
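This traceback is the mirror image of the decode errors earlier in the thread: text containing "ø" (u'\xf8') reached a byte-oriented socket write, and Python 2 tried to encode it implicitly with the ascii codec. A minimal Python 3 sketch of the distinction follows, assuming only that the file path ended up in the data being sent. `fits_ascii` is a hypothetical helper, and UTF-8 as the target encoding is an assumption; which encoding the server expects for a given header or body is not stated in the thread.

```python
path = "pdf/contex-allerød.pdf"  # the file name from the report above


def fits_ascii(text: str) -> bool:
    """Return True if text can be encoded with the ascii codec (Python 2's implicit default)."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


print(fits_ascii(path))  # False ('ø', U+00F8, is outside ascii)

# Encoding explicitly before any socket write removes the implicit ascii step.
payload = path.encode("utf-8")
print(payload)  # b'pdf/contex-aller\xc3\xb8d.pdf'
```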

vinitkumar commented 10 years ago

From tom.h.mi...@gmail.com on October 19, 2010 09:43:14

Well that's obnoxious. But it seems to be fixed by upgrading to gdata 2.0.12. Could you try upgrading and see if that works?

After you upgrade to 2.0.12, you'll have to run the command with --force-auth to reload the token from Google.

vinitkumar commented 10 years ago

From tais.hansen on October 19, 2010 13:49:06

I found gdata-2.0.8 released for Ubuntu Maverick a few days ago and rebuilt the package for Lucid. gdata-2.0.8 solves the issues I experienced.