open-cogsci / python-mediadecoder

A media decoding library based on MoviePy
http://open-cogsci.github.io/python-mediadecoder
MIT License

UnicodeDecodeError when processing import Exception #1

Closed smathot closed 8 years ago

smathot commented 8 years ago

I received the following automated bug report via OpenSesame. Essentially, the problem is that you're inserting non-Unicode text (coming from the Exception) into a Unicode template string when handling an Exception. The result is that users actually see an error message that results from parsing the error message--confusing.

Traceback:
  File "C:\Program Files (x86)\OpenSesame\lib\site-packages\libopensesame\item_store.py", line 159, in new
    self.experiment, script, self.experiment.item_prefix())
  File "C:\Program Files (x86)\OpenSesame\lib\site-packages\libopensesame\plugins.py", line 361, in load_plugin
    item_module = import_plugin(plugin, _type=_type)
  File "C:\Program Files (x86)\OpenSesame\lib\site-packages\libopensesame\plugins.py", line 332, in import_plugin
    return imp.load_source(plugin, path)
  File "C:\Program Files (x86)\OpenSesame\share\opensesame_plugins\media_player_mpy\media_player_mpy.py", line 42, in <module>
    import mediadecoder
  File "C:\Program Files (x86)\OpenSesame\lib\site-packages\mediadecoder\__init__.py", line 6, in <module>
    from mediadecoder.decoder import Decoder
  File "C:\Program Files (x86)\OpenSesame\lib\site-packages\mediadecoder\decoder.py", line 21, in <module>
    Please make sure that they are installed.""".format(e))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf3 in position 30: ordinal not in range(128)
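
For illustration, the same error can be emulated on Python 3 by making Python 2's implicit ASCII decode explicit. This is only a sketch: the byte string is a made-up stand-in for a localized error message (the actual message is not shown in the report), and `raw_message` and `template` are hypothetical names.

```python
# Hypothetical stand-in for a localized message, encoded as Latin-1.
# In Latin-1/cp1252 the byte 0xf3 is the character "o with acute accent".
raw_message = u"No se encontr\xf3 el m\xf3dulo".encode("latin-1")

template = u"Could not import: {}"

# Python 2 implicitly decodes a byte string as ASCII when it is formatted
# into a unicode template. The explicit decode below mimics that step.
try:
    text = template.format(raw_message.decode("ascii"))
except UnicodeDecodeError as err:
    error_text = str(err)
    print(error_text)
```

The printed message has the same shape as the last line of the traceback; only the byte position depends on the sample text.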

dschreij commented 8 years ago

This is worrying, because it implies that some users are missing necessary modules that should be packaged with OpenSesame. Either way, this should indeed be fixed.

I stumbled upon an interesting module called six, which is a compatibility layer between Python 2 and 3. I don't know if you've heard of it, but it implements many of the functions you created for your compat module for OpenSesame. Anyway, six has a function u() which is comparable to your safestr() function. However, a side note states:

Note: On Python 2, u() doesn’t know what the encoding of the literal is. Each byte is converted directly to the unicode codepoint of the same value. Because of this, it’s only safe to use u() with strings of ASCII data.

So I think it is safe to use this function as Exception messages are always returned as ASCII data?

dschreij commented 8 years ago

I did some tests, because I find it weird that I have never stumbled into this before, but I have no trouble formatting non-unicode strings into a unicode string. I cannot reproduce the above in any way. Shouldn't this piece of code below cause the same problem?

e = Exception(b"A byte string")
print(u"This should not work: ".format(e))

It works perfectly for me, both in Python 2 and 3. In Python 3, it explicitly puts the 'b' prefix in the string:

"This should not work: b'A byte string'"

smathot commented 8 years ago

So I think it is safe to use this function as Exception messages are always returned as ASCII data?

No! Exception messages contain natural language, so for non-English locales they frequently contain non-ASCII text. This is also why your test case is no good (also, you're not actually formatting the exception into the string, but that's probably a typo), and why the error occurs in the first place.

# coding=utf-8
e = Exception(b"Ça ne doit pas marcher")
print(u"This should not work: {}".format(e))

This particular snippet is invalid for Python 3, because bytestrings aren't allowed to contain literal non-ASCII characters, but the problem is the same on Python 2 and 3.
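
A variant of the snippet that is valid on both Python 2 and 3 could write the bytes with escapes and decode the exception argument explicitly before formatting. This is a sketch under assumptions: `exception_text` is a hypothetical helper (not part of mediadecoder, OpenSesame, or six), and UTF-8 is assumed as the message encoding, which need not hold on a real system.

```python
def exception_text(exc, encoding="utf-8"):
    """Return the message of an exception as text (hypothetical helper)."""
    # Use the first argument if there is one, otherwise the exception itself.
    message = exc.args[0] if exc.args else exc
    if isinstance(message, bytes):
        # Assumed encoding; undecodable bytes become U+FFFD instead of raising.
        return message.decode(encoding, "replace")
    return u"{}".format(message)


# UTF-8 bytes for the French sentence used in the snippet above.
e = Exception(b"\xc3\x87a ne doit pas marcher")
result = u"This should work: {}".format(exception_text(e))
print(result)
```

Decoding before formatting means the template only ever receives text, so no implicit ASCII decode takes place.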

dschreij commented 8 years ago

Right, I see. I can prevent the crash now, but there is no way I can convert the decoded characters back to their original identities. Even with your safe_str methods, the best result I get is

u'\xc3\x87a ne doit pas marcher'

Exceptions must be one of a kind. Even after Googling this issue, the only solutions I found were to implement a custom exception class that accounts for this behavior, or to "simply quit using python 2 and switch to 3".
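
The result shown above looks like what happens when UTF-8 bytes are converted one byte per code point (which is what a Latin-1 decode does, and what the six note describes for u()): the two bytes that encode the first letter become two separate characters. A minimal sketch, assuming the original message really was UTF-8; the real encoding depends on the system and cannot be determined reliably from the bytes alone.

```python
# UTF-8 bytes for the French sentence from the test case.
raw = b"\xc3\x87a ne doit pas marcher"

# Decoding one byte per code point reproduces the garbled result.
garbled = raw.decode("latin-1")

# Decoding with the encoding the bytes were written in recovers the text.
recovered = raw.decode("utf-8")

# An already garbled string can be repaired by reversing the wrong decode.
repaired = garbled.encode("latin-1").decode("utf-8")

print(repr(garbled))
print(repr(recovered))
```

Whether this helps in practice depends on knowing, or correctly guessing, the encoding in which the message was produced.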