Closed: smathot closed this issue 8 years ago.
Apart from the worrying implication that some users are missing necessary modules that should be packaged with OpenSesame, this should indeed be fixed.
I stumbled upon an interesting module called `six`, which is a compatibility layer between Python 2 and 3. I don't know if you've heard of it, but it implements many of the functions you created for your `compat` module for OpenSesame. Anyway, `six` has a function `u()`, which is comparable to your `safestr()` function. However, a side note states:
> Note: On Python 2, `u()` doesn’t know what the encoding of the literal is. Each byte is converted directly to the unicode codepoint of the same value. Because of this, it’s only safe to use `u()` with strings of ASCII data.
So I think it is safe to use this function as Exception messages are always returned as ASCII data?
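The behavior described in the note can be checked directly. Mapping every byte to the code point with the same value is exactly what a Latin-1 decode does, so the minimal Python 3 sketch below emulates it with `bytes.decode("latin-1")`. `byte_per_codepoint` is a hypothetical helper for illustration, not part of `six`. ASCII input survives unchanged, whereas UTF-8 encoded text such as `Ç` turns into two unrelated characters.

```python
def byte_per_codepoint(data: bytes) -> str:
    """Hypothetical helper: map each byte to the code point of the same value.

    This emulates the Python 2 behavior of six.u() described in the note;
    a Latin-1 decode performs exactly this byte-to-code-point mapping.
    """
    return data.decode("latin-1")


ascii_message = b"File not found"
utf8_message = "Ça ne doit pas marcher".encode("utf-8")

# ASCII bytes map to the same characters, so the text is preserved.
print(byte_per_codepoint(ascii_message))

# 'Ç' is two bytes in UTF-8 (0xC3 0x87), so it becomes U+00C3 and U+0087.
print(ascii(byte_per_codepoint(utf8_message)))  # prints '\xc3\x87a ne doit pas marcher'
```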
I find it weird that I have never run into this before, so I did some tests, and I have no trouble formatting non-unicode strings into a unicode string. I cannot reproduce the above in any way. Shouldn't the code below cause the same problem?
```python
e = Exception(b"A byte string")
print(u"This should not work: ".format(e))
```
It works perfectly for me, both in Python 2 and 3. In Python 3, it explicitly puts the `b` prefix in the string:

```
"This should not work: b'A byte string'"
```
> So I think it is safe to use this function as Exception messages are always returned as ASCII data?
No! Exception messages contain natural language, so for non-English locales they frequently contain non-ASCII text. This is also why your test case is no good (also, you're not actually formatting the exception into the string, but that's probably a typo), and why the error occurs in the first place.
```python
# coding=utf-8
e = Exception(b"Ça ne doit pas marcher")
print(u"This should not work: {}".format(e))
```
This particular snippet is invalid for Python 3, because bytestrings aren't allowed to contain literal non-ASCII characters, but the problem is the same on Python 2 and 3.
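Roughly, the failure on Python 2 comes from an implicit decode: to build the unicode result, the byte-string message is decoded with the default ASCII codec, which rejects any byte above 127. Python 3 performs no implicit decode, so the sketch below makes that step explicit. It also shows what Python 3 produces for the same bytes (built with `.encode()`, since the literal is not allowed there): no crash, but an unreadable `b'\xc3\x87...'` message.

```python
# The same bytes that a UTF-8 encoded Python 2 str literal would contain.
message = "Ça ne doit pas marcher".encode("utf-8")

# Explicit version of the implicit ASCII decode that Python 2 attempts.
try:
    message.decode("ascii")
except UnicodeDecodeError as err:
    print("ASCII decode fails:", err)

# Python 3 does not decode at all; the exception text is the bytes repr.
e = Exception(message)
print("This should not work: {}".format(e))
```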
Right, I see. I can prevent the crash now, but there is no way I can convert the decoded characters back to the original ones. Even with your safe_str methods, the best result I get is:

```
u'\xc3\x87a ne doit pas marcher'
```
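That output is what a byte-per-code-point (Latin-1) reading of the UTF-8 bytes for `Ç` looks like. As a hedged sketch: if the original bytes are known to be UTF-8, an assumption the code itself cannot verify, re-encoding with Latin-1 recovers the raw bytes, and a UTF-8 decode then restores the text. `repair_latin1_mojibake` is a hypothetical name used only for illustration.

```python
def repair_latin1_mojibake(text: str) -> str:
    """Hypothetical helper: undo a byte-per-code-point decode of UTF-8 bytes.

    Assumes every character in `text` came from a single byte (code point < 256)
    and that those bytes were UTF-8; otherwise an exception is raised.
    """
    raw = text.encode("latin-1")  # code point -> byte of the same value
    return raw.decode("utf-8")    # interpret those bytes as UTF-8


garbled = "\xc3\x87a ne doit pas marcher"
print(repair_latin1_mojibake(garbled))  # Ça ne doit pas marcher
```

This only works when the original encoding is known. If the bytes came from a different encoding, the UTF-8 decode either fails or yields the wrong text.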
Exceptions really are a special case. Even after Googling this issue, the only solutions I found were to implement a custom exception class that accounts for this behavior, or to "simply quit using python 2 and switch to 3".
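A custom exception class along those lines could decode a byte-string message as soon as the exception is created, so converting it to text later never needs an implicit decode. The following is a minimal Python 3 sketch, assuming the bytes are UTF-8. The class name `DecodedMessageError` is hypothetical, and `errors="replace"` keeps undecodable bytes from raising by substituting U+FFFD.

```python
class DecodedMessageError(Exception):
    """Hypothetical exception that stores its message as text, never as bytes."""

    def __init__(self, message, encoding="utf-8"):
        if isinstance(message, bytes):
            # Decode once, up front; undecodable bytes become U+FFFD.
            message = message.decode(encoding, errors="replace")
        super().__init__(message)


err = DecodedMessageError("Ça ne doit pas marcher".encode("utf-8"))
print("This works: {}".format(err))  # This works: Ça ne doit pas marcher
```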
I received the following automated bug report via OpenSesame. Essentially, the problem is that when handling an Exception, you're inserting non-Unicode text (coming from the Exception) into a Unicode template string. As a result, users see an error message that was caused by processing the original error message, which is confusing.
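Since the code that reports errors cannot control which exceptions other code raises, a complementary approach is to convert the exception to text explicitly before it reaches the template. The helper below is a hypothetical Python 3 sketch (`exception_to_text` is not an existing OpenSesame function). It assumes byte-string arguments are UTF-8 and falls back to replacement characters otherwise.

```python
def exception_to_text(exc: BaseException, encoding: str = "utf-8") -> str:
    """Hypothetical helper: render an exception's arguments as text."""
    parts = []
    for arg in exc.args:
        if isinstance(arg, bytes):
            # Decode byte strings explicitly instead of relying on repr().
            parts.append(arg.decode(encoding, errors="replace"))
        else:
            parts.append(str(arg))
    # Fall back to the class name for exceptions raised without arguments.
    return ", ".join(parts) if parts else type(exc).__name__


try:
    raise Exception("Ça ne doit pas marcher".encode("utf-8"))
except Exception as e:
    report = "An error occurred: {}".format(exception_to_text(e))
    print(report)  # An error occurred: Ça ne doit pas marcher
```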