inducer / pudb

Full-screen console debugger for Python
https://documen.tician.de/pudb/
Other
3k stars 230 forks source link

Clarify Unicode semantics of var view construction #339

Closed inducer closed 5 years ago

inducer commented 5 years ago

Closes #338

codecov-io commented 5 years ago

Codecov Report

Merging #339 into master will increase coverage by 0.9%. The diff coverage is 76.92%.

Impacted file tree graph

@@           Coverage Diff            @@
##           master    #339     +/-   ##
========================================
+ Coverage    19.5%   20.4%   +0.9%     
========================================
  Files          15      15             
  Lines        2902    2906      +4     
========================================
+ Hits          566     593     +27     
+ Misses       2336    2313     -23
Impacted Files Coverage Δ
pudb/ui_tools.py 29.57% <ø> (ø) :arrow_up:
pudb/var_view.py 23.96% <76.92%> (+7.79%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 1bc67bf...c5827d9. Read the comment docs.

asmeurer commented 5 years ago

It seems to fix the original issue. Is it possible to add a test for it?

I usually don't use unicode in Python 2, but I also admit that I don't know what the "best" way to deal with these things in Python 2 is. The problem I usually have is if a str object with a non-ASCII character in it slips in, then it errors (like 'π' + u'i'). Python 2's str is kind of half encoded bytes half decoded Unicode, so it can happen. The only way to check is to test non-ASCII characters everywhere, because stuff like u'a' + 'b' works just fine.

Even so, this is probably fine. I didn't get any bugs with it after playing with it. Actually I thought we had already fixed this exact issue for double-width characters (which are non-ASCII), but I guess not, or else there was a regression.

asmeurer commented 5 years ago

Also I don't know if it would be an issue that the unicode constructor fails on non-ASCII bytes, rather than assuming they are utf-8 and decoding them.

>>> unicode('π')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xcf in position 0: ordinal not in range(128)
>>> 'π'.decode('utf-8')
u'\u03c0'
inducer commented 5 years ago

Thanks for taking a look!

Is it possible to add a test for it?

Added a test.

Also I don't know if it would be an issue that the unicode constructor fails on non-ASCII bytes,

I think I only used that where (1) the input was likely to be a type name, which wouldn't be Unicode in Py2, and (2) where I was too lazy to differentiate code paths, so the same path had to work in 2 and 3.

I'll merge if the tests pass.

inducer commented 5 years ago

Added a test.

Turned out to not be a waste of time.