Closed bmorgan21 closed 4 years ago
Do you have a reproducible example? That would happen if you were operating directly on the UTF-8 encoded bytes, but just before that the response is decoded into Unicode. Since the indexes are all based on the Unicode characters, it shouldn't have that issue.
It's been a while since I've worked on this code though, so I may be missing something and a complete example to reproduce it would help.
In [1]: a = 'İstanbulİstanbulİstanbul</body>'
In [2]: len(a)
Out[2]: 31
In [3]: len(a.lower())
Out[3]: 34
In [4]: index = a.lower().rfind('</body')
In [5]: a[:index]
Out[5]: 'İstanbulİstanbulİstanbul</b'
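The jump from 31 to 34 comes from the first character of each word. A minimal check, assuming a Python 3 interpreter (the variable names here are only for illustration), shows that lowercasing 'İ' (U+0130) produces two code points, so an index found in the lowered copy no longer lines up with the original string:

```python
import unicodedata

ch = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE
lowered = ch.lower()

# One code point becomes two: a plain 'i' plus a combining dot above.
names = [unicodedata.name(c) for c in lowered]
print(len(ch), len(lowered), names)

html = ch + "stanbul</body>"
# The first index is computed on the lowered copy, the second on the original.
# They differ by one for every 'İ' that precedes the tag.
print(html.lower().rfind("</body>"), html.rfind("</body>"))
```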
I'm assuming you meant to include </body> at the end of that first string. Your example operates directly on the UTF-8 bytes, which is problematic, but the debugtoolbar decodes the response to Unicode before doing those text operations:
https://github.com/mgood/flask-debugtoolbar/blob/master/flask_debugtoolbar/__init__.py#L202
What the toolbar does is more similar to this code, which works the way you'd want:
In [1]: a = 'İstanbulİstanbulİstanbul</body>'
In [2]: b = a.decode('utf-8')
In [3]: b
Out[3]: u'\u0130stanbul\u0130stanbul\u0130stanbul</body>'
In [4]: index = b.lower().rfind('</body>')
In [5]: b[:index]
Out[5]: u'\u0130stanbul\u0130stanbul\u0130stanbul'
Did you encounter this issue while using the debugtoolbar? I haven't used Flask in a while, so I don't remember exactly what the best practice is, but you may want to double-check how you're encoding the responses to follow their Unicode support: http://flask.pocoo.org/docs/1.0/unicode/
I believe this might be a Python 2 vs Python 3 issue. In Python 3, this is what I get running your commands.
In [8]: a = 'İstanbulİstanbulİstanbul</body>'
In [9]: b = a.decode('utf-8')
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
/var/code/app/commands/shell_cmd.py in run_ipython_shell()
45 # 0.10.x
---> 46 from IPython.Shell import IPShellEmbed
47 ipshell = IPShellEmbed(banner=banner)
ModuleNotFoundError: No module named 'IPython.Shell'
During handling of the above exception, another exception occurred:
AttributeError Traceback (most recent call last)
<ipython-input-9-690e9b2b8621> in <module>()
----> 1 b = a.decode('utf-8')
AttributeError: 'str' object has no attribute 'decode'
The response data is a byte string; calling decode() converts it into a Unicode string.
This appears to be a legit bug.
Under Python 3, response.data is of type bytes, so calling decode() on it converts it to type str. The source currently looks like: https://github.com/flask-debugtoolbar/flask-debugtoolbar/blob/70abd78e5510c4f320d4b8d455fcef8efb5909c2/flask_debugtoolbar/__init__.py#L203-L206
So a python 3 example demonstrating the bug:
Python 3.7.6 (default, Dec 30 2019, 19:38:28)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.13.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: a = 'İstanbulİstanbulİstanbul</body>'
In [2]: index = a.lower().rfind('</body>')
In [3]: a[:index]
Out[3]: 'İstanbulİstanbulİstanbul</b'
In [4]: b = "Hello World</body>"
In [5]: index = b.lower().rfind('</body>')
In [6]: b[:index]
Out[6]: 'Hello World'
This is tricky enough that any fix should include a test that runs both scenarios.
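As a rough sketch of what such a test could look like, the snippet below covers both scenarios against a hypothetical helper, find_closing_body. The helper is not part of flask-debugtoolbar; it is just one possible approach, doing a case-insensitive regex search on the original string so that no lowered copy is ever indexed:

```python
import re

# Case-insensitive match performed on the original string, so the index
# always refers to the string that is later sliced.
_CLOSING_BODY = re.compile(r"</body>", re.IGNORECASE)


def find_closing_body(html):
    """Return the index of the last closing body tag in html, or -1."""
    last = -1
    for match in _CLOSING_BODY.finditer(html):
        last = match.start()
    return last


def test_ascii_body():
    html = "Hello World</body>"
    assert html[:find_closing_body(html)] == "Hello World"


def test_length_changing_lowercase():
    # 'İ' grows to two code points when lowered; the tag is upper case
    # to confirm the search is still case-insensitive.
    html = "İstanbulİstanbulİstanbul</BODY>"
    assert html[:find_closing_body(html)] == "İstanbulİstanbulİstanbul"


if __name__ == "__main__":
    test_ascii_body()
    test_length_changing_lowercase()
    print("both scenarios pass")
```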
https://github.com/mgood/flask-debugtoolbar/blob/master/flask_debugtoolbar/__init__.py#L205
If you have a character whose capital letter is a single code point but whose lower case form is two code points, then locating the closing body tag fails, because the index found in the lowered string no longer lines up with the original string.
İstanbulİstanbulİstanbul
The first character is problematic. A solution would be to take in the closing tag string as a config argument and not do any lowering during string comparison.
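A minimal sketch of that idea, assuming a hypothetical function split_at_closing_tag with a closing_tag parameter (neither name comes from the toolbar's source): the tag is matched exactly against the original string, so the index is always valid for slicing.

```python
def split_at_closing_tag(html, closing_tag="</body>"):
    """Split html around the last exact occurrence of closing_tag.

    Returns (before, after); after is empty if the tag is not found.
    """
    index = html.rfind(closing_tag)  # searched on the original, never lowered
    if index == -1:
        return html, ""
    return html[:index], html[index:]


before, after = split_at_closing_tag("İstanbulİstanbulİstanbul</body>")
print(repr(before), repr(after))
```

The trade-off is that a page using a different casing, such as </BODY>, is only matched if the configured tag is spelled the same way.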