Case insensitive rfind is error prone.

pallets-eco / flask-debugtoolbar

A toolbar overlay for debugging Flask applications

https://flask-debugtoolbar.readthedocs.io

BSD 3-Clause "New" or "Revised" License

953 stars 146 forks source link

Case insensitive rfind is error prone. #124

Closed bmorgan21 closed 4 years ago

bmorgan21 commented 6 years ago

https://github.com/mgood/flask-debugtoolbar/blob/master/flask_debugtoolbar/__init__.py#L205

if you have a utf-8 character where the capital letter is represented in 1 byte, but the lower case character is represented in 2 bytes, then the location of the closing body tag fails.

İstanbulİstanbulİstanbul

The first character is problematic. A solution would be take in the closing tag string as a config argument and not do any lowering during string comparison.

mgood commented 6 years ago

Do you have a reproducible example? That would happen if you're operating directly on the UTF-8 encoded bytes, but just before that the response is decoded into Unicode. So since the indexes are all be based on the Unicode characters it shouldn't have that issue.

It's been a while since I've worked on this code though, so I may be missing something and a complete example to reproduce it would help.

bmorgan21 commented 6 years ago

In [1]: a ='İstanbulİstanbulİstanbul'

In [2]: len(a) Out[2]: 31

In [3]: len(a.lower()) Out[3]: 34

In [4]: index = a.lower().rfind('</body')

In [5]: a[:index] Out[5]: 'İstanbulİstanbulİstanbul</b'

mgood commented 6 years ago

I'm assuming you meant to include </body> at the end of that first string. Your example operates directly on the UTF-8 bytes which is problematic, but the debugtoolbar decodes the response to Unicode before doing those text operations: https://github.com/mgood/flask-debugtoolbar/blob/master/flask_debugtoolbar/__init__.py#L202

What the toolbar does is more similar to this code, which works the way you'd want:

In [1]: a = 'İstanbulİstanbulİstanbul</body>'

In [2]: b = a.decode('utf-8')

In [3]: b
Out[3]: u'\u0130stanbul\u0130stanbul\u0130stanbul</body>'

In [4]: index = b.lower().rfind('</body>')

In [5]: b[:index]
Out[5]: u'\u0130stanbul\u0130stanbul\u0130stanbul'

Did you encounter this issue while using the debugtoolbar? I haven't used Flask in a while, so I don't remember exactly what the best practice is, but you may want to double-check how you're encoding the responses to follow their Unicode support: http://flask.pocoo.org/docs/1.0/unicode/

bmorgan21 commented 6 years ago

I believe this might be a python 2 vs python 3 issue. In python 3, this is what I get running your commands.

In [8]: a = 'İstanbulİstanbulİstanbul</body>'

In [9]: b = a.decode('utf-8')
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
/var/code/app/commands/shell_cmd.py in run_ipython_shell()
     45             # 0.10.x
---> 46             from IPython.Shell import IPShellEmbed
     47             ipshell = IPShellEmbed(banner=banner)

ModuleNotFoundError: No module named 'IPython.Shell'

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
<ipython-input-9-690e9b2b8621> in <module>()
----> 1 b = a.decode('utf-8')

AttributeError: 'str' object has no attribute 'decode'

the response data is a byte string, calling decode converts it into a unicode string.

jeffwidman commented 4 years ago

This appears to be a legit bug.

Under python 3, response.data is type bytes. So calling decode() on that converts it to type str. The source currently looks like: https://github.com/flask-debugtoolbar/flask-debugtoolbar/blob/70abd78e5510c4f320d4b8d455fcef8efb5909c2/flask_debugtoolbar/__init__.py#L203-L206

So a python 3 example demonstrating the bug:

Python 3.7.6 (default, Dec 30 2019, 19:38:28)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.13.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: a = 'İstanbulİstanbulİstanbul</body>'

In [2]: index = a.lower().rfind('</body>')

In [3]: a[:index]
Out[3]: 'İstanbulİstanbulİstanbul</b'

In [4]: b = "Hello World</body>"

In [5]: index = b.lower().rfind('</body>')

In [6]: b[:index]
Out[6]: 'Hello World'

This is tricky enough that any bug fixing it should include a test that runs both scenarios.