textile / python-textile

A Python port of Textile, A humane web text generator

url_parse chokes on Unicode characters #36

Closed: crw closed this issue 7 years ago

crw commented 7 years ago

Parsing this Textile link fails: "Chögyam Trungpa":https://www.google.com/search?q=Chögyam+Trungpa

Here is the exception I am seeing:

Traceback (most recent call last):
  File "/usr/local/www/hyper/env/local/lib/python2.7/site-packages/flask/app.py", line 1988, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/www/hyper/env/local/lib/python2.7/site-packages/flask/app.py", line 1641, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/www/hyper/env/local/lib/python2.7/site-packages/newrelic-2.72.0.52/newrelic/hooks/framework_flask.py", line 98, in _nr_wrapper_Flask_handle_exception_
    return wrapped(*args, **kwargs)
  File "/usr/local/www/hyper/env/local/lib/python2.7/site-packages/flask/app.py", line 1544, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/www/hyper/env/local/lib/python2.7/site-packages/flask/app.py", line 1639, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/www/hyper/env/local/lib/python2.7/site-packages/flask/app.py", line 1625, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/usr/local/www/hyper/env/local/lib/python2.7/site-packages/newrelic-2.72.0.52/newrelic/hooks/framework_flask.py", line 40, in _nr_wrapper_handler_
    return wrapped(*args, **kwargs)
  File "/usr/local/www/hyper/env/local/lib/python2.7/site-packages/hyper/web/lib/auth.py", line 15, in decorated_function
    return f(*args, **kwargs)
  File "/usr/local/www/hyper/env/local/lib/python2.7/site-packages/hyper/web/blueprint/chat/__init__.py", line 116, in list_posts
    updated_at=int(time.time())
  File "/usr/local/www/hyper/env/local/lib/python2.7/site-packages/flask/templating.py", line 134, in render_template
    context, ctx.app)
  File "/usr/local/www/hyper/env/local/lib/python2.7/site-packages/flask/templating.py", line 116, in _render
    rv = template.render(context)
  File "/usr/local/www/hyper/env/local/lib/python2.7/site-packages/newrelic-2.72.0.52/newrelic/api/function_trace.py", line 98, in dynamic_wrapper
    return wrapped(*args, **kwargs)
  File "/usr/local/www/hyper/env/local/lib/python2.7/site-packages/jinja2/environment.py", line 989, in render
    return self.environment.handle_exception(exc_info, True)
  File "/usr/local/www/hyper/env/local/lib/python2.7/site-packages/jinja2/environment.py", line 754, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/www/hyper/env/local/lib/python2.7/site-packages/hyper/web/blueprint/chat/templates/list_posts.html", line 1, in top-level template code
    {% extends "chat_page.html" %}
  File "/usr/local/www/hyper/env/local/lib/python2.7/site-packages/hyper/web/blueprint/chat/templates/chat_page.html", line 1, in top-level template code
    {% extends "layout.html" %}
  File "/usr/local/www/hyper/env/local/lib/python2.7/site-packages/hyper/web/templates/layout.html", line 27, in top-level template code
    {% block content %}
  File "/usr/local/www/hyper/env/local/lib/python2.7/site-packages/hyper/web/blueprint/chat/templates/chat_page.html", line 13, in block "content"
    {% block main_column %}{% endblock %}
  File "/usr/local/www/hyper/env/local/lib/python2.7/site-packages/hyper/web/blueprint/chat/templates/list_posts.html", line 11, in block "main_column"
    {% include('_post_list.html') %}
  File "/usr/local/www/hyper/env/local/lib/python2.7/site-packages/hyper/web/blueprint/chat/templates/_post_list.html", line 9, in top-level template code
    {% include('_post_item.html') %}
  File "/usr/local/www/hyper/env/local/lib/python2.7/site-packages/hyper/web/blueprint/chat/templates/_post_item.html", line 56, in top-level template code
    <span class="value">{{ post.get_display_message()|safe }}</span>
  File "/usr/local/www/hyper/env/local/lib/python2.7/site-packages/hyper/model/base.py", line 411, in get_display_message
    html_type='html5'
  File "/usr/local/www/hyper/env/local/lib/python2.7/site-packages/textile/core.py", line 1367, in textile_restricted
    text)
  File "/usr/local/www/hyper/env/local/lib/python2.7/site-packages/textile/core.py", line 251, in parse
    text = self.block(text)
  File "/usr/local/www/hyper/env/local/lib/python2.7/site-packages/textile/core.py", line 456, in block
    block = Block(self, tag, atts, ext, cite, line)
  File "/usr/local/www/hyper/env/local/lib/python2.7/site-packages/textile/objects/block.py", line 29, in __init__
    self.process()
  File "/usr/local/www/hyper/env/local/lib/python2.7/site-packages/textile/objects/block.py", line 121, in process
    self.content = self.textile.graf(self.content)
  File "/usr/local/www/hyper/env/local/lib/python2.7/site-packages/textile/core.py", line 582, in graf
    text = self.links(text)
  File "/usr/local/www/hyper/env/local/lib/python2.7/site-packages/textile/core.py", line 605, in links
    return self.replaceLinks(text)
  File "/usr/local/www/hyper/env/local/lib/python2.7/site-packages/textile/core.py", line 718, in replaceLinks
    text = re.compile(pattern, flags=re.X | re.U).sub(self.fLink, text)
  File "/usr/local/www/hyper/env/local/lib/python2.7/site-packages/textile/core.py", line 874, in fLink
    url = self.shelveURL(self.encode_url(urlunsplit(uri_parts)))
  File "/usr/local/www/hyper/env/local/lib/python2.7/site-packages/textile/core.py", line 929, in encode_url
    query = quote(unquote(parsed.query), b'=&?/')
  File "/usr/lib/python2.7/urllib.py", line 1299, in quote
    return ''.join(map(quoter, s))
KeyError: u'\xf6'
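
For anyone reading the last frame: as far as I can tell, Python 2's urllib.quote looks each character up in a table keyed by the 256 single-byte strings, so a unicode character such as u'\xf6' has no entry and the lookup raises KeyError. Below is a rough imitation of that behaviour written for Python 3. The names byte_table, quote_by_table and split_bytes are made up for illustration and are not part of urllib or textile.

```python
# Rough imitation (not the real urllib code) of a quote table keyed by single bytes.
ALWAYS_SAFE = (b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
               b"abcdefghijklmnopqrstuvwxyz"
               b"0123456789_.-")

# Hypothetical table: every single byte maps to itself (if safe) or to %XX.
byte_table = {}
for i in range(256):
    key = bytes([i])
    if key in ALWAYS_SAFE:
        byte_table[key] = key.decode("ascii")
    else:
        byte_table[key] = "%{:02X}".format(i)


def split_bytes(data):
    """Split a bytes object into a list of one-byte bytes objects."""
    return [data[i:i + 1] for i in range(len(data))]


def quote_by_table(chars):
    """Quote a sequence of items by direct table lookup, with no fallback."""
    return "".join(byte_table[c] for c in chars)


# Encoded input: every item is a single byte, so every lookup succeeds.
encoded = quote_by_table(split_bytes("ö".encode("utf-8")))

# Unencoded input: the text character is not a key in the byte table.
try:
    quote_by_table(["ö"])
    lookup_failed = False
except KeyError:
    lookup_failed = True
```

With UTF-8 encoded input the lookup succeeds; with the raw text character it fails in the same way as the traceback above.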

Based on https://stackoverflow.com/questions/15115588/urllib-quote-throws-keyerror, I believe the following line needs to encode the query with .encode('utf-8'):

textile/core.py line 929

        query = quote(unquote(parsed.query), b'=&?/')

to

        query = quote(unquote(parsed.query.encode('utf-8')), b'=&?/')

That is a quick fix (for Python 2 only), but I am not entirely sure what this bit of code is doing, so there may be a better fix.

Edit: that change makes many unit tests fail on Python 3, so it is a no-go. The unit tests still pass on Python 2.
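
One possible way around the Python 3 failures, as a minimal sketch only: encode to UTF-8 on Python 2 and leave the string alone on Python 3, where urllib.parse.quote already encodes text as UTF-8 by default. The helper name quote_query is hypothetical and this is not necessarily how the library resolved the issue.

```python
import sys

try:
    from urllib.parse import quote, unquote  # Python 3
except ImportError:
    from urllib import quote, unquote  # Python 2

PY2 = sys.version_info[0] == 2


def quote_query(query, safe="=&?/"):
    """Hypothetical helper: percent-encode a query string on Python 2 and 3.

    On Python 2, text (unicode) input is encoded to UTF-8 bytes first,
    because urllib.quote only handles byte strings reliably there.
    On Python 3 the string is passed through unchanged.
    """
    if PY2 and not isinstance(query, str):
        # On Python 2, str is the byte-string type, so this branch
        # catches unicode objects only.
        query = query.encode("utf-8")
    return quote(unquote(query), safe)


# The query string from the failing link.
result = quote_query(u"q=Chögyam+Trungpa")
print(result)
```

Note that '+' is not in the safe set used here, so it is percent-encoded as well. That matches the safe characters in the original line, but it may or may not be what is wanted for search URLs.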

crw commented 7 years ago

Thank you very much! I did not expect such a quick fix. Much appreciated!

ikirudennis commented 7 years ago

Thank you for providing a clear test case, and even an attempt at a fix.