textile / python-textile

A Python port of Textile, A humane web text generator

Incorrect transform unicode url #45

Closed by tynopet 7 years ago

tynopet commented 7 years ago

Hello, I am trying to create a link that contains Unicode characters: https://myabstractwiki.ru/index.php/Заглавная_страница. When I copy this link from the browser's address bar and paste it into a Textile link, it becomes https://myabstractwiki.ru/index.php/%D0%97%D0%B0%D0%B3%D0%BB%D0%B0%D0%B2%D0%BD%D0%B0%D1%8F_%D1%81%D1%82%D1%80%D0%B0%D0%BD%D0%B8%D1%86%D0%B0. But the link in the rendered HTML is incorrect: https://myabstractwiki.ru/index.php/%C3%90%C2%97%C3%90%C2%B0%C3%90%C2%B3%C3%90%C2%BB%C3%90%C2%B0%C3%90%C2%B2%C3%90%C2%BD%C3%90%C2%B0%C3%91%C2%8F_%C3%91%C2%81%C3%91%C2%82%C3%91%C2%80%C3%90%C2%B0%C3%90%C2%BD%C3%90%C2%B8%C3%91%C2%86%C3%90%C2%B0. I suspect this is caused by the already percent-encoded path being encoded as UTF-8 a second time (see lines 945-948):

path = '/'.join(  # could be encoded slashes!
    quote(unquote(pce).encode('utf8'), b'')
    for pce in parsed.path.split('/')
)

How can this problem be fixed? Thanks.
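One plausible mechanism for the double encoding can be sketched in Python 3. It assumes that, under Python 2.7, `unquote` received a unicode string and turned each escaped byte into a single Latin-1 code point, which the `latin-1` argument below imitates. The variable names are illustrative and do not come from the library.

```python
from urllib.parse import quote, unquote

# UTF-8 percent-encoding of the Cyrillic letter "З" (first letter of the path)
segment = "%D0%97"

# Assumption: imitate Python 2.7's unquote() on a unicode string, where each
# escaped byte becomes one code point (i.e. the bytes are read as Latin-1).
mis_decoded = unquote(segment, encoding="latin-1")

# Encoding those two code points as UTF-8 yields four bytes instead of two.
double_encoded = quote(mis_decoded.encode("utf8"), safe="")
print(double_encoded)  # %C3%90%C2%97

# Reading the escapes as UTF-8 instead round-trips to the original segment.
round_tripped = quote(unquote(segment).encode("utf8"), safe="")
print(round_tripped)  # %D0%97
```

The first result matches the `%C3%90%C2%97` prefix seen in the incorrect href, which is consistent with the suspicion above but does not prove it is the exact code path involved.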

ikirudennis commented 7 years ago

Before I start digging into this, I'd like to confirm what the test is. Does the following generate the bug for you?

"test":https://myabstractwiki.ru/index.php/Заглавная_страница

txstyle.org turns that into <p><a href="https://myabstractwiki.ru/index.php/%D0%97%D0%B0%D0%B3%D0%BB%D0%B0%D0%B2%D0%BD%D0%B0%D1%8F_%D1%81%D1%82%D1%80%D0%B0%D0%BD%D0%B8%D1%86%D0%B0">test</a></p>. And so far that's what I'm getting with the current version of textile. Are you using the latest version?

tynopet commented 7 years ago

Yes, I am using the latest version of textile. The parser on txstyle.org works correctly, but the Python library does not. For example, here is the REPL output:

dmitry@dmitry:~$ python
Python 2.7.13 (default, Jan 19 2017, 14:48:08) 
[GCC 6.3.0 20170118] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import textile
>>> str = '"test":https://myabstractwiki.ru/index.php/%D0%97%D0%B0%D0%B3%D0%BB%D0%B0%D0%B2%D0%BD%D0%B0%D1%8F_%D1%81%D1%82%D1%80%D0%B0%D0%BD%D0%B8%D1%86%D0%B0'
>>> print textile.textile(str)
    <p><a href="https://myabstractwiki.ru/index.php/%C3%90%C2%97%C3%90%C2%B0%C3%90%C2%B3%C3%90%C2%BB%C3%90%C2%B0%C3%90%C2%B2%C3%90%C2%BD%C3%90%C2%B0%C3%91%C2%8F_%C3%91%C2%81%C3%91%C2%82%C3%91%C2%80%C3%90%C2%B0%C3%90%C2%BD%C3%90%C2%B8%C3%91%C2%86%C3%90%C2%B0">test</a></p>

As you can see, the value of the href attribute is different.

I suspect that txstyle.org uses the PHP parser, which handles this correctly.

ikirudennis commented 7 years ago

This is fixed for now. I don't know when I'll push out a new update, but if you need an urgent fix for this, you can use pip install git+https://github.com/textile/python-textile.git@82b15458faa1efa7d2f8fce16347ad01299199c1#egg=textile to install a version with the fix.
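The contents of that commit are not shown in this thread. Independently of it, the behaviour the reporter expects (encode a raw Unicode path, leave an already-encoded path unchanged) can be sketched in Python 3 as follows. `encode_path` is a hypothetical helper written for illustration, not a function from python-textile.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit


def encode_path(url):
    """Percent-encode the path of `url` without re-encoding existing escapes.

    Hypothetical helper for illustration; not part of python-textile.
    """
    parts = urlsplit(url)
    # Decode each segment first (escapes are read as UTF-8), then encode it
    # again. safe="" keeps an encoded slash (%2F) inside a segment encoded.
    path = "/".join(
        quote(unquote(segment), safe="")
        for segment in parts.path.split("/")
    )
    return urlunsplit(parts._replace(path=path))


raw = "https://myabstractwiki.ru/index.php/Заглавная_страница"
encoded = encode_path(raw)
print(encoded)

# Applying the helper to its own output leaves the URL unchanged.
assert encode_path(encoded) == encoded
```

Decoding before encoding is what makes the helper idempotent: whether the input is the raw Unicode URL or its percent-encoded form, each segment is normalized to the same text before it is quoted.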

tynopet commented 7 years ago

Thank you very much!