textile / python-textile

A Python port of Textile, A humane web text generator

Encoding on title attribute for a link tag fails #30

Closed jeroenp closed 8 years ago

jeroenp commented 8 years ago

The textile.utils generate_tags() function crashes with a UnicodeDecodeError when you use special characters in the title attribute, for example:

"Tëxtíle (Tëxtíle)":http://lala.com

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 9: ordinal not in range(128)
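For anyone trying to reproduce this: Python 2 raises this error when a byte string containing non-ASCII bytes gets implicitly decoded as ASCII, which happens when it is combined with a unicode string. The sketch below makes that decode explicit so it behaves the same on Python 3. The `chunk` value is an assumption, picked because an attribute fragment of that shape puts byte 0xc3 at position 9; it is not taken from the textile source.

```python
# -*- coding: utf-8 -*-
# Hypothetical chunk: an attribute fragment encoded as UTF-8 bytes.
chunk = u' title="Tëxtíle"'.encode("utf-8")

try:
    # Python 2 performs this ASCII decode implicitly when bytes meet unicode.
    chunk.decode("ascii")
    error = None
except UnicodeDecodeError as exc:
    error = exc

# The exception records the offending byte offset in .start
print(error)
```

The first non-ASCII character in the fragment is "ë", which UTF-8 encodes as the two bytes 0xc3 0xab, so the ASCII codec gives up on the first of them.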

I tried solving the issue by encoding the inserted content, since mixing bytes with a unicode string is what causes the decode error, but that makes textile crash somewhere else, in core.py:

element_tag.insert(len(element_tag) - 1, content.encode(enc))

What works for me is decoding the result from elementtree:

        element_tag = [v.decode(enc) for v in ElementTree.tostringlist(
                       element, encoding=enc, method='html')]
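A minimal standalone sketch of the same idea, outside of textile: serialize an element to bytes with ElementTree, then decode back to text before mixing it with other unicode strings. The element built here is hypothetical and only resembles what a Textile link might produce. It joins the chunks before decoding rather than decoding each chunk, which is a small variation on the snippet above and avoids any question about chunk boundaries.

```python
# -*- coding: utf-8 -*-
from xml.etree import ElementTree

enc = "utf-8"

# Hypothetical link element with a non-ASCII title attribute.
element = ElementTree.Element("a")
element.set("href", "http://lala.com")
element.set("title", u"Tëxtíle")
element.text = u"Tëxtíle"

# With a byte encoding, tostringlist() returns encoded byte chunks.
chunks = ElementTree.tostringlist(element, encoding=enc, method="html")

# Join the chunks, then decode once to get a text (unicode) string.
html = b"".join(chunks).decode(enc)
print(html)
```

Once everything is text, further string operations no longer trigger an implicit ASCII decode.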

Platform: python 2.7.11

ikirudennis commented 8 years ago

Blargh. Out of curiosity, would anyone care if I stopped supporting Python 2.6?

ikirudennis commented 8 years ago

Note to self: Nothing is ever easy.

ikirudennis commented 8 years ago

@jeroenp I think I've got a reasonable solution going on here. Are you able to test your code against the hotfix/unicode_title branch to confirm there aren't more edge cases?

jbouclier commented 8 years ago

I was having trouble with the following link:

"ANMAT(Administración Nacional de Medicamentos, Alimentos y Tecnología Médica)":http://www.anmat.gov.ar

As you can see, there are some accented letters in there.

The hotfix solved it for me. This is the first time I have installed a hotfix using pip, so I'm not sure whether it's relevant, but just in case, this is how I installed it:

pip uninstall textile
pip install git+https://github.com/textile/python-textile.git@hotfix/unicode_title

ikirudennis commented 8 years ago

That's great. And yes, those pip commands are correct, though it might complain that it's lacking a #egg=textile on the end of that git url. I'm going to push this out shortly.

jbouclier commented 8 years ago

It didn't complain about the lack of #egg=textile. This is the output from the pip install:

E:\temp\app_map>pip install git+https://github.com/textile/python-textile.git@hotfix/unicode_title
Collecting git+https://github.com/textile/python-textile.git@hotfix/unicode_title
  Cloning https://github.com/textile/python-textile.git (to hotfix/unicode_title) to c:\users\jjavier\appdata\local\temp\pip-clh9vq-build
Requirement already satisfied (use --upgrade to upgrade): six in c:\adp\python27\lib\site-packages (from textile==2.3.3)
Installing collected packages: textile
  Running setup.py install for textile ... done
Successfully installed textile-2.3.3

Maybe it is something in my local environment? It doesn't seem like something to worry about.

Just in case, this is the version of pip I'm using:

E:\temp\app_map>pip --version
pip 8.1.2 from C:\adp\Python27\lib\site-packages (python 2.7)

ikirudennis commented 8 years ago

Those all seem to be non-issues. Anyway, I've merged this in, and will be releasing the new version as soon as it passes through travis.