mintchaos / typogrify

A set of Django template filters to make caring about typography on the web a bit easier.
http://static.mintchaos.com/projects/typogrify/

widont line break/newline behavior #38

Open ryneeverett opened 10 years ago

ryneeverett commented 10 years ago

If a string ends with a <br> and a single word, widont does nothing:

>>> widont('blah<br>blah')
'blah<br>blah'

This makes sense to me. But if the <br> is followed by a newline and then a single word, widont replaces the newline with a &nbsp;:

>>> widont('blah<br>\nblah')
'blah<br>&nbsp;blah'

This doesn't seem right. While the first would render:

blah
blah

the second would render with a non-breaking space at the start of the second line:

blah
 blah

ryneeverett commented 10 years ago

>>> re.match(r'\s', '\n')
<_sre.SRE_Match object; span=(0, 1), match='\n'>
>>> re.match(r'\s', r'\n')
>>>

This result came as a surprise to me, but it explains why widont behaves this way with newlines: \s matches an actual newline character. But is this the desired behavior? That is, is the text passed in supposed to be escaped already?
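
To spell out the difference between the two calls, here is a small sketch using only the standard library. The variable names are just for illustration. The point is that '\n' is a single newline character, while r'\n' is two characters (a backslash and the letter n), neither of which is whitespace.

```python
import re

newline = "\n"   # one character: an actual line feed
literal = r"\n"  # two characters: a backslash followed by "n"

print(len(newline), len(literal))

# \s matches the real newline character...
print(bool(re.match(r"\s", newline)))
# ...but not the backslash that starts the two-character raw string.
print(bool(re.match(r"\s", literal)))
```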

I believe this would be the easiest way to get the correct behavior in the above example:

text = 'blah<br>\nblah'
text = text.encode('unicode-escape')  # b'blah<br>\\nblah'
text = text.decode('utf-8')  # 'blah<br>\\nblah'
text = widont(text)  # 'blah<br>\\nblah'
text = text.encode('utf-8')  # b'blah<br>\\nblah'
text = text.decode('unicode-escape')  # 'blah<br>\nblah'

It seems like it would be preferable for typogrify to deal with this, and I think it can be done without any encoding/decoding.
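
As a rough sketch of what I mean, assuming the fix is simply to stop treating line breaks as replaceable whitespace: the two patterns below are simplified stand-ins I made up for illustration, not typogrify's actual widont regex (which also handles inline tags and closing block tags). The name widont_sketch is hypothetical too. The only difference between the patterns is that the second uses [^\S\r\n] (any whitespace except \r and \n) where the first uses \s.

```python
import re

# Hypothetical, simplified patterns for illustration only; typogrify's real
# widont regex is more involved (inline tags, closing block tags, etc.).
ANY_SPACE = re.compile(r"([^<>\s])\s+([^<>\s]+\s*)$")
HORIZONTAL_ONLY = re.compile(r"([^<>\s])[^\S\r\n]+([^<>\s]+\s*)$")


def widont_sketch(text, pattern=HORIZONTAL_ONLY):
    """Join the last two words of text with &nbsp; using the given pattern."""
    return pattern.sub(r"\1&nbsp;\2", text)


print(widont_sketch("one two three"))          # one two&nbsp;three
print(widont_sketch("blah\nblah", ANY_SPACE))  # blah&nbsp;blah
print(repr(widont_sketch("blah\nblah")))       # 'blah\nblah' (newline kept)
```

With something along these lines, ordinary spaces before the last word would still be replaced, but an existing line break would be left alone. Whether tabs should count as replaceable whitespace is a separate judgment call.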