Open ryneeverett opened 10 years ago
>>> re.match(r'\s', '\n')
<_sre.SRE_Match object; span=(0, 1), match='\n'>
>>> re.match(r'\s', r'\n')
>>>
This result came as a surprise to me, but explains why widont has this behavior with newlines. But is this the desired behavior? That is, is the text passed in supposed to be escaped already?
I believe this would be the easiest way to get the correct behavior in the above example:
from typogrify.filters import widont

text = 'blah<br>\nblah'
text = text.encode('unicode-escape') # b'blah<br>\\nblah'
text = text.decode('utf-8') # 'blah<br>\\nblah'
text = widont(text) # 'blah<br>\\nblah'
text = text.encode('utf-8') # b'blah<br>\\nblah'
text = text.decode('unicode-escape') # 'blah<br>\nblah'
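For reuse, the round trip above could be wrapped in a small helper. This is only a sketch: apply_escaped and toy_filter are hypothetical names, and toy_filter is a simplified stand-in for a whitespace-sensitive filter, not typogrify's actual widont.

```python
import re

def toy_filter(text):
    # Stand-in (assumption): replace the whitespace before the final word with &nbsp;.
    return re.sub(r'\s+(\S+)$', r'&nbsp;\1', text)

def apply_escaped(text, filter_fn):
    # Escape control characters so a real newline becomes a backslash followed by 'n'.
    escaped = text.encode('unicode-escape').decode('utf-8')
    filtered = filter_fn(escaped)
    # Undo the escaping so the real newline comes back.
    return filtered.encode('utf-8').decode('unicode-escape')

direct = toy_filter('blah<br>\nblah')
wrapped = apply_escaped('blah<br>\nblah', toy_filter)
```

Here direct has the newline replaced, while wrapped keeps it, because the filter never sees a whitespace character. One caveat: if the filter itself emits non-ASCII characters, the final decode will mangle them.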
It seems like it would be preferable for typogrify to deal with this, and I think it can be done without any encoding/decoding.
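One way this might look without any encoding or decoding is to treat whitespace that directly follows a <br> as part of the line break and leave it alone. The sketch below is an assumption about a possible fix, not typogrify's implementation; WIDOW_RE and guarded_widont are hypothetical names, and the pattern is far simpler than a real widont filter.

```python
import re

# Optional <br> (group 1), the whitespace run, then the final word (group 2).
WIDOW_RE = re.compile(r'(<br\s*/?>)?\s+(\S+\s*)$', re.IGNORECASE)

def guarded_widont(text):
    def repl(match):
        if match.group(1):
            # The whitespace directly follows a <br>: leave the text untouched.
            return match.group(0)
        # Otherwise join the last two words with a non-breaking space.
        return '&nbsp;' + match.group(2)
    return WIDOW_RE.sub(repl, text)

plain = guarded_widont('one two three')
after_break = guarded_widont('blah<br>\nblah')
```

With this guard, ordinary text still gets its &nbsp; before the last word, while a newline after a <br> is preserved.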
If a string ends with a <br> and a single word, widont does nothing. This makes sense to me. But if a string ends with a <br>\n, widont replaces the newline with a &nbsp;. This doesn't seem right. While the first would render:
the second would render: