Closed listen5k closed 7 years ago
Thanks for using it and reporting bugs!
We recently upgraded to BeautifulSoup4 and it changed some behavior here, but first please confirm this issue, because I think you have a typo in your example code: &npsp; isn't a valid HTML entity. I'm assuming you meant &nbsp;, which does work as expected.
>>> import pynliner
>>> pynliner.fromString('<p>&nbsp;')
u'<p>\xa0</p>'
>>> print _
<p> </p>
@rennat Correct. That was a typo. I updated the description.
I'm expecting that the lib would not convert HTML entities. Please elaborate if you disagree.
The reason I have to preserve &nbsp; in my HTML is that Outlook needs to have them in an empty <td></td> for rendering a table correctly.
BeautifulSoup 4 doesn't offer the option to preserve HTML entities. It converts all of them to Unicode characters. See the pull request #45 for discussion.
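Since the parser hands back Unicode characters, one possible workaround is to post-process the inlined output and turn non-ASCII characters back into named entities. The following is only a sketch under assumptions: it uses Python 3 and the standard library's html.entities table, the helper name restore_named_entities is hypothetical and not part of pynliner or BeautifulSoup, and it will also convert characters that were typed as literal Unicode in the original HTML.

```python
from html.entities import codepoint2name


def restore_named_entities(html_text):
    """Replace non-ASCII characters that have a named HTML entity.

    ASCII characters are left untouched on purpose, so markup
    characters such as < > & " are never re-escaped.
    (Hypothetical helper, not part of pynliner.)
    """
    out = []
    for ch in html_text:
        cp = ord(ch)
        if cp > 127 and cp in codepoint2name:
            out.append("&" + codepoint2name[cp] + ";")
        else:
            out.append(ch)
    return "".join(out)


# Stand-in for inlined output that contains a non-breaking space (U+00A0)
inlined = "<td>\xa0</td>"
restored = restore_named_entities(inlined)
print(restored)
```

Because the string is rewritten character by character, whitespace and layout of the output stay exactly as they were.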
Hey everyone. Thought I'd pop into this discussion.
https://www.crummy.com/software/BeautifulSoup/bs4/doc/#output-formatters
You can restore HTML entities after BS4 slurps up the input string by passing a formatter when producing the output. I tested it with the OP's example, but are there any edge cases that the formatter might not capture?
>>> from bs4 import BeautifulSoup
>>> s = BeautifulSoup("<p>&nbsp;</p>", "html.parser")
>>> s.prettify(formatter="html")
'<p>\n &nbsp;\n</p>'
EDIT: In certain cases, this wouldn't be wanted if minified output is absolutely necessary.
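For the minified case, the same formatter can be passed to decode() instead of prettify(), which should keep the entities without adding newlines or indentation. A small sketch, assuming Python 3 with bs4 installed; this only illustrates the BeautifulSoup side and is not necessarily how pynliner handles it:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<td>&nbsp;</td>", "html.parser")

# Default output: the entity has become a Unicode non-breaking space
default_output = str(soup)

# decode() with the "html" formatter re-emits named entities
# without the extra whitespace that prettify() introduces
compact = soup.decode(formatter="html")
print(compact)
```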
fixed in 0.7.2
First off, thanks for the great library. Recently I upgraded the lib from 0.5.1 to 0.7.1 and now it breaks my emails.
How to reproduce
Expected
Actual