sligodave opened this issue 12 years ago
Yep, you're totally right. I thought I had opened an issue about this here, but I guess I had not. I did open a bug on BeautifulSoup's Launchpad and need to respond there with some info. I will use the example you provided, thanks for that. The maintainer has some suggestions, and you can follow up here: https://bugs.launchpad.net/beautifulsoup/+bug/949074
You may be interested in my replacement project for djangoembed, http://micawber.readthedocs.org/. Its HTML parser does not have this issue.
This issue remains, breaking the HTML parsing method. Downgrading to 3.2.0 is a temporary solution.
The proper solution is described in the bug report: "If you put a string into the soup its XML characters should always be escaped. Since you want "YAY" to be treated as an HTML tag, you can create a Tag object instead"
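The maintainer's distinction (a plain string becomes text and is escaped on output, while markup has to be built as a tag) holds for tree-based parsers in general. As a sketch only, here is the same behavior shown with Python's standard-library `xml.etree.ElementTree` instead of BeautifulSoup, so none of the calls below are BeautifulSoup's API. The `<div><p>URL</p></div>` document is a made-up stand-in for the page being processed.

```python
import xml.etree.ElementTree as ET

# A tiny made-up document with a placeholder paragraph to fill in.
root = ET.fromstring("<div><p>URL</p></div>")
p = root.find("p")

# Inserting the replacement as a *string*: it is treated as text,
# so '<' and '>' are escaped when the tree is serialized.
p.text = "<b>YAY</b>"
as_text = ET.tostring(root, encoding="unicode")

# Inserting the replacement as an *element*: it is serialized as real markup.
p.text = None
b = ET.SubElement(p, "b")
b.text = "YAY"
as_element = ET.tostring(root, encoding="unicode")

print(as_text)     # <div><p>&lt;b&gt;YAY&lt;/b&gt;</p></div>
print(as_element)  # <div><p><b>YAY</b></p></div>
```

The first output is the same kind of entity-encoded result reported for BS 3.2.1; building a node is what keeps the markup intact.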
You can also see how I fixed this in micawber, on lines 130/131 of parsers.py: https://github.com/coleifer/micawber/blob/master/micawber/parsers.py#L130
Please note: I am not working on this project anymore. I've written a replacement:
Hi, I could be wrong here, but just in case, I thought I'd bring this to your attention.
At the end of the `parse_data` method of the `HTMLParser`, where you call `replaceWith` on the matched URL, it appears that since the move from BeautifulSoup 3.2.0 to BeautifulSoup 3.2.1 the inserted HTML is entity-encoded, which breaks things.
Under BS 3.2.0, the above printed:

```
<b>YAY</b>
```

Under BS 3.2.1 it prints:

```
&lt;b&gt;YAY&lt;/b&gt;
```
I haven't had time to dig into this much, but the solution might be to create a BeautifulSoup representation of the replacement HTML and pass that to `replaceWith`.
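A minimal sketch of that idea, using the standard library's `xml.etree.ElementTree` as a stand-in for BeautifulSoup: parse the replacement HTML into a node first, then splice that node into the tree in place of the old one. The `replace_with` helper is hypothetical (ElementTree has no built-in `replaceWith`), and it assumes the replacement is a single well-formed element.

```python
import xml.etree.ElementTree as ET

def replace_with(parent, old, new_markup):
    """Hypothetical helper: swap child `old` of `parent` for parsed markup.

    Parsing `new_markup` first means it is inserted as an element rather
    than as a string, so it is not escaped on output. Assumes `new_markup`
    is a single well-formed element.
    """
    new = ET.fromstring(new_markup)  # build a node from the replacement HTML
    new.tail = old.tail              # keep any text that followed the old node
    index = list(parent).index(old)  # position of the old node among siblings
    parent.remove(old)
    parent.insert(index, new)
    return new

root = ET.fromstring('<div><a href="http://example.com/">link</a> after</div>')
old = root.find("a")
replace_with(root, old, "<b>YAY</b>")
print(ET.tostring(root, encoding="unicode"))  # <div><b>YAY</b> after</div>
```

Real embed HTML is often not well-formed XML, so in practice the parsing step would need an HTML-tolerant parser; the point here is only the order of operations: parse, then replace.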
Thanks, Dave