quakkels / rssdiscoveryengine

The RSS Discovery Engine exists to encourage people to use RSS for finding and consuming their news and current events.
MIT License
159 stars, 9 forks

Unescape HTML entities in title for the feedparser object #10

Closed (noway closed this 3 years ago)

noway commented 3 years ago

feedparser has a bug: it doesn't unescape HTML entities in an RSS feed's <title> tag. It just returns the raw contents: https://github.com/mhagander/hamn/blob/0277291830a3b74cb184d14815633d6be9646bf5/hamnadmin/vendor/feedparser/feedparser.py#L952
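To see why a title can still carry HTML entities after the XML has been parsed, here is a small standard-library illustration. It does not use feedparser. The RSS snippet is made up and assumes a title wrapped in CDATA, so the XML parser passes the entities through untouched. Python's html.unescape then decodes them.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical RSS snippet: the title's HTML entities sit inside CDATA,
# so the XML parser does not decode them.
RSS = """<rss><channel><item>
<title><![CDATA[Tips &amp; tricks: what&#8217;s new]]></title>
</item></channel></rss>"""

raw_title = ET.fromstring(RSS).find("./channel/item/title").text
clean_title = html.unescape(raw_title)  # decode &amp; and &#8217;

print(raw_title)    # Tips &amp; tricks: what&#8217;s new
print(clean_title)  # Tips & tricks: what’s new
```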

I found this bug while testing this URL: https://rdengine.herokuapp.com/?blog_url=https%3A%2F%2Fnowaycodes.substack.com%2F

This PR includes a workaround: a helpers.unescape_feed function that unescapes the title.
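The thread doesn't show the body of helpers.unescape_feed, so the following is only a hypothetical sketch of what such a workaround might look like. It assumes the helper receives the dict-like result of feedparser.parse (which exposes "feed" and "entries" keys) and runs html.unescape over the channel title and each entry title in place.

```python
import html
from typing import Any, Dict


def unescape_feed(feed: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical sketch: decode HTML entities in feed and entry titles in place."""
    # Channel-level title (feedparser stores it under result["feed"]["title"]).
    channel = feed.get("feed", {})
    if isinstance(channel.get("title"), str):
        channel["title"] = html.unescape(channel["title"])

    # Per-item titles (result["entries"][i]["title"]).
    for entry in feed.get("entries", []):
        if isinstance(entry.get("title"), str):
            entry["title"] = html.unescape(entry["title"])
    return feed


# Usage with a plain dict standing in for a parsed feed.
parsed = {"feed": {"title": "Example &amp; Co"}, "entries": [{"title": "A &amp; B"}]}
unescape_feed(parsed)
print(parsed["entries"][0]["title"])  # A & B
```

One caveat with this approach: html.unescape is not idempotent on text that legitimately contains an escaped ampersand (for example, "&amp;amp;" decodes to "&amp;" on the first call and to "&" on the second), so the helper should run exactly once per parsed feed.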

Before: [screenshot]

After: [screenshot]

quakkels commented 3 years ago

Thank you very much for these improvements.