lemon24 closed this issue 2 years ago
I compared feedparser and atoma by parsing 157 feeds from disk.
atoma seems to be faster and to consume significantly less memory: in the runs below it took roughly a sixth of the time and about half the peak memory (for a fair comparison, feedparser had both sanitization and relative link resolution disabled).
noop does nothing with the feeds and serves as a baseline.
# impl      time  maxrss
# Ubuntu 20.04, Python 3.8.10
feedparser   9.0      61
atoma        1.5      28
noop         0.0      20
# macOS Catalina, Python 3.8.10
feedparser  14.5      56
atoma        2.3      29
noop         0.0      18
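For reference, a measurement like the one above can be sketched with the standard library alone. This is not the script that produced the numbers above, only a minimal illustration; `measure` and `noop` are hypothetical names, and the parser is passed in as a plain callable so that no particular feedparser or atoma call is assumed.

```python
import resource  # Unix-only
import sys
import time


def noop(path):
    """Baseline: ignore the feed entirely."""
    return None


def measure(parse, paths):
    """Run parse(path) for every path.

    Returns (elapsed seconds, peak RSS, number of failures).
    ru_maxrss is reported in KiB on Linux and in bytes on macOS.
    """
    errors = 0
    start = time.perf_counter()
    for path in paths:
        try:
            parse(path)
        except Exception as e:
            errors += 1
            print(f"error: {path}: {e}", file=sys.stderr)
    elapsed = time.perf_counter() - start
    maxrss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return elapsed, maxrss, errors


if __name__ == "__main__":
    # Hypothetical usage: feed paths are given on the command line.
    print(measure(noop, sys.argv[1:]))
```

Peak RSS is a per-process high-water mark that never decreases, so each implementation would need to run in its own process for the memory column to be comparable.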
Unfortunately, atoma doesn't support some of the RSS feeds:
error: _feeds/https-blog-nelhage-com-atom-xml.atom: Could not parse feed: "rss" does not have a "feed:id"
error: _feeds/https-nedbatchelder-com-blog-rss-xml.rss: Cannot process RSS feed version "None"
error: _feeds/https-ciechanow-ski-atom-xml.atom: Could not parse feed: "rss" does not have a "feed:id"
error: _feeds/http-www-xn-8ws00zhy3a-com-feed.atom: EntitiesForbidden(name='xhtml', system_id=None, public_id=None)
error: _feeds/https-www-reddit-com-r-oilshell-rss.rss: Not a valid XML document
error: _feeds/https-blog-ncase-me-rss.rss: Cannot process RSS feed version "None"
error: _feeds/https-danluu-com-atom-xml.atom: Could not parse feed: "rss" does not have a "feed:id"
error: _feeds/https-blogs-dropbox-com-tech-feed.rss: Could not parse feed: "url" text is required but is empty
The EntitiesForbidden error is due to atoma using defusedxml (https://github.com/lemon24/reader/issues/212#issuecomment-886175089).
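Some of the messages above ("rss" does not have a "feed:id") suggest that a file saved with an .atom extension has an rss root element, so the extension may not be a reliable way to choose a parser. A minimal sketch of picking the format from the root element instead, using only the standard library; `sniff_feed_type` is a hypothetical helper, not part of atoma or feedparser, and it does not have defusedxml's protections, so it should only be used on trusted files.

```python
import io
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def sniff_feed_type(file):
    """Return "atom", "rss", or None, based on the root element.

    file is a binary file object; parsing stops at the first start tag.
    """
    try:
        for _, element in ET.iterparse(file, events=("start",)):
            tag = element.tag
            break
        else:
            return None
    except ET.ParseError:
        # Not well-formed XML (or empty input).
        return None
    if tag == f"{{{ATOM_NS}}}feed":
        return "atom"
    if tag == "rss":
        return "rss"
    # Anything else (e.g. an RDF root) is left to the caller.
    return None


sample = io.BytesIO(b'<rss version="2.0"><channel/></rss>')
print(sniff_feed_type(sample))  # rss
```

This would not help with the "Not a valid XML document" or EntitiesForbidden cases, which fail before the feed format matters.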
Closing in favor of #265.
In light of the various issues feedparser has (see #265), I think it's wise to consider other feed parser implementations.
In this issue, we'll look at https://github.com/NicolasLM/atoma; my comments [in brackets]:
Of note, it:
base
).