Closed lemon24 closed 2 years ago
The logical pipeline of parsing a feed:
Currently:
I've pretty much decided to continue using feedparser (https://github.com/lemon24/reader/issues/265#issuecomment-981671759) and not switch to Atoma (https://github.com/lemon24/reader/issues/263), but it's worth documenting the factors that went into the decision.
I looked at feedparser 6.0.8 and Atoma 0.0.17.
feature | feedparser | Atoma |
---|---|---|
stable | yes | no (0.x) |
maintainer responsiveness | low | high |
format detection | yes | yes (tries to parse all formats) |
JSON feed | no | yes |
old feed formats | yes | no |
Atom/RSS extensions | medium | high |
file objects | yes | yes (no autodetection) |
memory usage | high (reads feed in memory multiple times) | medium (builds whole etree) |
typed | no | yes |
safe XML | no | yes (defusedxml) |
pluggable XML parser (defusedxml, lxml) | no (yes with global/monkeypatching) | no |
bad encodings | yes | no |
malformed feeds | yes | no |
relative link resolution | yes (can be disabled, exposes XML base) | no |
HTML sanitization | yes (can be disabled) | no |
unified feed/entry interface | yes | no |
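
To make the "format detection" row more concrete, here is a minimal sketch of what detecting the format up front could look like, as opposed to trying every parser in turn. It uses only the standard library; `detect_feed_format` and its return values are hypothetical names for illustration, not how feedparser or Atoma actually do it.

```python
import json
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"
RDF_NS = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"


def detect_feed_format(data: bytes) -> str:
    """Guess the feed format from the document root (hypothetical helper).

    Returns one of "atom", "rss", "rdf", "json", or "unknown".
    """
    text = data.lstrip()

    # JSON Feed: a JSON object whose "version" points at jsonfeed.org.
    if text.startswith(b"{"):
        try:
            version = json.loads(text).get("version", "")
        except ValueError:
            return "unknown"
        return "json" if "jsonfeed.org" in str(version) else "unknown"

    # XML formats: look only at the root element's tag.
    # Note: the stdlib parser is not hardened against malicious XML;
    # for untrusted input, a package like defusedxml would be safer.
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        # Strict parsing gives up on malformed feeds.
        return "unknown"

    if root.tag == ATOM_NS + "feed":
        return "atom"
    if root.tag == "rss":
        return "rss"
    if root.tag == RDF_NS + "RDF":
        return "rdf"
    return "unknown"
```

Because the sketch parses strictly, a malformed feed comes back as "unknown", which is the kind of input the "malformed feeds" and "bad encodings" rows are about.
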
Closing in favor of #265.
In light of the various issues feedparser has (see #265), I think it's wise to consider other feed parser implementations.
In this issue, we'll: