kurtmckee / feedparser

Parse feeds in Python
https://feedparser.readthedocs.io

adding url prefix to id tag #323

Closed sayginify closed 2 years ago

sayginify commented 2 years ago

I tried feedparser with the following URL: http://128.199.162.51/feed

Within the XML file there are <id> tags with values such as 72122, but when I parse the feed, the value in the feed entries becomes http://128.199.162.51/72122

Any idea what might be the cause?

kurtmckee commented 2 years ago

This is happening because feedparser is trying hard to do the right thing, which in this case is to assume that the feed is declared as RSS but is using elements from the Atom specification.

The <id> elements are assumed to be Atom IDs. feedparser treats them as relative URIs and resolves them against the feed's URL, normalizing them to maintain uniqueness.
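
To illustrate the effect described above, here is a minimal sketch of standard relative-URI resolution using Python's `urllib.parse.urljoin`. It is not feedparser's actual code path, which may differ internally; it only shows why a bare value like 72122, treated as a relative URI against the feed URL from this issue, comes out as the URL the reporter saw. The variable names are illustrative.

```python
from urllib.parse import urljoin

# Feed URL and raw <id> value taken from the report in this thread.
feed_url = "http://128.199.162.51/feed"
raw_id = "72122"

# A relative reference with no leading slash replaces the last path
# segment of the base URL ("feed"), which is ordinary RFC 3986 behavior.
resolved = urljoin(feed_url, raw_id)
print(resolved)
```

Because the final path segment of the base URL is replaced rather than extended, the result has no `/feed/` component, which matches the value observed in the parsed entries.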

That feed smells like its author used a feed generator instead of an XML generator and injected arbitrary content. If this is the only XML document you want to parse, you might benefit from using an XML parser instead of feedparser.
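
As a rough sketch of the XML-parser approach suggested above, the standard library's `xml.etree.ElementTree` can read the <id> text without any URI resolution. The sample document below is hypothetical (the real feed's structure is not shown in this thread), and the helper name `extract_raw_ids` is made up for illustration. It assumes the <id> elements sit directly inside <item> elements and carry no XML namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the feed: RSS-style items carrying a bare <id>.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example</title>
    <item>
      <id>72122</id>
      <title>First entry</title>
    </item>
    <item>
      <id>72123</id>
      <title>Second entry</title>
    </item>
  </channel>
</rss>"""


def extract_raw_ids(xml_text):
    """Return the text of each item's <id> child exactly as written."""
    root = ET.fromstring(xml_text)
    # findtext returns None for an item without an <id> child.
    return [item.findtext("id") for item in root.iter("item")]


raw_ids = extract_raw_ids(SAMPLE_FEED)
print(raw_ids)
```

If the real feed declares a namespace for these elements, the tag names passed to `iter` and `findtext` would need the namespace-qualified form instead.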

sayginify commented 2 years ago

thanks a lot for the quick and detailed response.