nkanaev / yarr

yet another rss reader
MIT License
2.96k stars 225 forks

Pull atom xhtml title from nested elements #158

Closed wnh closed 1 year ago

wnh commented 1 year ago

The Atom spec says that the content of any title with a type of "xhtml" must be a single div element [1], so we need to use the full inner XML of the title when extracting its text. This is also compatible with feeds that don't follow the spec and just include text without the wrapping div.

I have a few feeds that follow the spec, and as a result every post from them is called "untitled", which is not ideal.

[1] https://www.rfc-editor.org/rfc/rfc4287#section-3.1
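To illustrate the idea, here is a minimal sketch using Go's standard `encoding/xml` package. It is not the code from this PR: the names `atomText`, `entry`, `collectText`, and `titleOf` are hypothetical and chosen only for this example. It assumes the title is captured as raw inner XML and that the text is recovered by walking the tokens and keeping only character data, which works both with and without the wrapping div.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// atomText is a hypothetical model of an Atom text construct such as <title>.
// Text holds only character data directly inside the element, so it is empty
// when the content is wrapped in a <div>. XML holds the raw inner markup.
type atomText struct {
	Type string `xml:"type,attr"`
	Text string `xml:",chardata"`
	XML  string `xml:",innerxml"`
}

// entry is a hypothetical, stripped-down Atom entry with just a title.
type entry struct {
	Title atomText `xml:"title"`
}

// collectText walks an XML fragment and concatenates all character data,
// at any nesting depth, then collapses runs of whitespace.
func collectText(fragment string) string {
	dec := xml.NewDecoder(strings.NewReader(fragment))
	var b strings.Builder
	for {
		tok, err := dec.Token()
		if err != nil {
			// io.EOF or malformed markup: keep whatever was collected so far.
			break
		}
		if cd, ok := tok.(xml.CharData); ok {
			b.Write(cd)
		}
	}
	return strings.Join(strings.Fields(b.String()), " ")
}

// String returns the readable title. For type="xhtml" it reads the full inner
// XML, so titles with or without the wrapping <div> both yield their text.
func (t atomText) String() string {
	if t.Type == "xhtml" {
		return collectText(t.XML)
	}
	return strings.TrimSpace(t.Text)
}

// titleOf parses a single <entry> document and returns its title text.
func titleOf(doc string) (string, error) {
	var e entry
	if err := xml.Unmarshal([]byte(doc), &e); err != nil {
		return "", err
	}
	return e.Title.String(), nil
}

func main() {
	doc := `<entry><title type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml">Hello <em>world</em></div></title></entry>`
	title, err := titleOf(doc)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(title)
}
```

In this sketch, reading only the direct character data of the title would give an empty string for the spec-compliant feed, which is presumably how such posts end up as "untitled".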

nkanaev commented 1 year ago

lgtm. thanks!