lemon24 / reader

A Python feed reader library.
https://reader.readthedocs.io
BSD 3-Clause "New" or "Revised" License

get_entries(sort='recent') "denial of service" #305

Open lemon24 opened 1 year ago

lemon24 commented 1 year ago

The new heuristic from #279 is too lax: if a feed adds lots of new entries at once, it can spam the top of "recent"; this should not be possible.

Background: https://www.michaelnygard.com/atom.xml in my personal instance just added 60+ old entries (years old); for now I marked everything as read, so the added time is preserved for debugging.
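
To make the failure mode concrete, here is a toy model of it. This is only a sketch under stated assumptions: `Entry`, `recent_key`, and `top_recent` are hypothetical names, and the sort key (the later of the published date and the locally-added date) is a deliberate simplification, not the actual heuristic from #279.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Entry:
    feed: str
    title: str
    published: datetime  # date claimed by the feed
    added: datetime      # when the entry was first seen locally


def recent_key(entry: Entry) -> datetime:
    # Assumed simplification, NOT reader's actual heuristic:
    # rank by whichever of the two dates is later.
    return max(entry.published, entry.added)


def top_recent(entries: list[Entry], n: int) -> list[Entry]:
    # Most recent first, truncated to the first n entries.
    return sorted(entries, key=recent_key, reverse=True)[:n]


now = datetime(2023, 1, 1, tzinfo=timezone.utc)  # arbitrary reference time

# A well-behaved feed: one entry per day, added an hour after publication.
normal = [
    Entry("normal", f"post {i}", now - timedelta(days=i, hours=1), now - timedelta(days=i))
    for i in range(1, 6)
]

# A feed that suddenly exposes 60 entries published years ago;
# all of them are first seen locally "now".
flood = [
    Entry("flooding", f"old post {i}", now - timedelta(days=365 * 5 + i), now)
    for i in range(60)
]

top = top_recent(normal + flood, 10)
print({e.feed for e in top})
```

Under this simplified key, every one of the top 10 entries comes from the flooding feed, even though all of its entries were published years before anything in the well-behaved feed.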

Possible solutions:

lemon24 commented 1 year ago

Another one(?) – 15 https://scattered-thoughts.net/atom.xml entries from the last ~2 years just appeared in my feed; these may be duplicates (if so, entry_dedupe should copy the old recent sort key, if it doesn't already).

Update: Nope, they were "new" entries (not duplicates); also, entry_dedupe already handles the recent sort key.

lemon24 commented 6 months ago

http://yosefk.com/blog/feed switched feed generators a few days ago, and I got seven 8-year-old entries in recent. I believe the root cause is the same (the new feed has more entries than the old one).

Update: I checked, and this is indeed the root cause.

lemon24 commented 4 months ago

Another one, with hundreds of entries: https://blog.startifact.com/rss.xml