Open philbudne opened 8 months ago
A recent thought: if the stories table had a "last_seen" column (updated each time the URL is found in a feed), we could use it to avoid aging out entries from unchanging feeds. This would require comparing story.last_seen to the last time the feed returned new or different content (http_last_modified?).
This would increase database write load, but would prevent duplicates generated every time a URL from a static feed is expired from the stories table.
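A minimal sketch of one way the last_seen idea could work, using an in-memory SQLite database so it is self-contained. Everything here is an assumption for illustration, not the project's actual schema or code: the column names (fetched_at, last_seen, last_changed), the helper names (make_db, record_fetch, prune_by_age), and the rule that a story is still "in the feed" when its last_seen is not older than the feed's last content change.

```python
import sqlite3


def make_db():
    """Create a toy schema (hypothetical column names, integer timestamps)."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE feeds (id INTEGER PRIMARY KEY, last_changed INTEGER);
        CREATE TABLE stories (
            url TEXT PRIMARY KEY,
            feed_id INTEGER,
            fetched_at INTEGER,   -- when the URL was first discovered
            last_seen INTEGER     -- when the URL was last found in the feed
        );
        """
    )
    return db


def record_fetch(db, feed_id, urls, now, changed=True):
    """Record the URLs found in one parsed feed document.

    Returns the URLs that were not already in the stories table.
    `changed` says whether the feed returned new/different content.
    """
    if changed:
        db.execute(
            "INSERT OR REPLACE INTO feeds (id, last_changed) VALUES (?, ?)",
            (feed_id, now),
        )
    new_urls = []
    for url in urls:
        # This UPDATE is the extra write load mentioned above.
        cur = db.execute(
            "UPDATE stories SET last_seen = ? WHERE url = ?", (now, url)
        )
        if cur.rowcount == 0:
            db.execute(
                "INSERT INTO stories (url, feed_id, fetched_at, last_seen) "
                "VALUES (?, ?, ?, ?)",
                (url, feed_id, now, now),
            )
            new_urls.append(url)
    return new_urls


def prune_by_age(db, cutoff):
    """Delete stories older than `cutoff` that have left their feed.

    A story is kept, however old, if it was seen at or after the last time
    its feed's content changed (it is presumably still listed there).
    """
    cur = db.execute(
        """
        DELETE FROM stories
        WHERE fetched_at < ?
          AND last_seen < (
              SELECT last_changed FROM feeds WHERE feeds.id = stories.feed_id
          )
        """,
        (cutoff,),
    )
    return cur.rowcount
```

With this rule, a static feed whose entries are all older than the cutoff loses nothing at prune time, so a later fetch does not report those URLs as new.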
Currently old stories are pruned by date, so entries from slow/static feeds time out, and "new" articles keep on being discovered.
The fetch_events table is pruned to a fixed number of entries; doing the same for the stories table might avoid the rediscovery problem.
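A rough sketch of count-based pruning for the stories table, again with SQLite for self-containment. The per-feed limit, the fetched_at column, and the prune_to_count helper are assumptions made for illustration; how fetch_events is actually pruned is not shown here.

```python
import sqlite3


def prune_to_count(db, feed_id, keep):
    """Keep only the `keep` most recently discovered stories for one feed.

    Returns the number of rows deleted.
    """
    cur = db.execute(
        """
        DELETE FROM stories
        WHERE feed_id = ?
          AND url NOT IN (
              SELECT url FROM stories
              WHERE feed_id = ?
              ORDER BY fetched_at DESC, url DESC
              LIMIT ?
          )
        """,
        (feed_id, feed_id, keep),
    )
    return cur.rowcount


# Toy data: five stories from one feed, discovered at times 0..4.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE stories (url TEXT PRIMARY KEY, feed_id INTEGER, fetched_at INTEGER)"
)
db.executemany(
    "INSERT INTO stories VALUES (?, ?, ?)",
    [(f"https://example.com/{i}", 1, i) for i in range(5)],
)
removed = prune_to_count(db, feed_id=1, keep=3)
```

As long as the limit is at least as large as the number of entries a feed lists, entries from a slow or static feed would never be dropped, regardless of their age.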