Open lemon24 opened 5 years ago
How other people handle this:
Akregator has 4 archive settings (can be configured globally, or per feed) (update: unchanged as of 2022):
Also, not deleting important articles can be turned off.
Tiny Tiny RSS can purge articles after X days (can be configured globally, or per feed); some details:
An interesting (but somewhat unrelated) feature is the Archived feed, which keeps starred articles from deleted feeds and share-anything articles (you can add articles that have no feed). Articles in the Archived feed are not purged.
Presumably, it would also be nice to mark a whole feed as important ("don't delete"). This could be implemented as a plug-in that marks each new entry as important, but that may drown out the entries marked important individually.
Requirements:
strategies
.reader.
reserved taglevels:
important entries are never deleted
unread entries are never deleted
happens after the feed is updated
must happen at Python level, can't select + delete in a single query
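A minimal sketch of the selection step implied by the requirements above, assuming a "delete entries older than X days" strategy. The `Entry` dataclass and `select_entries_to_delete()` are hypothetical stand-ins for illustration, not reader's actual API; the point is only that the filtering happens in Python, and the actual deletion would be a separate step run after the feed update.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List


@dataclass(frozen=True)
class Entry:
    """Hypothetical stand-in for a stored entry (not reader's Entry class)."""
    id: str
    updated: datetime
    read: bool = False
    important: bool = False


def select_entries_to_delete(
    entries: Iterable[Entry], now: datetime, max_age: timedelta
) -> List[str]:
    """Return the ids of entries that are safe to delete.

    An entry qualifies only if it is read, not important,
    and was last updated before the cutoff (now - max_age).
    """
    cutoff = now - max_age
    return [
        e.id
        for e in entries
        if e.read and not e.important and e.updated < cutoff
    ]


if __name__ == "__main__":
    now = datetime(2024, 1, 31, tzinfo=timezone.utc)
    old = now - timedelta(days=60)
    entries = [
        Entry("old-read", old, read=True),
        Entry("old-unread", old),  # unread: kept
        Entry("old-important", old, read=True, important=True),  # important: kept
        Entry("new-read", now - timedelta(days=1), read=True),  # too recent: kept
    ]
    print(select_entries_to_delete(entries, now, timedelta(days=30)))
```

Splitting selection from deletion also leaves a place for the update hooks and deduplication to run before anything is removed.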
Open questions:
"delete entries older than X days"
entry_dedupe
needs the old entry to be able to dedupe, it cannot work if it has been deleted
after_{entry,feed}_update_hooks
must run before entries are deleted
We still need to keep accurate EntryCounts.averages after the entries were deleted (but not for duplicates).
TODO: kinds of duplicates (broadly) × deduplication mechanisms matrix
Kinds of duplicates:
Deduplication mechanisms:
A database with ~3000 entries takes about 21M, which is perfectly acceptable. However, at the moment there is no way to remove old entries, and the database can grow arbitrarily.