lemon24 / reader

A Python feed reader library.
https://reader.readthedocs.io
BSD 3-Clause "New" or "Revised" License

delete_entry for entries created by update_feeds method #301

Closed kei49 closed 1 year ago

kei49 commented 1 year ago

Thank you for maintaining this library. This is an issue for a feature request.

Desired

Support delete_entry for entries created by update_feeds method

Current behavior

When you try to delete an entry using delete_entry, the error below is raised if the entry was created by update_feeds. This restriction is documented in the API reference:

reader.exceptions.EntryError: entry must be added by 'user', got 'feed':

Background

I manage some feeds periodically using reader.update_feeds() to sync with the latest entries for each feed. Due to the database restriction (only SQLite is supported), the library has been crashing frequently, maybe because the storage grew too large. To solve this, I tried to delete old entries for each feed rather than deleting the feed storage, but doing that requires the feature requested here.

lemon24 commented 1 year ago

Hi, thank you for opening this!

tl;dr: It will be some time until we have reader.delete_entry(), because "properly" deleting entries is non-trivial. For now, you can use reader._storage.delete_entries(), with some caveats.

Not being able to delete entries is a known issue, tracked in #96 (probably one of the oldest open issues).

Deleting entries properly is non-trivial (more details in the Open questions part of https://github.com/lemon24/reader/issues/96#issuecomment-1236304134).

storage.delete_entries() works for all entries, but does not handle the cases above (hence the limitation in the high-level API). Depending on your use case, this may not be an issue, so feel free to use it (it's not part of the stable/documented API, but it is extremely unlikely to change in any way).
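To illustrate, a pruning sketch along these lines might look like the following. Note the selection helper `select_old_entry_ids` is entirely hypothetical, and the commented-out calls assume reader entries expose a `resource_id` (feed URL, entry id) pair and that the internal `_storage.delete_entries()` accepts an iterable of such pairs; verify both against your reader version before relying on them.

```python
from datetime import datetime, timedelta, timezone

def select_old_entry_ids(entries, max_age_days=90, now=None):
    """Pick (feed_url, entry_id) pairs for entries older than max_age_days.

    `entries` is any iterable of objects with a `.published` attribute
    (a timezone-aware datetime, or None) and a `.resource_id` pair.
    Entries without a published date are kept, to be safe.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [
        e.resource_id
        for e in entries
        if e.published is not None and e.published < cutoff
    ]

# Usage against the internal (unstable, hence the underscore) API:
#
#   old = select_old_entry_ids(reader.get_entries(feed=feed_url))
#   reader._storage.delete_entries(old)
```

Since deleted entries can reappear on the next update if the feed still contains them, running a prune like this right after update_feeds() only helps for entries the feed itself no longer serves.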

Unfortunately, I don't have any estimate for when I'll be able to work on #96, I don't have a lot of free time at the moment.

lemon24 commented 1 year ago

> the library has been crashing frequently, maybe because the storage grew too large

Can you please provide more details on this, so I can better understand your use case?


For reference, my database has 18k entries in 170 feeds.

On a t4g.nano AWS instance (2 vCPU, 0.5 GiB), the /?limit=64&show=all web app page renders in 80ms (it ends up calling get_entries(limit=64) underneath, but I don't have timings for the actual call).

On my 2013 laptop, with the same db:

In [6]: %time _ = list(reader.get_entries(limit=64))
CPU times: user 9.12 ms, sys: 999 µs, total: 10.1 ms
Wall time: 9.34 ms
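To reproduce a measurement like this outside IPython (where %time is not available), a minimal sketch using time.perf_counter, assuming a configured reader instance:

```python
import time

def time_call(fn, *args, **kwargs):
    """Return (result, elapsed seconds) for a single call to fn."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Usage with a hypothetical reader instance:
#
#   entries, elapsed = time_call(lambda: list(reader.get_entries(limit=64)))
#   print(f"{len(entries)} entries in {elapsed * 1000:.2f} ms")
```

A single call is enough for a rough comparison like the one above; for stable numbers, timeit.repeat over several runs is the more careful tool.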