lemon24 / reader

A Python feed reader library.
https://reader.readthedocs.io
BSD 3-Clause "New" or "Revised" License

readtime/entry_dedupe can sometimes fail with EntryNotFoundError #292

Closed lemon24 closed 1 year ago

lemon24 commented 1 year ago

... likely when an entry is deleted by entry_dedupe.

Got this error while updating a feed marked as stale:

```
2022-09-10T00:44:50 13395 CRITICAL command failed due to unexpected error: no such entry: ('http://nodumbqs.libsyn.com/rss', '58d0f2206a6920c2afbc8e1f78b55d1f'); traceback follows
Traceback (most recent call last):
  File "/Users/lemon/code/reader/src/reader/_cli.py", line 126, in wrapper
    rv = fn(*args, **kwargs)
  File "/Users/lemon/code/reader/src/reader/_cli.py", line 178, in wrapper
    return fn(reader, *args, **kwargs)
  File "/Users/lemon/code/reader/src/reader/_cli.py", line 386, in update
    for result in bar:
  File "/Users/lemon/code/reader/src/reader/_cli.py", line 295, in iter_update_status
    for i, result in enumerate(it):
  File "/Users/lemon/code/reader/src/reader/core.py", line 1037, in update_feeds_iter
    yield from Pipeline.from_reader(self, map).update(filter_options)
  File "/Users/lemon/code/reader/src/reader/_update.py", line 390, in update
    for url, value in update_results:
  File "/Users/lemon/code/reader/src/reader/_update.py", line 428, in process_parse_result
    raise e
  File "/Users/lemon/code/reader/src/reader/_update.py", line 425, in process_parse_result
    counts = self.update_feed(feed.url, *intents)
  File "/Users/lemon/code/reader/src/reader/_update.py", line 478, in update_feed
    entry_hook(self.reader, entry.entry, entry_status)
  File "/Users/lemon/code/reader/src/reader/plugins/readtime.py", line 104, in _after_entry_update
    reader.set_tag(entry, key, _readtime_of_entry(entry))
  File "/Users/lemon/code/reader/src/reader/core.py", line 2042, in set_tag
    self._storage.set_tag(resource_id, key, value)
  File "/usr/local/Cellar/python@3.10/3.10.2/Frameworks/Python.framework/Versions/3.10/lib/python3.10/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/Users/lemon/code/reader/src/reader/_storage.py", line 1168, in set_tag
    raise info.not_found_exc(*resource_id) from None
```

Logs of relevant entries:

DEBUG    update feed 'http://nodumbqs.libsyn.com/rss': entry '03ff1a28802bdac830b39571be124d84': feed marked as stale, updating
DEBUG    update feed 'http://nodumbqs.libsyn.com/rss': entry '58d0f2206a6920c2afbc8e1f78b55d1f': feed marked as stale, updating
DEBUG    update feed 'http://nodumbqs.libsyn.com/rss': entry 'bb0fc1aa4ba9a3fe5875aa769d81c22b': feed marked as stale, updating
INFO     entry_dedupe: ('http://nodumbqs.libsyn.com/rss', '03ff1a28802bdac830b39571be124d84') (title: '023 - Tackling Tragedy (And NetNeutrality)') duplicates: ['bb0fc1aa4ba9a3fe5875aa769d81c22b', '58d0f2206a6920c2afbc8e1f78b55d1f']
INFO     entry_dedupe: set_entry_read(('http://nodumbqs.libsyn.com/rss', '03ff1a28802bdac830b39571be124d84'), True, None)
INFO     entry_dedupe: set_entry_recent_sort(('http://nodumbqs.libsyn.com/rss', '03ff1a28802bdac830b39571be124d84'), datetime.datetime(2018, 1, 10, 5, 8, 39))
INFO     entry_dedupe: delete_entries([('http://nodumbqs.libsyn.com/rss', 'bb0fc1aa4ba9a3fe5875aa769d81c22b'), ('http://nodumbqs.libsyn.com/rss', '58d0f2206a6920c2afbc8e1f78b55d1f')])
INFO     readtime: setting .readtime for ('http://nodumbqs.libsyn.com/rss', '03ff1a28802bdac830b39571be124d84') (entry update hook)
INFO     readtime: setting .readtime for ('http://nodumbqs.libsyn.com/rss', '58d0f2206a6920c2afbc8e1f78b55d1f') (entry update hook)
CRITICAL command failed due to unexpected error: no such entry: ('http://nodumbqs.libsyn.com/rss', '58d0f2206a6920c2afbc8e1f78b55d1f'); traceback follows

lemon24 commented 1 year ago

OK, so there are 3 possibly related problems (the last 2 are definitely related).

I think they call these "feature interaction bugs".
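
For illustration, here is a toy model of how such an interaction might arise. It is only a sketch and does not use reader's real API: `ToyStorage`, `EntryNotFound`, `dedupe_hook`, `readtime_hook` and `run_update` are all hypothetical stand-ins. The assumption is that the list of updated entries is fixed before any hook runs, so a later hook can be handed an entry that an earlier hook already deleted.

```python
# Toy model of the hook interaction; none of these names are reader's real API.

class EntryNotFound(Exception):
    """Hypothetical stand-in for reader's EntryNotFoundError."""


class ToyStorage:
    def __init__(self, ids):
        self.entries = {entry_id: {} for entry_id in ids}

    def delete(self, entry_id):
        del self.entries[entry_id]

    def set_tag(self, entry_id, key, value):
        if entry_id not in self.entries:
            raise EntryNotFound(entry_id)
        self.entries[entry_id][key] = value


def dedupe_hook(storage, entry_id):
    # Pretend 'a' is the entry to keep, and 'b' and 'c' are its duplicates.
    if entry_id == 'a':
        for duplicate in ('b', 'c'):
            storage.delete(duplicate)


def readtime_hook(storage, entry_id):
    storage.set_tag(entry_id, '.readtime', {'seconds': 34})


def run_update(storage, updated_ids, hooks):
    # The list of updated entries is decided up front (assumption),
    # so hooks may later be called for entries that no longer exist.
    for entry_id in updated_ids:
        for hook in hooks:
            hook(storage, entry_id)


storage = ToyStorage(['a', 'b', 'c'])
failed_id = None
try:
    run_update(storage, ['a', 'b', 'c'], [dedupe_hook, readtime_hook])
except EntryNotFound as e:
    failed_id = e.args[0]
    print('no such entry:', failed_id)
```

In this sketch the hooks for 'a' succeed, and the failure only shows up when the second hook runs for 'b', which mirrors the order of the log lines above.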


First, readtime fails as shown above.


Second, updating an old database, with only entry_dedupe, fails:

from reader import make_reader

import logging
logging.basicConfig(format="%(levelname)s %(message)s")
logging.getLogger('reader.plugins.entry_dedupe').setLevel(logging.INFO)

feed_url = 'http://nodumbqs.libsyn.com/rss' 
ids = [
    '03ff1a28802bdac830b39571be124d84',
    '58d0f2206a6920c2afbc8e1f78b55d1f',
    'bb0fc1aa4ba9a3fe5875aa769d81c22b',
]

reader = make_reader('db.sqlite', plugins=['reader.entry_dedupe'])

def show_entries():
    for id in ids:
        entry = reader.get_entry((feed_url, id), None)
        title = entry.title if entry else None
        print(id, title)

print('before')
show_entries()

print('after')
reader.update_feed(feed_url)
show_entries()
before
03ff1a28802bdac830b39571be124d84 None
58d0f2206a6920c2afbc8e1f78b55d1f 023 - Tackling Tragedy (And NetNeutrality)
bb0fc1aa4ba9a3fe5875aa769d81c22b None
after
INFO entry_dedupe: ('http://nodumbqs.libsyn.com/rss', '03ff1a28802bdac830b39571be124d84') (title: '023 - Tackling Tragedy (And NetNeutrality)') duplicates: ['bb0fc1aa4ba9a3fe5875aa769d81c22b', '58d0f2206a6920c2afbc8e1f78b55d1f']
INFO entry_dedupe: set_entry_read(('http://nodumbqs.libsyn.com/rss', '03ff1a28802bdac830b39571be124d84'), True, None)
INFO entry_dedupe: set_tag(('http://nodumbqs.libsyn.com/rss', '03ff1a28802bdac830b39571be124d84'), '.readtime', {'seconds': 34})
INFO entry_dedupe: set_entry_recent_sort(('http://nodumbqs.libsyn.com/rss', '03ff1a28802bdac830b39571be124d84'), datetime.datetime(2018, 1, 10, 5, 8, 39))
INFO entry_dedupe: delete_entries([('http://nodumbqs.libsyn.com/rss', 'bb0fc1aa4ba9a3fe5875aa769d81c22b'), ('http://nodumbqs.libsyn.com/rss', '58d0f2206a6920c2afbc8e1f78b55d1f')])
INFO entry_dedupe: ('http://nodumbqs.libsyn.com/rss', 'bb0fc1aa4ba9a3fe5875aa769d81c22b') (title: '023 - Tackling Tragedy (And NetNeutrality)') duplicates: ['03ff1a28802bdac830b39571be124d84']
Traceback (most recent call last):
  ...
  File "/Users/lemon/code/reader/src/reader/plugins/entry_dedupe.py", line 262, in _after_entry_update
    _dedupe_entries(reader, entry, duplicates, dry_run=dry_run)
  File "/Users/lemon/code/reader/src/reader/plugins/entry_dedupe.py", line 501, in _dedupe_entries
    action()
  File "/Users/lemon/code/reader/src/reader/core.py", line 1338, in set_entry_read
    self._storage.mark_as_read(feed_url, entry_id, bool(read), modified_naive)
  ...
reader.exceptions.EntryNotFoundError: no such entry: ('http://nodumbqs.libsyn.com/rss', 'bb0fc1aa4ba9a3fe5875aa769d81c22b')

Third, adding the feed to an empty database, with only entry_dedupe, fails:

# ... same preamble as before ...
reader.add_feed(feed_url)
reader.update_feed(feed_url)
show_entries()
INFO entry_dedupe: ('http://nodumbqs.libsyn.com/rss', '03ff1a28802bdac830b39571be124d84') (title: '023 - Tackling Tragedy (And NetNeutrality)') duplicates: ['bb0fc1aa4ba9a3fe5875aa769d81c22b', '58d0f2206a6920c2afbc8e1f78b55d1f']
INFO entry_dedupe: set_entry_recent_sort(('http://nodumbqs.libsyn.com/rss', '03ff1a28802bdac830b39571be124d84'), datetime.datetime(2018, 1, 10, 5, 8, 39))
INFO entry_dedupe: delete_entries([('http://nodumbqs.libsyn.com/rss', 'bb0fc1aa4ba9a3fe5875aa769d81c22b'), ('http://nodumbqs.libsyn.com/rss', '58d0f2206a6920c2afbc8e1f78b55d1f')])
INFO entry_dedupe: ('http://nodumbqs.libsyn.com/rss', '58d0f2206a6920c2afbc8e1f78b55d1f') (title: '023 - Tackling Tragedy (And NetNeutrality)') duplicates: ['03ff1a28802bdac830b39571be124d84']
Traceback (most recent call last):
  ...
  File "/Users/lemon/code/reader/src/reader/plugins/entry_dedupe.py", line 262, in _after_entry_update
    _dedupe_entries(reader, entry, duplicates, dry_run=dry_run)
  File "/Users/lemon/code/reader/src/reader/plugins/entry_dedupe.py", line 501, in _dedupe_entries
    action()
  File "/usr/local/Cellar/python@3.10/3.10.2/Frameworks/Python.framework/Versions/3.10/lib/python3.10/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/Users/lemon/code/reader/src/reader/_storage.py", line 710, in set_entry_recent_sort
    rowcount_exactly_one(cursor, lambda: EntryNotFoundError(feed_url, entry_id))
  File "/Users/lemon/code/reader/src/reader/_sqlite_utils.py", line 420, in rowcount_exactly_one
    raise make_exc()
reader.exceptions.EntryNotFoundError: no such entry: ('http://nodumbqs.libsyn.com/rss', '58d0f2206a6920c2afbc8e1f78b55d1f')
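
One possible way to avoid this class of failure is to skip hooks for entries that an earlier hook has already deleted. The following is only a minimal sketch of that idea on a plain dict, not reader's code and not necessarily the right fix; `run_hooks_tolerant`, `dedupe_hook` and `readtime_hook` are hypothetical names, and the shortened entry ids are just labels.

```python
TITLE = '023 - Tackling Tragedy (And NetNeutrality)'


def dedupe_hook(entries, entry_id):
    # Keep the entry being processed; delete every other entry with the same title.
    title = entries[entry_id]['title']
    duplicates = [
        other_id
        for other_id, other in entries.items()
        if other_id != entry_id and other['title'] == title
    ]
    for other_id in duplicates:
        del entries[other_id]


def readtime_hook(entries, entry_id):
    entries[entry_id]['.readtime'] = {'seconds': 34}


def run_hooks_tolerant(entries, updated_ids, hooks):
    """Run every hook for every updated entry, skipping deleted entries."""
    skipped = []
    for entry_id in updated_ids:
        for hook in hooks:
            # Re-check before each hook: a previous hook may have deleted the entry.
            if entry_id not in entries:
                skipped.append(entry_id)
                break
            hook(entries, entry_id)
    return skipped


entries = {entry_id: {'title': TITLE} for entry_id in ('03ff', 'bb0f', '58d0')}
skipped = run_hooks_tolerant(
    entries, ['03ff', '58d0', 'bb0f'], [dedupe_hook, readtime_hook]
)
print(sorted(entries), skipped)
```

The trade-off is that silently skipping hides the fact that hooks were scheduled for entries that no longer exist; an alternative would be to run the deduplication only after all per-entry hooks have finished.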