lemon24 / reader

A Python feed reader library.
https://reader.readthedocs.io
BSD 3-Clause "New" or "Revised" License

entry_dedupe flip-flops between entries #340

Closed lemon24 closed 3 months ago

lemon24 commented 3 months ago

entry_dedupe flip-flops between two entries on every update; this seems to happen because both duplicates are present in the feed itself.

2024-06-16T14:00

('https://qntm.org/rss.php', 'https://qntm.org/630') (title: 'HATETRIS') duplicates: ['https://qntm.org/597']
set_entry_read(('https://qntm.org/rss.php', 'https://qntm.org/630'), True, None)
set_tag(('https://qntm.org/rss.php', 'https://qntm.org/630'), '.readtime', {'seconds': 42})
set_entry_recent_sort(('https://qntm.org/rss.php', 'https://qntm.org/630'), datetime.datetime(2021, 6, 14, 13, 40, 37, tzinfo=datetime.timezone.utc))
delete_entries([('https://qntm.org/rss.php', 'https://qntm.org/597')])
...
0:00:26.096246  115/170 https://qntm.org/rss.php        2 new, 0 modified, 30 total

2024-06-16T14:01

('https://qntm.org/rss.php', 'https://qntm.org/597') (title: 'HATETRIS') duplicates: ['https://qntm.org/630']
set_entry_read(('https://qntm.org/rss.php', 'https://qntm.org/597'), True, None)
set_tag(('https://qntm.org/rss.php', 'https://qntm.org/597'), '.readtime', {'seconds': 42})
set_entry_recent_sort(('https://qntm.org/rss.php', 'https://qntm.org/597'), datetime.datetime(2021, 6, 14, 13, 40, 37, tzinfo=datetime.timezone.utc))
...
0:00:28.967629  115/170 https://qntm.org/rss.php        2 new, 0 modified, 30 total

2024-06-16T15:00

('https://qntm.org/rss.php', 'https://qntm.org/630') (title: 'HATETRIS') duplicates: ['https://qntm.org/597']
set_entry_read(('https://qntm.org/rss.php', 'https://qntm.org/630'), True, None)
set_tag(('https://qntm.org/rss.php', 'https://qntm.org/630'), '.readtime', {'seconds': 42})
set_entry_recent_sort(('https://qntm.org/rss.php', 'https://qntm.org/630'), datetime.datetime(2021, 6, 14, 13, 40, 37, tzinfo=datetime.timezone.utc))
delete_entries([('https://qntm.org/rss.php', 'https://qntm.org/597')])
...
0:00:24.421819  115/170 https://qntm.org/rss.php        2 new, 0 modified, 30 total
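
The pattern in the logs above can be reproduced with a toy model. This is only a sketch under assumptions: `update`, `naive_dedupe`, and `simulate` are hypothetical helpers operating on plain sets and dicts, not reader's storage or the actual entry_dedupe logic. It assumes that update re-adds any feed entry missing from storage, and that dedupe keeps the newly added entry and deletes the older one with the same title.

```python
# Toy model only: plain sets and dicts, not reader's storage or plugin API.
FEED = {"https://qntm.org/597": "HATETRIS", "https://qntm.org/630": "HATETRIS"}


def update(stored, feed):
    """Add feed entries missing from storage; return the newly added ids."""
    new = [entry_id for entry_id in feed if entry_id not in stored]
    stored.update(new)
    return new


def naive_dedupe(stored, new_ids, feed):
    """For each new entry, delete stored entries that share its title."""
    for new_id in new_ids:
        if new_id not in stored:
            continue  # already deleted as a duplicate of an earlier new entry
        for other in list(stored):
            if other != new_id and feed[other] == feed[new_id]:
                stored.discard(other)


def simulate(rounds):
    """Return the surviving entry ids after each update + dedupe round."""
    stored, history = set(), []
    for _ in range(rounds):
        new = update(stored, FEED)
        naive_dedupe(stored, new, FEED)
        history.append(sorted(stored))
    return history
```

Calling `simulate(3)` gives a survivor that alternates between the two URLs on every round, since the entry deleted in one round is always "new" in the next.
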

lemon24 commented 3 months ago

Related: #292 / https://github.com/lemon24/reader/commit/c7e516fb60cf1b9da5e795c69e6bdf5b1a0f6c24.

lemon24 commented 3 months ago

Hmm... it seems recent sort was not set correctly (new entry shows way up in the feed); need to look more into this.

Update: also, it doesn't stop the flip-flopping.

Update #2: it did stop the flip-flopping, but update still adds one entry every time (the one the plugin deletes right away); this is expected / unavoidable without some kind of tombstone (https://github.com/lemon24/reader/issues/96#issuecomment-1236304134).
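
To illustrate what "some kind of tombstone" could mean here, a minimal sketch follows. `ToyStorage` and its methods are hypothetical and not part of reader; the assumption is simply that storage remembers deleted ids, so a later update does not re-add them even though they are still in the feed.

```python
class ToyStorage:
    """Hypothetical in-memory storage that remembers deleted entry ids."""

    def __init__(self):
        self.entries = set()
        self.tombstones = set()

    def update(self, feed_ids):
        """Add feed entries that are neither stored nor tombstoned; return them."""
        new = [
            entry_id
            for entry_id in feed_ids
            if entry_id not in self.entries and entry_id not in self.tombstones
        ]
        self.entries.update(new)
        return new

    def delete(self, entry_id):
        """Delete an entry and leave a tombstone so update skips it later."""
        self.entries.discard(entry_id)
        self.tombstones.add(entry_id)


storage = ToyStorage()
feed_ids = ["https://qntm.org/597", "https://qntm.org/630"]
first = storage.update(feed_ids)        # both entries are new
storage.delete("https://qntm.org/597")  # what the dedupe plugin would do
second = storage.update(feed_ids)       # nothing is re-added
```

Without the `tombstones` check in `update`, the second call would report the deleted entry as new again, which is the "one entry every time" behavior described above.
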

lemon24 commented 3 months ago

The issue fixed by https://github.com/lemon24/reader/commit/fc80a49ee8e94de7876c442ddabf1e5d7aaca847 (and, in part, this whole issue) happened because dedupe runs at different points in the two code paths: "on-line" dedupe (after an entry is added/updated) and backfill (on demand, for groups of entries that are all the same). This could arguably be fixed by making "on-line" dedupe work more like backfill; related: https://github.com/lemon24/reader/issues/246#issuecomment-1596025920.
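
As a rough sketch of what "more like backfill" might look like: group all entries that look the same, then pick the entry to keep with a rule that does not depend on which one was added last. Everything below is an assumption for illustration; `group_duplicates`, `pick_keeper`, and `dedupe_group` are hypothetical names, the title normalization is a stand-in for the plugin's real similarity check, and "keep the greatest id" is an arbitrary stable rule, not what entry_dedupe does.

```python
from collections import defaultdict


def group_duplicates(entries):
    """Group (id, title) pairs by normalized title; return groups of 2+ ids."""
    groups = defaultdict(list)
    for entry_id, title in entries:
        groups[title.strip().lower()].append(entry_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


def pick_keeper(group):
    """Assumed rule: keep the greatest id, regardless of update order."""
    return max(group)


def dedupe_group(group):
    """Return (keeper, ids_to_delete) for one group of duplicates."""
    keeper = pick_keeper(group)
    return keeper, [entry_id for entry_id in group if entry_id != keeper]


entries = [
    ("https://qntm.org/597", "HATETRIS"),
    ("https://qntm.org/630", "HATETRIS"),
    ("https://qntm.org/other", "Something else"),
]
results = [dedupe_group(group) for group in group_duplicates(entries)]
```

Because the keeper depends only on the group's contents, running the same steps with the entries in the opposite order gives the same result, so both paths would agree on which entry survives.
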

Also related, in case we unify the pipelines:

https://github.com/lemon24/reader/blob/fc80a49ee8e94de7876c442ddabf1e5d7aaca847/src/reader/plugins/entry_dedupe.py#L274-L283