nkanaev / yarr

yet another rss reader

Duplicate items appearing #67

Closed. thedeany closed this issue 3 years ago

thedeany commented 3 years ago

I added Merriam-Webster's Word of the Day feed to my yarr instance, and the words from the past few days were imported just fine. On subsequent fetches, however, the new word of the day appears twice in yarr even though it appears only once in the feed.

Feed URL: https://www.merriam-webster.com/wotd/feed/rss2

[image: screenshot of the feed's items in yarr]

Notice the two most recent words, progeny and inveigle, are duplicated. (I added the feed two days ago, when abrupt was the word of the day.)

I looked at the DB and do in fact see two rows for each of those words. In the feed, I Ctrl+F'd the GUIDs for each row and, as expected, only one for each word is found. I'm not sure where the duplicate rows/different GUIDs are coming from. Is there anything I can do to troubleshoot this further?
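
The manual Ctrl+F check described above could also be scripted. The following is only a minimal sketch: the function name `missing_guids`, the sample feed, and the guid values are hypothetical and are not taken from the actual Merriam-Webster feed. It parses RSS 2.0 XML and reports which stored guids no longer appear in the feed.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 sample; a real feed would be downloaded first.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Word of the Day</title>
    <item><title>progeny</title><guid>guid-progeny-b</guid></item>
    <item><title>abrupt</title><guid>guid-abrupt</guid></item>
  </channel>
</rss>"""


def missing_guids(feed_xml, stored_guids):
    """Return the stored guids that match no <guid> element in the feed."""
    root = ET.fromstring(feed_xml)
    # RSS 2.0 <guid> elements carry no namespace, so a plain tag name matches.
    feed_guids = {(el.text or "").strip() for el in root.iter("guid")}
    return [g for g in stored_guids if g not in feed_guids]
```

A stored guid that comes back from `missing_guids` was present at some earlier fetch but is no longer in the feed, which would suggest the feed changed it.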

thedeany commented 3 years ago

Just an update: today's word only appears once in yarr. Let me know if there are any logs or DB information I can provide.

nkanaev commented 3 years ago

I'm guessing it may be an issue on the provider side. It is highly likely that the items' unique ids (guids) changed between fetches, so the same words were stored again as new entries. To check that you could run:

sqlite3 /path/to/yarr.db -header 'select guid, title, date, date_arrived from items where feed_id in (select id from feeds where title like "Merriam%") order by date_arrived' | column -s '|' -t

If that's the case, you'll see that the duplicated words have different guid and date_arrived values.
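
The same check can be sketched in Python for anyone who prefers a script over the one-liner. This is an illustration, not part of yarr: it assumes only the table and column names used in the query above (items.guid, title, date, date_arrived, feed_id and feeds.id, title). The helper names `build_demo_db` and `find_changed_guids`, the guids, and the timestamps are all hypothetical.

```python
import sqlite3
from collections import defaultdict


def build_demo_db():
    """Create an in-memory database with only the columns the query above uses."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE feeds (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE items (
            guid TEXT, title TEXT, date TEXT, date_arrived TEXT, feed_id INTEGER
        );
        INSERT INTO feeds VALUES (1, 'Merriam-Webster''s Word of the Day');
        -- Hypothetical guids and timestamps, for illustration only.
        INSERT INTO items VALUES ('guid-abrupt', 'abrupt', '2021-01-01', '2021-01-01 08:00', 1);
        INSERT INTO items VALUES ('guid-progeny-a', 'progeny', '2021-01-02', '2021-01-02 08:00', 1);
        INSERT INTO items VALUES ('guid-progeny-b', 'progeny', '2021-01-02', '2021-01-02 20:00', 1);
        INSERT INTO items VALUES ('guid-inveigle-a', 'inveigle', '2021-01-03', '2021-01-03 08:00', 1);
        INSERT INTO items VALUES ('guid-inveigle-b', 'inveigle', '2021-01-03', '2021-01-03 20:00', 1);
        """
    )
    return conn


def find_changed_guids(conn, feed_title_prefix):
    """Map each title stored under more than one guid to its guids, oldest first."""
    rows = conn.execute(
        """
        SELECT i.title, i.guid
        FROM items AS i
        JOIN feeds AS f ON f.id = i.feed_id
        WHERE f.title LIKE ? || '%'
        ORDER BY i.date_arrived
        """,
        (feed_title_prefix,),
    )
    grouped = defaultdict(list)
    for title, guid in rows:
        if guid not in grouped[title]:
            grouped[title].append(guid)
    # Keep only titles that arrived under at least two different guids.
    return {title: guids for title, guids in grouped.items() if len(guids) > 1}
```

Against a real database, `sqlite3.connect("/path/to/yarr.db")` would replace `build_demo_db()`. Any title returned by `find_changed_guids(conn, "Merriam")` was stored under more than one guid, which points to the feed rather than the reader.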

thedeany commented 3 years ago

@nkanaev Thanks for the pointer. I ran that and do in fact see different guid and date_arrived values for the duplicates. So I guess Merriam-Webster is just bad at RSS feeds? :)

I'll close this as I concur it's a provider issue. Thanks again!