AndyM48 closed 1 week ago
No, I believe it isn't that simple. Please check out the complexity of the item comparison code in src/itemset.c; there is already a lot of logic eliminating duplication.
The BBC feed in question provides unique identifiers for feed items. If those are present, a difference in them is taken as an indication of different items. If such a feed provider issues the same content with a new UID, the RSS spec says it is to be considered new content.
There are use cases where you want this behavior, and your suggestion would kill them. For example, a feed alerting on something and providing the same content at different times to show you that a problem persists.
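To make the rule above concrete, here is a minimal sketch of an identifier-first comparison. It is not the code in src/itemset.c; the `Item` type, the `is_same_item` function, and the title/link fallback are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    # Hypothetical item record, not Liferea's actual data structure.
    title: str
    link: str
    guid: Optional[str] = None  # unique identifier from the feed, if any


def is_same_item(a: Item, b: Item) -> bool:
    # When both items carry a unique identifier, it alone decides.
    if a.guid is not None and b.guid is not None:
        return a.guid == b.guid
    # Otherwise fall back to comparing content (assumed fallback rule).
    return a.title == b.title and a.link == b.link


first = Item("Match highlights", "https://example.org/video", guid="video#5")
second = Item("Match highlights", "https://example.org/video", guid="video#6")
same_with_guid = is_same_item(first, second)  # identifiers differ
```

Under this rule the two items count as different even though title and link match, which is the behavior the alerting use case depends on.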
Thank you for the explanation. I understand what you have said. Could there be an option, or maybe a plugin, to hide "apparent" duplicates, i.e. ignore the UID when displaying the feeds?
Such an option would be possible. Maintaining the feature is the problem. This is a one-man project, and all code paths that the maintainer does not use daily tend to rot :-(
This is really very frustrating. Many, many feeds have apparently duplicated items, especially from the BBC. The only difference in the SQL database (items table) seems to be in the source_id, where a number is appended to the string, e.g.:
https://www.bbc.com/sport/football/videos/cx88ezex0jzo#5
https://www.bbc.com/sport/football/videos/cx88ezex0jzo#6
Are these the "unique identifiers" you referred to above?
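As a quick check of that observation, the following sketch strips the `#` suffix from the two source_id values quoted above and compares what is left. The helper name `base_id` is made up for this example; `urldefrag` is from the Python standard library.

```python
from urllib.parse import urldefrag


def base_id(source_id: str) -> str:
    # Drop everything from '#' onwards, keeping only the base URL.
    return urldefrag(source_id).url


ids = [
    "https://www.bbc.com/sport/football/videos/cx88ezex0jzo#5",
    "https://www.bbc.com/sport/football/videos/cx88ezex0jzo#6",
]

bases = {base_id(i) for i in ids}
share_base = len(bases) == 1  # True when only the suffix differs
```

For these two values only the appended number differs; the base URL is the same.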
There is an informative article here
A picture is worth a thousand words:
These are from the BBC feed (http://feeds.bbci.co.uk/news/rss.xml). Isn't it just a question of comparing titles and times?
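A rough sketch of what "comparing titles and times" could look like as a display-only filter follows. Everything here is hypothetical (the `Entry` type, the `hide_apparent_duplicates` function, the sample data); it is not an existing Liferea option or plugin API.

```python
from datetime import datetime
from typing import Iterable, List, NamedTuple


class Entry(NamedTuple):
    # Hypothetical row shape; field names are assumptions.
    title: str
    published: datetime
    source_id: str


def hide_apparent_duplicates(entries: Iterable[Entry]) -> List[Entry]:
    seen = set()
    visible = []
    for entry in entries:
        # The source_id is deliberately ignored; only title and time count.
        key = (entry.title.strip().casefold(), entry.published)
        if key in seen:
            continue
        seen.add(key)
        visible.append(entry)
    return visible


when = datetime(2024, 1, 1, 12, 0)
later = datetime(2024, 1, 1, 18, 0)
sample = [
    Entry("Match highlights", when, "video#5"),
    Entry("Match highlights", when, "video#6"),   # same title and time
    Entry("Match highlights", later, "video#7"),  # same title, later time
]
shown = hide_apparent_duplicates(sample)
```

With this key, the second entry is hidden while the third stays visible, so a feed that repeats the same content at a later time would still show up. Whether that is enough for the alerting use case mentioned earlier in the thread is a separate question.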