Open alin-simionoiuDE opened 3 years ago
NewsBlur already tries very hard to remove duplicate articles automatically. Unfortunately, some still come through, so let's use this ticket as a reminder to double-check the dupe checker and to write tests for it to ensure it works as we expect it to. I see no reason to have this as a preference. Duplicates should be (and are) filtered out automatically.
Here's the code that does the work:
https://github.com/samuelclay/NewsBlur/blob/master/apps/rss_feeds/models.py#L2014-L2117
Oh, you mean from unrelated feeds in the folder? That's a different kind of check, one that I'm working on now using Elasticsearch's More Like This query.
Yes! From unrelated feeds in a folder
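For readers unfamiliar with the More Like This query mentioned above, here is a minimal sketch of what such a request body could look like. It is not NewsBlur's actual implementation: the field names `title` and `content`, the helper name `build_more_like_this_query`, and the threshold values are all assumptions chosen for illustration. Only the `more_like_this` keys themselves (`fields`, `like`, `min_term_freq`, `max_query_terms`, `minimum_should_match`) come from the Elasticsearch query DSL.

```python
def build_more_like_this_query(title, content, min_term_freq=1, max_query_terms=25):
    """Build an Elasticsearch request body that looks for stories similar to one story.

    Hypothetical helper: field names and thresholds are illustrative assumptions.
    """
    return {
        "query": {
            "more_like_this": {
                # Assumed index fields; a real index may name these differently.
                "fields": ["title", "content"],
                # Free text to compare against the indexed stories.
                "like": [title, content],
                # Ignore terms that appear fewer than this many times in the input.
                "min_term_freq": min_term_freq,
                # Cap on how many terms are selected to form the query.
                "max_query_terms": max_query_terms,
                # Share of selected terms a candidate must match (assumed value).
                "minimum_should_match": "60%",
            }
        }
    }


body = build_more_like_this_query(
    "Amazon opens new warehouse",
    "The company announced a new fulfilment centre today.",
)
```

The resulting dictionary would be sent as the body of a search request. Restricting the results to stories from other feeds in the same folder would need an additional filter, which is left out here because the index layout is not described in this thread.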
I like the idea very much. It is similar to what Google News does: it clusters the news for a specific event or topic across different sources. See screenshot:
One idea I had about this: I guess it would already help if NewsBlur could cluster by the trained tags. E.g., if I train the word "Amazon", it clusters everything that has "Amazon" in the title, independent of the source.
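A rough sketch of that idea, assuming each story is a plain dictionary with `feed` and `title` keys and that trained words are available as a list of strings. The function name `cluster_by_trained_words` and the data shapes are hypothetical, not taken from NewsBlur's code.

```python
import re
from collections import defaultdict


def cluster_by_trained_words(stories, trained_words):
    """Group stories by trained word found in the title, regardless of feed."""
    # One case-insensitive, whole-word pattern per trained word.
    patterns = {
        word: re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
        for word in trained_words
    }
    clusters = defaultdict(list)
    for story in stories:
        for word, pattern in patterns.items():
            if pattern.search(story["title"]):
                clusters[word].append(story)
    return dict(clusters)


# Made-up sample data for illustration only.
stories = [
    {"feed": "Feed A", "title": "Amazon opens new warehouse"},
    {"feed": "Feed B", "title": "Why amazon is hiring"},
    {"feed": "Feed C", "title": "Local bakery wins award"},
]
clusters = cluster_by_trained_words(stories, ["Amazon"])
```

Here the stories from Feed A and Feed B land in the same "Amazon" cluster, while the Feed C story is left out. A story whose title matches several trained words would appear in several clusters, which may or may not be the desired behavior.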
I love NewsBlur and have been a paying customer for a while now. The one feature that I would love to have is removing duplicate articles at the folder level, preferably via some sort of checkbox when I open "folder settings".
I keep my feeds in folders, and I always find duplicated articles inside the folder. Not surprising really: if there's a good subject out there, multiple feeds are going to have articles about it.
Looks like Inoreader has this feature.