Open GoogleCodeExporter opened 8 years ago
No, but it could be a good idea!
But in practice it is quite complex: how do we determine which entries are
duplicates?
- If the entries come from the same feed, why is the same feed registered twice?
- If the entries come from different feeds, there is probably some difference in
the content even if the subject is the same!
Could you provide more detailed explanations or samples?
Original comment by ludovic.valente
on 21 Dec 2010 at 6:22
Issue 333 has been merged into this issue.
Original comment by ludovic.valente
on 21 Dec 2010 at 6:24
One way to detect duplicate entries is to compare the titles of the posts.
Duplicates appear because some RSS feeds carry only articles with a specific
tag, and news sites are a prime example. A site might have a Business feed and
a North America feed; an article about business in North America then shows up
in both feeds, and a reader subscribed to both sees it twice. Likewise, an
article from the Health feed might also be published in the Science feed.
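Comparing titles verbatim is fragile when feeds differ in capitalization, accents or punctuation. A minimal sketch, assuming titles are plain strings, of a normalized comparison key; `title_key` is a hypothetical helper, not part of the reader:

```python
import re
import unicodedata


def title_key(title: str) -> str:
    """Build a comparison key from an entry title (hypothetical helper).

    Case-folds, strips accents, replaces punctuation with spaces and
    collapses whitespace, so cosmetic differences do not hide a duplicate.
    """
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", title)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Replace anything that is not a word character or whitespace with a space.
    cleaned = re.sub(r"[^\w\s]", " ", no_accents.casefold())
    # Collapse runs of whitespace and trim both ends.
    return " ".join(cleaned.split())


print(title_key("Biofuels from the sea"))
print(title_key("  BIOFUELS   from the Sea! "))
```

Both calls print `biofuels from the sea`, so the two spellings would be treated as the same article.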
Perhaps the best way to handle duplicates is to keep the oldest article and
remove, or mark as read, the later copies. False positives seem unlikely, though
they are possible (identical titles on separate blogs); I have never come across
one.
Example:
- 7:39 AM | ScienceDaily: Earth & Climate News | Biofuels from the sea - Seaweed may prove a viable future biofuel, especially if harvested in summer
- 7:39 AM | ScienceDaily: Earth & Climate News | Warming ocean layers will undermine polar ice sheets, climate models show
- 7:22 AM | ScienceDaily: Plants & Animals News | Biofuels from the sea - Seaweed may prove a viable future biofuel, especially if harvested in summer
- Jul 3, 2011 | ScienceDaily: Health & Medicine News | Could ovarian stimulation cause an increase in oocyte chromosome abnormalities? - Ovarian stimulation for IVF
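The keep-the-oldest strategy can be sketched as follows, using titles from the example listing. The `Entry` class and `mark_duplicates_read` function are hypothetical illustrations, not the reader's actual data model, and the timestamps assume the first three entries were published on one arbitrarily chosen day.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Entry:
    """A feed entry (hypothetical model, not the reader's real structure)."""
    feed: str
    title: str
    published: datetime
    read: bool = False


def mark_duplicates_read(entries: list[Entry]) -> list[Entry]:
    """Keep the oldest entry for each title and mark later copies as read.

    Titles are compared after case-folding and whitespace collapsing.
    Returns the entries that were marked as read.
    """
    seen: set[str] = set()
    marked: list[Entry] = []
    # Walk from oldest to newest so the first occurrence of a title wins.
    for entry in sorted(entries, key=lambda e: e.published):
        key = " ".join(entry.title.casefold().split())
        if key in seen:
            entry.read = True
            marked.append(entry)
        else:
            seen.add(key)
    return marked


# Arbitrary date; only the times come from the example above.
entries = [
    Entry("ScienceDaily: Earth & Climate News", "Biofuels from the sea",
          datetime(2011, 7, 4, 7, 39)),
    Entry("ScienceDaily: Earth & Climate News",
          "Warming ocean layers will undermine polar ice sheets, climate models show",
          datetime(2011, 7, 4, 7, 39)),
    Entry("ScienceDaily: Plants & Animals News", "Biofuels from the sea",
          datetime(2011, 7, 4, 7, 22)),
]
for e in mark_duplicates_read(entries):
    print(e.feed, "|", e.title)
# prints: ScienceDaily: Earth & Climate News | Biofuels from the sea
```

Because the Plants & Animals copy was published first (7:22 AM), it stays unread and the 7:39 AM Earth & Climate copy is the one marked as read.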
Original comment by taylorsm...@gmail.com
on 4 Jul 2011 at 3:54
This would be a great feature.
Original comment by kevin.me...@gmail.com
on 8 Apr 2012 at 7:25
Techniques worth studying:
http://en.wikipedia.org/wiki/Levenshtein_distance
http://code.google.com/p/google-diff-match-patch/
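A sketch of the first technique, for cases where exact title matching is too strict: a hand-written Levenshtein distance (used here instead of the google-diff-match-patch library) turned into a similarity score. `similar_titles` and its 0.9 threshold are hypothetical choices that would need tuning against real feeds.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions that turn a into b (dynamic programming, two rows)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string inner
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution, free if equal
            ))
        previous = current
    return previous[-1]


def similar_titles(a: str, b: str, threshold: float = 0.9) -> bool:
    """Treat two titles as duplicates when their edit distance, relative
    to the longer title, is small. The threshold is an assumption."""
    a, b = a.casefold().strip(), b.casefold().strip()
    longest = max(len(a), len(b))
    if longest == 0:
        return True
    return 1 - levenshtein(a, b) / longest >= threshold


print(levenshtein("kitten", "sitting"))
print(similar_titles("Biofuels from the sea", "Biofuels from the sea!"))
```

Comparing every pair of entries is quadratic in the number of entries, so a reader would probably only compare entries within a recent time window.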
Original comment by ludovic.valente
on 13 Apr 2012 at 1:13
Original issue reported on code.google.com by
JavadA...@gmail.com
on 21 Dec 2010 at 2:42