ushahidi / SwiftRiver-Core

SwiftRiver Core Applications

Duplicate Filtering #15

Closed brianherbert closed 12 years ago

brianherbert commented 12 years ago

Should Swift be filtering duplicate tweets (from different authors)?

If so, it's not working on UTF-8 tweets.

ekala commented 12 years ago

At the moment, the only duplicates that the Twitter app filters out are retweets. The app simply grabs tweets via Twitter's streaming API and submits them for semantic and media extraction; it's a pipeline of sorts.

After processing, the drops get posted to the DB via an API endpoint that resides on the web app. At this point we generate an MD5 hash for the drop. The hash is computed over a concatenation of the following:

That said, I think we may have to devise a way for users to define their own duplication filters in addition to simply maintaining a hash of the actual drop content.

Thoughts?
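For illustration, hashing a concatenation of drop fields might look roughly like the sketch below. This is not the web app's actual code: the function name `drop_hash`, the field values, and the separator are all assumptions, and the real field list is not reproduced here. The sketch encodes the text as UTF-8 before hashing and normalizes it to NFC first, since unencoded or differently composed Unicode text is a common reason hashes fail to match on non-ASCII content.

```python
import hashlib
import unicodedata


def drop_hash(fields):
    """Return an MD5 hex digest over the concatenated drop fields.

    `fields` is a hypothetical list of strings; the actual fields that
    go into the drop hash are not specified here.
    """
    # Normalize to NFC so text that looks identical but is composed
    # differently (e.g. "é" vs "e" + combining accent) hashes the same.
    normalized = [unicodedata.normalize("NFC", f) for f in fields]
    # Join with a unit-separator character (an assumption) so that
    # ("ab", "c") and ("a", "bc") do not produce the same input.
    joined = "\x1f".join(normalized)
    # hashlib works on bytes, so encode explicitly as UTF-8.
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Hypothetical field values for a single drop.
    print(drop_hash(["twitter", "some_author", "Habari ya asubuhi"]))
```

With this shape, a user-defined duplication filter could simply be a different choice of which fields are passed in.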

brianherbert commented 12 years ago

I think if the drop contents are exactly the same (maybe different author) then it should be hidden. Maybe there can be something like a spam folder (duplicate folder?) where an admin could bring individual drops back into the fold. I dunno. Anything to keep a screen full of the same content from showing up would be ideal.
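A rough sketch of that idea, assuming each drop is a dict with hypothetical `author` and `content` keys: the first drop with a given content stays visible, and later exact copies (whatever the author) go into a separate duplicates list that an admin could review.

```python
import hashlib


def split_duplicates(drops):
    """Partition drops into (visible, duplicates) by exact content match.

    Each drop is assumed to be a dict with a "content" string; the
    author is deliberately ignored when deciding what is a duplicate.
    """
    seen = set()
    visible, duplicates = [], []
    for drop in drops:
        # Hash only the content, encoded as UTF-8 so non-ASCII text works.
        digest = hashlib.md5(drop["content"].encode("utf-8")).hexdigest()
        if digest in seen:
            duplicates.append(drop)  # candidate for the "duplicate folder"
        else:
            seen.add(digest)
            visible.append(drop)
    return visible, duplicates


if __name__ == "__main__":
    sample = [
        {"author": "a", "content": "Maji yamekatika Nairobi"},
        {"author": "b", "content": "Maji yamekatika Nairobi"},
        {"author": "c", "content": "Something else"},
    ]
    shown, hidden = split_duplicates(sample)
    print(len(shown), len(hidden))
```

This only catches exact matches; grouping near-duplicates would need a different technique.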

69mb commented 12 years ago

Not a core issue.

It would be nice to group similar drops, but we cannot support this with the current backend and UI. Perhaps in a future iteration.