Closed brianherbert closed 12 years ago
At the moment, the only duplicates that the Twitter app filters out are retweets. The app simply grabs tweets via Twitter's streaming API and submits them for semantic and media extraction; it's a pipeline of sorts.
After processing, the drops get posted to the DB via an API endpoint that resides on the web app. At this point we generate an md5 hash for the drop. The hash is computed over a concatenation of the following:
id_str
entity. See https://dev.twitter.com/docs/tweet-entities for more information

That said, I think we may have to devise a way for users to define their own duplication filters in addition to simply maintaining a hash of the actual drop content.
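
For reference, the hashing step described above could look roughly like the following. This is only a sketch, not the code the web app actually runs: the function name `drop_hash`, the `entities` key, and the choice to serialize the entities as JSON with sorted keys are all assumptions made for illustration.

```python
import hashlib
import json


def drop_hash(tweet: dict) -> str:
    """Return an MD5 hex digest over id_str concatenated with the entities.

    Hypothetical helper; the field names and serialization are assumptions.
    """
    # Assumption: serialize entities as JSON with sorted keys so that the
    # same entities always produce the same string.
    entities = json.dumps(
        tweet.get("entities", {}), sort_keys=True, ensure_ascii=False
    )
    payload = tweet["id_str"] + entities
    # hashlib operates on bytes, so encode the text explicitly as UTF-8.
    return hashlib.md5(payload.encode("utf-8")).hexdigest()
```

If the hash really does include id_str, which is unique per tweet, then two tweets with identical text from different authors will always hash differently, so a hash like this can only catch the same tweet being submitted twice.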
Thoughts?
I think if the drop contents are exactly the same (maybe different author) then it should be hidden. Maybe there can be something like a spam folder (duplicate folder?) where an admin could bring individual drops back into the fold. I dunno. Anything to keep a screen full of the same content from showing up would be ideal.
Not a core issue.
It would be nice to group similar drops, but we cannot support this with the current backend and UI. Perhaps in a future iteration.
Should Swift be filtering duplicate tweets (from different authors)?
If so, it's not working on UTF-8 tweets.
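
One possible way to catch duplicate tweets from different authors, including non-ASCII ones, is to hash only the normalized text. The sketch below is a guess at an approach, not a diagnosis of the actual bug: `content_hash` is a hypothetical name, and the choices of NFC normalization, whitespace collapsing, and case folding are assumptions.

```python
import hashlib
import unicodedata


def content_hash(text: str) -> str:
    """Return an MD5 hex digest of the normalized tweet text only.

    Hypothetical helper; the normalization steps are assumptions.
    """
    # Normalize to NFC so visually identical strings built from different
    # code point sequences compare equal.
    normalized = unicodedata.normalize("NFC", text)
    # Collapse runs of whitespace and ignore case differences.
    normalized = " ".join(normalized.split()).casefold()
    # Encode explicitly as UTF-8 before hashing; md5 needs bytes.
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()
```

Because the author and id_str are left out, two drops with the same text get the same hash regardless of who posted them.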