kalleth / mpukviewr

Daemon and Rails application for viewr.

Prevent duplicate RT's appearing in feed #7

Open kalleth opened 12 years ago

kalleth commented 12 years ago

Frequently MPLAY_Wizzo would RT an iseries tweet, or vice versa.

These should be intelligently detected so that the same tweet doesn't appear on the feed twice.

kalleth commented 12 years ago

I have no idea how I'm going to detect this. Special-casing?

I want to implement some kind of fuzzy-dupe-checker that will produce a 'match %' - i.e. "96% of this message is the same as a previous message". Requires some DB wizardry; I reckon this ticket will be sorted whenever I do that.
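A rough sketch of what a 'match %' could look like, assuming plain edit distance (Levenshtein) is a good enough measure. The helper names `levenshtein` and `match_percent` are made up for illustration and nothing here touches the DB yet:

```ruby
# Classic dynamic-programming Levenshtein edit distance between two strings.
def levenshtein(a, b)
  return b.length if a.empty?
  return a.length if b.empty?

  # prev[j] holds the distance between the processed prefix of a and b[0, j].
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Hypothetical helper: similarity of two messages as a percentage (0.0..100.0).
# Case and surrounding whitespace are ignored; that normalisation is an assumption.
def match_percent(a, b)
  a = a.downcase.strip
  b = b.downcase.strip
  longest = [a.length, b.length].max
  return 100.0 if longest.zero?

  (1.0 - levenshtein(a, b).to_f / longest) * 100.0
end
```

A new message would then count as a dupe if `match_percent(new_text, old_text)` is above some threshold for any recent message. The threshold, and how many recent messages to compare against, would need tuning.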

unspec commented 12 years ago

Your method would probably be the most useful for reducing duplicate content generally, but as a short-term solution I believe the Twitter API returns some additional fields with a retweeted tweet that include the user id of the source account of the original tweet. You could check all retweets and see if the originator is one of the other accounts viewr is monitoring. Won't catch all possible Twitter dupes (e.g. both MPLAY_Wizzo & iseries retweeting the same tweet from some 3rd party) but might help.

kalleth commented 12 years ago

Oooh. Nice. The twitter ID is used as the 'guid' of the event in the db, so before creating a tweet 'event' I could just check the 'rt-of' field (if present) to make sure that guid isn't already in there. Good suggestion!
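Roughly, that check could look like the sketch below. It assumes the tweet arrives as a parsed JSON hash and that the 'rt-of' field is what the v1.1 API calls `retweeted_status` (the embedded original tweet). The `new_event?` helper and the `Set` standing in for the db lookup are made up for illustration:

```ruby
require "set"

# Returns true if the tweet should become a new event.
# stored_guids stands in for a lookup against the guid column in the db.
def new_event?(tweet, stored_guids)
  # The tweet itself is already stored.
  return false if stored_guids.include?(tweet["id_str"])

  # Assumption: retweets carry the original tweet under "retweeted_status".
  original = tweet["retweeted_status"]
  # A retweet of something already in the feed is a duplicate.
  return false if original && stored_guids.include?(original["id_str"])

  true
end

stored = Set["100"]
retweet = { "id_str" => "200", "retweeted_status" => { "id_str" => "100" } }
new_event?(retweet, stored)               # retweet of a stored tweet, so skipped
new_event?({ "id_str" => "201" }, stored) # plain new tweet, so kept
```

This only catches retweets whose original is already in the db; it does nothing if the retweet arrives first.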

unspec commented 12 years ago

Theoretically if you add another field to the db for the ID of the retweeted tweet (as opposed to the id of the retweet, which will be the guid), if any for that particular entry, then you could also check that for each retweet detected. That should stop the "both MPLAY_Wizzo & iseries retweeting the same tweet from some 3rd party" issue as both of those retweets, whilst having different ids themselves, should contain the same retweeted tweet ID inside.
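A minimal in-memory sketch of that two-field check, with an array standing in for the db table. `Event`, `EventStore` and the method names are hypothetical; in the real app this would presumably be a query against the events table:

```ruby
# One stored feed entry: its own tweet ID plus the ID of the tweet it retweets (or nil).
Event = Struct.new(:guid, :retweeted_tweetid)

class EventStore
  def initialize
    @events = []
  end

  # True if the incoming tweet duplicates something already stored.
  def duplicate?(guid, retweeted_tweetid = nil)
    @events.any? do |e|
      e.guid == guid ||               # the very same tweet
        e.retweeted_tweetid == guid || # a retweet of this tweet is already stored
        (retweeted_tweetid &&
          (e.guid == retweeted_tweetid ||             # the original is already stored
           e.retweeted_tweetid == retweeted_tweetid)) # another retweet of the same original
    end
  end

  # Stores the tweet unless it is a duplicate; returns whether it was stored.
  def add(guid, retweeted_tweetid = nil)
    return false if duplicate?(guid, retweeted_tweetid)

    @events << Event.new(guid, retweeted_tweetid)
    true
  end
end

store = EventStore.new
store.add("201", "999") # first retweet of 3rd-party tweet 999 is stored
store.add("202", "999") # second retweet of the same tweet is rejected
```

The two retweets have different guids but share the same retweeted ID, so only the first one ends up in the store.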

The word 'tweet' has now lost all meaning to me.

kalleth commented 12 years ago

so 'guid' (already exists), 'retweeted_tweetid'

when obtaining a tweet:

Hear what you say about the word 'tweet' now though