graulund / tweetnest

NO LONGER MAINTAINED. MESSAGE ME IF YOU WANT TO MAINTAIN THIS. :) A browsable, searchable and easily customizable archive and backup for your tweets
MIT License
492 stars, 93 forks

Duplicate Tweets? #55

Closed: navjotjsingh closed this issue 11 years ago

navjotjsingh commented 11 years ago

This API change has messed up my archive. I haven't upgraded yet, because my old archive now contains more than 10,000 duplicate tweets; some tweets were stored as many as five times. Example: http://tweets.nspeaks.com/2013/05/31 This started sometime in April. I should have stopped syncing, but I noticed the problem too late. How can I get rid of all these duplicate tweets before I upgrade the archive?

graulund commented 11 years ago

Since a central part of the upgrade process is eliminating duplicate tweets by altering the SQL tweets table so that the tweet ID column is always unique, I'm led to believe the upgrade hasn't completed as it should have. Please check that the tweets table in the database has a UNIQUE index on the tweetid column. Thanks!
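To illustrate what that check involves, here is a minimal sketch in Python. It uses the standard-library sqlite3 module with an in-memory database as a stand-in for TweetNest's MySQL database, and it assumes a table named tweets with a tweetid column (as described above) plus an auto-increment id column; your installation's table name may differ, and the function names are hypothetical. On MySQL itself, `SHOW INDEX FROM tweets` lists each index along with its Non_unique flag.

```python
import sqlite3

def find_duplicate_tweetids(conn):
    """Return (tweetid, count) pairs for tweet ids stored more than once."""
    rows = conn.execute(
        "SELECT tweetid, COUNT(*) FROM tweets "
        "GROUP BY tweetid HAVING COUNT(*) > 1 ORDER BY tweetid"
    )
    return rows.fetchall()

def has_unique_index_on_tweetid(conn):
    """Return True if some UNIQUE index on tweets covers exactly (tweetid)."""
    # PRAGMA index_list rows start with (seq, name, unique, ...); newer
    # SQLite versions append extra columns, which *_ absorbs.
    for _seq, name, unique, *_ in conn.execute("PRAGMA index_list(tweets)"):
        if not unique:
            continue
        # PRAGMA index_info rows are (seqno, cid, column_name).
        cols = [r[2] for r in conn.execute(f'PRAGMA index_info("{name}")')]
        if cols == ["tweetid"]:
            return True
    return False

# Demo: a table with no UNIQUE index, holding duplicated tweets
# (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY, tweetid INTEGER, text TEXT)")
conn.executemany(
    "INSERT INTO tweets (tweetid, text) VALUES (?, ?)",
    [(1, "a"), (2, "b"), (2, "b"), (3, "c"), (3, "c"), (3, "c")],
)
print(find_duplicate_tweetids(conn))      # [(2, 2), (3, 3)]
print(has_unique_index_on_tweetid(conn))  # False
```

A non-empty duplicate list means a UNIQUE index cannot be created until those rows are removed, which is consistent with the upgrade stalling.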

navjotjsingh commented 11 years ago

These are the indexes currently listed on the table; there is no unique index on the tweetid column. What should I do? Authorize.php found only one duplicate tweet during the upgrade, yet 12,741 duplicate tweets remain in the database.

[Screenshot: index list for the tweets table, 2013-07-05_09-38-44]

graulund commented 11 years ago

The easiest option is definitely to start over: install from scratch and import your tweets from a Twitter archive. You could remove every duplicate manually, but that would obviously be tedious.

navjotjsingh commented 11 years ago

I had that in mind too, but for some reason even that was failing (out-of-memory errors). Anyway, I managed to get rid of the duplicates thanks to the trick described here: http://www.devcha.com/2008/03/how-to-remove-duplicate-records-with.html

Thanks for the help.
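For readers facing the same problem, here is one generic way to remove duplicates. It is not necessarily the trick from the linked post. This minimal sketch keeps the row with the lowest id for each tweetid, deletes the rest, and then adds the UNIQUE index so the same tweet cannot be stored twice again. As with any bulk delete, back up the database first. The sketch again uses Python's sqlite3 in place of MySQL and assumes a hypothetical schema with an auto-increment id column; the index and function names are illustrative.

```python
import sqlite3

def remove_duplicate_tweets(conn):
    """Keep the lowest-id row per tweetid, delete the others, then add a
    UNIQUE index on tweetid. Returns the number of rows deleted."""
    with conn:  # one transaction: commits on success, rolls back on error
        cur = conn.execute(
            "DELETE FROM tweets WHERE id NOT IN "
            "(SELECT MIN(id) FROM tweets GROUP BY tweetid)"
        )
        deleted = cur.rowcount
        # Fails with IntegrityError if any duplicates were somehow left.
        conn.execute(
            "CREATE UNIQUE INDEX IF NOT EXISTS tweets_tweetid_unique "
            "ON tweets (tweetid)"
        )
    return deleted

# Demo with hypothetical data: tweet 2 stored twice, tweet 3 three times.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY, tweetid INTEGER, text TEXT)")
conn.executemany(
    "INSERT INTO tweets (tweetid, text) VALUES (?, ?)",
    [(1, "a"), (2, "b"), (2, "b"), (3, "c"), (3, "c"), (3, "c")],
)
print(remove_duplicate_tweets(conn))  # 3
print(conn.execute("SELECT id, tweetid FROM tweets ORDER BY id").fetchall())
# [(1, 1), (2, 2), (4, 3)]
```

MySQL rejects a DELETE whose subquery reads from the same table (error 1093), so the usual MySQL equivalent is a multi-table self-join delete such as `DELETE t1 FROM tweets t1 JOIN tweets t2 ON t1.tweetid = t2.tweetid AND t1.id > t2.id`, followed by adding the UNIQUE index.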

graulund commented 11 years ago

Now that you have no duplicate tweets, run upgrade.php to make sure the tables are as they should be. :)