twintproject / twint

An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations.
MIT License

Twint not providing all results #462

Open mattrocklage opened 5 years ago

mattrocklage commented 5 years ago

Command Ran

twint -s nike -l en --since 2018-08-09 --until 2018-08-11 -o nike.csv --csv --resume nike_resume.txt

Description of Issue

Twint stops after about 769 tweets on August 10, 2018 (22:41:16 UTC). When I do the search on Twitter itself, Twitter also stops listing new tweets at that point. However, when I scroll all the way to the top of the search results on Twitter and then all the way back down to the bottom, Twitter starts to provide additional results. It seems that requesting the same results again led Twitter to provide more. So the problem seems to be with Twitter, but I wonder if Twint could provide an option to retry the same request X times, with a Y-second pause between attempts.

Relatedly, simply resuming the results doesn't seem to fix the problem.
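
To make the request above concrete, here is a minimal sketch of the retry-with-pause idea in plain Python. It is not Twint code: `fetch_with_retries`, `fetch`, `max_retries`, and `pause_seconds` are hypothetical names, and `fetch` stands in for whatever call returns one page of results (an empty list meaning Twitter returned nothing).

```python
import time
from typing import Callable, List


def fetch_with_retries(
    fetch: Callable[[], List[dict]],
    max_retries: int = 3,
    pause_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[dict]:
    """Call fetch() until it returns a non-empty page or retries run out.

    fetch is a hypothetical stand-in for a single page request; it is
    called once, then up to max_retries more times, pausing in between.
    """
    for attempt in range(max_retries + 1):
        results = fetch()
        if results:
            return results
        # Pause only if another attempt is still to come.
        if attempt < max_retries:
            sleep(pause_seconds)
    return []
```

The `sleep` argument is injectable only so the pause can be skipped when trying the function out; in normal use the default `time.sleep` applies.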

Environment Details

Linux 4.14.121 terminal ("Amazon Linux")

pielco11 commented 5 years ago

> I wonder if Twint could provide an option to try the same result again X number of times after a Y-second pause

Not yet; I need some time to test how Twitter attempts to block requests.

For now, you can use the config.Resume option.

pushshift commented 5 years ago

I just want to chime in here and offer a workaround for the time being. If you are searching on a term and the search stops prematurely, you can search for that term together with another term of similar frequency, which should push you past the problem time period. You can then start the new search a day earlier than the day it aborted on, using the --until parameter.

In this instance, searching for Nike OR Shoe should work. The downside is that you will need more post-processing to throw out results that match only "shoe" and not "Nike", but it does help in these situations.
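
The post-processing step could look roughly like the sketch below. It assumes each result has been loaded as a dict with the tweet text under a `tweet` key (for example via `csv.DictReader` on the output file); that key name and the helper `keep_matching` are assumptions for illustration, not something Twint provides.

```python
import re
from typing import Iterable, List


def keep_matching(tweets: Iterable[dict], term: str, text_key: str = "tweet") -> List[dict]:
    """Keep only the rows whose text mentions term (case-insensitive).

    text_key is an assumed column name; adjust it to match the actual output.
    """
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    return [row for row in tweets if pattern.search(row.get(text_key) or "")]


# Rows as they might come back from a "Nike OR Shoe" search (made-up text).
rows = [
    {"id": "1", "tweet": "New Nike drop this weekend"},
    {"id": "2", "tweet": "Lost a shoe on the train"},
    {"id": "3", "tweet": "nike shoe review"},
]
nike_only = keep_matching(rows, "nike")
```

This is a plain substring match, so it also keeps words that merely contain the term; a stricter filter would need word boundaries or hashtag handling.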

pielco11 commented 5 years ago

@pushshift thank you for the tip, Jason, I follow your work and it's amazing

edsu commented 5 years ago

2 weeks ago I ran a search that returned 5 years of results. This week when I run the exact same search I'm getting 10 days of results. Is Twitter doing some defensive work to block scrapers?

pushshift commented 5 years ago

What was the search?

edsu commented 5 years ago

protectmaunakea OR #kukiaimauna OR #aolemt

BiggerSplash commented 5 years ago

Similar/same issue here. However, the script does not seem to exclude specific dates. It just does not retrieve all tweets for a given day.

ghost commented 5 years ago

I'm having the same issue. Twint is stopping after 3 months of tweets on an account with FAR more tweets than that.