twintproject / twint

An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations.
MIT License
15.77k stars 2.73k forks

[ISSUE] Highly inconsistent results at scale #666

Closed: zoink closed this issue 4 years ago

zoink commented 4 years ago

I'm running a few thousand batch compute jobs with Twint and getting wildly inconsistent results that don't line up with what the online Advanced Search returns. Multiple runs of the same query return different numbers of results.
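One way to measure how much repeated runs disagree is to reduce each run to the set of tweet IDs it returned and compare the sets. This is a minimal sketch that assumes the IDs have already been extracted from each run's output. `compare_runs` is a hypothetical helper, not part of Twint, and the sample IDs are made up:

```python
from itertools import combinations
from typing import Dict, List


def compare_runs(runs: Dict[str, List[int]]) -> dict:
    """Summarise how much repeated runs of the same query disagree.

    `runs` maps a run label to the tweet IDs that run returned.
    """
    if not runs:
        raise ValueError("need at least one run")
    id_sets = {label: set(ids) for label, ids in runs.items()}
    union = set().union(*id_sets.values())
    in_every_run = set.intersection(*id_sets.values())
    # For each pair (a, b): (IDs found only by a, IDs found only by b).
    pairwise_missing = {
        (a, b): (len(id_sets[a] - id_sets[b]), len(id_sets[b] - id_sets[a]))
        for a, b in combinations(sorted(id_sets), 2)
    }
    return {
        "per_run": {label: len(s) for label, s in id_sets.items()},
        "union": len(union),
        "in_every_run": len(in_every_run),
        "pairwise_missing": pairwise_missing,
    }


if __name__ == "__main__":
    # Made-up IDs standing in for three runs of the same query.
    print(compare_runs({
        "run1": [101, 102, 103, 104],
        "run2": [101, 103],
        "run3": [101, 102, 103, 105],
    }))
```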

The most common error is "Expecting value: line 1 column 1 (char 0) [x] run.Feed". It seems that Twitter is rate-limiting requests coming from the same place?
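The text before " [x] run.Feed" matches the message Python's `json` module produces when it parses an empty string or a body that does not start with JSON, such as an HTML page. This fits the rate-limiting theory, because it suggests some responses are coming back as something other than the JSON Twint expects. The following minimal sketch reproduces the message. `try_parse` is a hypothetical helper, and the sample bodies are made up:

```python
import json
from typing import Any, Optional, Tuple


def try_parse(body: str) -> Tuple[Optional[Any], Optional[str]]:
    """Parse `body` as JSON; return (value, None) on success, (None, message) on failure."""
    try:
        return json.loads(body), None
    except json.JSONDecodeError as exc:
        # str(exc) is the same text as the reported error, minus " [x] run.Feed".
        return None, str(exc)


if __name__ == "__main__":
    # Made-up bodies: an empty response, an HTML page, and a valid JSON object.
    for body in ["", "<html></html>", '{"ok": true}']:
        print(repr(body), "->", try_parse(body))
```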

I'm not sure how to get around this; I might need to write my own script.

pielco11 commented 4 years ago

Twint handles the data that Twitter provides.

zoink commented 4 years ago

I think this is just another instance of https://github.com/twintproject/twint/issues/604. Once I run into the error, can I start scraping again if I wait a few minutes?
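Waiting and retrying can be automated per job with exponential backoff. This minimal sketch assumes each job is wrapped in a callable that raises an exception when Twitter returns a non-JSON body. In the report above, Twint printed the error rather than raising it, so that wrapper is an assumption. `run_with_backoff` is hypothetical, and its default delays are illustrative, not Twitter's actual limits:

```python
import json
import random
import time
from typing import Callable, List, Tuple, Type, TypeVar

T = TypeVar("T")


def run_with_backoff(
    job: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (json.JSONDecodeError,),
    max_attempts: int = 5,
    base_delay: float = 60.0,
    max_delay: float = 900.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `job`, waiting exponentially longer after each retryable failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except retry_on:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            # 60 s, 120 s, 240 s, ... capped at max_delay, plus up to 10% jitter.
            delay = min(base_delay * 2 ** (attempt - 1), max_delay)
            sleep(delay * (1 + random.uniform(0, 0.1)))
    raise AssertionError("unreachable")


if __name__ == "__main__":
    attempts = {"n": 0}

    def flaky_scrape() -> str:
        # Stand-in for one scraping job: fails twice, then succeeds.
        attempts["n"] += 1
        if attempts["n"] < 3:
            raise json.JSONDecodeError("Expecting value", "", 0)
        return "done"

    waits: List[float] = []
    # Record the waits instead of actually sleeping, to keep the demo fast.
    print(run_with_backoff(flaky_scrape, sleep=waits.append), waits)
```

The jitter staggers the retries. Without it, thousands of batch jobs that hit the limit at the same moment would all retry at the same moment too.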

pielco11 commented 4 years ago

I think so. You may want to use different source IPs too.

zoink commented 4 years ago

Can you point me in the right direction to set that up?

pielco11 commented 4 years ago

Get a proxy and configure Twint to use it through its configuration options: https://github.com/twintproject/twint/wiki/Configuration
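If the limit really is per source IP, one simple way to spread a large batch across several IPs is to assign each job a proxy from a pool in round-robin order. This is a minimal sketch. `Proxy` and `assign_proxies` are hypothetical helpers, and the addresses are placeholders from the documentation-only TEST-NET ranges:

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Proxy:
    host: str
    port: int
    scheme: str  # e.g. "socks5" or "http"; assumed values, check Twint's docs


def assign_proxies(job_ids: Iterable[str], pool: List[Proxy]) -> Dict[str, Proxy]:
    """Give each job a proxy from `pool`, cycling so every proxy gets a similar share."""
    if not pool:
        raise ValueError("proxy pool is empty")
    return dict(zip(job_ids, cycle(pool)))


if __name__ == "__main__":
    pool = [Proxy("192.0.2.10", 1080, "socks5"), Proxy("198.51.100.7", 8080, "http")]
    jobs = [f"job-{i}" for i in range(5)]
    for job, proxy in assign_proxies(jobs, pool).items():
        print(job, proxy.host, proxy.port, proxy.scheme)
```

Inside each job, the assigned proxy would then be copied onto the Twint config. As far as we understand, the relevant options are `Proxy_host`, `Proxy_port` and `Proxy_type`, but confirm the exact names and accepted values against the Configuration wiki page before relying on them.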