minimaxir / download-tweets-ai-text-gen

Python script to download public Tweets from a given Twitter account into a format suitable for AI text generation.
MIT License

Tweet download ends too early when there are plainly more tweets available #2

Closed (minimaxir closed 4 years ago)

minimaxir commented 4 years ago

The datetime output at the end of the query makes this evident when it occurs.

Most likely a twint issue but need to see if there is a workaround for specific use cases.

minimaxir commented 4 years ago

It appears it may be partly due to time/IP? Accounts that were working for me yesterday are not working now (they die after ~100 tweets).

minimaxir commented 4 years ago

It's def an issue with twint: running the base twint command w/ no special parameters causes the issue as well.

minimaxir commented 4 years ago

After testing, the implementation in 3106f72d5a8cdc1c300a1b1fcd265fee3be30708 avoids this issue, although at a slight performance/code readability cost. Will have to see if there is a better implementation.

mmolaro commented 4 years ago

Twitter may have added some more anti-scraping methods/throttling. I can't seem to get more than 500-700 KB of data. Increasing the sleep to 60s isn't enough either.
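For anyone who wants to test whether longer waits help at all, here is a minimal sketch of a retry loop with growing sleeps. It is not twint's API or this script's implementation: `fetch_batch` is a hypothetical zero-argument callable that returns the next list of tweets (empty when nothing came back), and the retry count and base sleep are arbitrary assumptions.

```python
import time


def fetch_with_backoff(fetch_batch, max_empty_retries=3, base_sleep=60.0,
                       sleep=time.sleep):
    """Collect batches until fetch_batch() stays empty after several waits.

    fetch_batch is a HYPOTHETICAL callable (not a twint function) that
    returns a list of tweets, or an empty list when nothing was returned.
    """
    collected = []
    empty_streak = 0
    while empty_streak <= max_empty_retries:
        batch = fetch_batch()
        if batch:
            collected.extend(batch)
            empty_streak = 0  # progress was made, so reset the wait
            continue
        empty_streak += 1
        if empty_streak <= max_empty_retries:
            # Double the wait after each consecutive empty batch.
            sleep(base_sleep * 2 ** (empty_streak - 1))
    return collected
```

With the defaults, the loop waits 60s, 120s, then 240s after consecutive empty batches before giving up. If the download still stops at the same point, the limit is probably not a simple rate window.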

diwakergupta commented 4 years ago

Same here, the download consistently stops after ~200 KB worth of tweets. The exit code is 0 though, so it doesn't seem like the script is erroring out.
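Since the exit code is 0 either way, a truncated download has to be detected from the output itself. The following is a rough sketch, assuming the output is a CSV with one header row and that you know roughly how many tweets to expect (for example from the profile page). The function names and the 0.5 threshold are illustrative assumptions, not part of the script.

```python
import csv


def count_data_rows(lines):
    """Count CSV records, excluding the header row.

    `lines` is any iterable of text lines, e.g. an open file object.
    Quoted fields that span several lines count as one record.
    """
    reader = csv.reader(lines)
    next(reader, None)  # skip the header, if present
    return sum(1 for _ in reader)


def looks_truncated(fetched, expected, min_ratio=0.5):
    """Return True when far fewer rows were fetched than expected.

    min_ratio is an arbitrary assumption; filtering out replies or
    retweets legitimately lowers the count, so keep it loose.
    """
    if expected <= 0:
        return False
    return fetched / expected < min_ratio
```

Typical use would be to open the output file with `newline=""`, pass the file object to `count_data_rows`, and flag the run if `looks_truncated` returns True.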

minimaxir commented 4 years ago

To clarify, this is with twint=2.1.4, correct? This issue happened with more recent versions.

mmolaro commented 4 years ago

Yes, with twint=2.1.4. Looking at the twint repo issues, it seems like the issue is there across a range of versions.

diwakergupta commented 4 years ago

FWIW, using the twint CLI directly I was able to get twice as many tweets (~14000 vs. ~7000). 🤷‍♂