minimaxir / download-tweets-ai-text-gen

Python script to download public Tweets from a given Twitter account into a format suitable for AI text generation.
MIT License
221 stars, 41 forks

Much slower than other libraries (e.g. GetOldTweets3) #6

Closed by joeyrobert 4 years ago

joeyrobert commented 4 years ago

I tried using this to extract a dataset from a user with 30,000+ tweets and found GetOldTweets3 (https://pypi.org/project/GetOldTweets3/) to be much faster: it had extracted 11,000+ tweets by the time download-tweets-ai-text-gen had reached around 1,200, even though the latter had a significant head start. I'm suggesting this because I think download-tweets-ai-text-gen could be sped up, or could use GetOldTweets3 as a basis for faster downloading.

Appreciate your work regarding GPT-2 and I'm looking forward to training my model. Cheers.

minimaxir commented 4 years ago

It is my understanding that twint is deliberately slower, since the method of access is less kosher.

Unless you are doing bulk scrapes, the current rate of ingress (10 it/s) should be fine (actually training the model is more of a bottleneck). If it's any slower, then that's an unrelated bug.
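For a rough sense of what that rate means in practice, here is a back-of-the-envelope sketch. It assumes one iteration corresponds to one tweet and that the rate stays steady at 10 per second; neither is confirmed in this thread, and `estimated_minutes` is a hypothetical helper, not part of the script.

```python
def estimated_minutes(num_tweets: int, tweets_per_second: float = 10.0) -> float:
    """Rough wall-clock estimate, in minutes, for a download at a steady rate.

    Assumption: one progress-bar iteration equals one tweet.
    """
    if tweets_per_second <= 0:
        raise ValueError("tweets_per_second must be positive")
    return num_tweets / tweets_per_second / 60


# Tweet counts taken from the figures mentioned in this issue.
for n in (1_200, 11_000, 30_000):
    print(f"{n:>6} tweets: about {estimated_minutes(n):.1f} min")
```

Under those assumptions, a 30,000-tweet account would take on the order of 50 minutes, which is likely still short next to the model training time.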

I'm not comfortable with aggressive downloading at the moment.

joeyrobert commented 4 years ago

Okay, that's understandable. I'll close this ticket.