Closed joeyrobert closed 4 years ago
It is my understanding that twint is deliberately slower, since its method of access is less kosher.
Unless you are doing bulk scrapes, the current rate of ingress (10 it/s) should be fine (actually training the model is more of a bottleneck). If it's any slower, then that's an unrelated bug.
I'm not comfortable with aggressive downloading at the moment.
Okay that's understandable, I'll close this ticket.
I tried using this to extract a dataset from a user with 30000+ tweets and found GetOldTweets3 (https://pypi.org/project/GetOldTweets3/) to be much faster: it managed to extract 11000+ tweets while download-tweets-ai-text-gen was around 1200, even with a significant head start. Suggesting this because I think download-tweets-ai-text-gen could be sped up, or could use GetOldTweets3 as a basis for faster downloading.

Appreciate your work regarding GPT-2, and I'm looking forward to training my model. Cheers.
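For context, a back-of-envelope sketch of the numbers discussed in this thread. It assumes twint's observed rate of ~10 tweets/s is steady, and that the 11000+ vs ~1200 counts were reached over the same (unspecified) window; both are rough figures from the comments above, not measurements.

```python
# Rough throughput arithmetic based on the figures quoted in this thread.
# Assumption: a steady ~10 tweets/s for twint-based downloading.
TWINT_RATE = 10  # tweets per second

def seconds_to_fetch(n_tweets: int, rate: float) -> float:
    """Time in seconds to pull n_tweets at a steady per-second rate."""
    return n_tweets / rate

# A 30,000-tweet archive at ~10 tweets/s:
total_s = seconds_to_fetch(30_000, TWINT_RATE)
print(f"{total_s:.0f} s (~{total_s / 60:.0f} min)")  # 3000 s (~50 min)

# Relative speedup implied by 11000 vs 1200 tweets in the same window:
print(f"~{11_000 / 1_200:.1f}x faster")  # ~9.2x faster
```

So even at the quoted rate, a full 30k-tweet scrape is under an hour, which is consistent with the maintainer's point that model training, not ingress, is the bottleneck.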