Open aholten opened 4 years ago
This sounds like Twitter throttling due to a large volume of requests. Two ways to potentially get around this: 1) stop retrieving tweets for 24+ hours so the throttling eases off, or 2) try collecting tweets from another IP address (e.g., spin up a small AWS instance).
tbh I have not been throttled. There are a large number of ways the scraping can fail unfortunately.
It's been a couple of days, so I also don't think it's throttling. I was trying to do it on my own account, which is normally private. Perhaps it has to do with my account previously being private. Thanks
I am also having this problem. Like @antherknee I am running it on my own account which used to be private, but it's been public for some time now.
Although not exactly a solution, if you run the code below and cannot retrieve tweets, the problem is most likely in the twint library itself.
import twint
# Minimal check: scrape one user's tweets with twint's Search function
c = twint.Config()
c.Username = "dan_valinotti"
twint.run.Search(c)
If this code above retrieves tweets, but the code in the repo doesn't, post more details and we can look into it.
Running the code you posted doesn't result in any tweets, but changing twint.run.Search(c) to twint.run.Profile(c) seems to work.
Well look at that! You might have found a workaround.
I won't be able to get to fixing this for a bit, but if you have the time, try the following.
In the code for the following pull request (https://github.com/minimaxir/download-tweets-ai-text-gen/pull/24), which fixes some issues not yet fixed on master, change line 107 from:
twint.run.Search(c)
to
twint.run.Profile(c)
You will be using a different scraping function that I believe is limited to about 3,000 tweets, but you will at least get something.
Not sure if there's any downstream functionality that will break, but if you want a potential quick fix, it might be worth a try.
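If you'd rather not edit the script by hand each time, the Search-then-Profile swap can be wrapped in a small fallback. This is only a rough sketch, not what the repo does: first_nonempty and make_twint_runner are hypothetical helper names, and I'm assuming twint's Store_object, Store_object_tweets_list and Hide_output config options behave as I remember them, so check them against your installed version.

```python
from typing import Callable, List, Sequence


def first_nonempty(runners: Sequence[Callable[[], List]]) -> List:
    """Call each runner in order and return the first non-empty result."""
    for run in runners:
        try:
            results = run()
        except Exception as exc:  # scraping can fail in many ways
            print(f"{getattr(run, '__name__', run)} failed: {exc}")
            continue
        if results:
            return results
    return []  # every runner failed or came back empty


def make_twint_runner(username: str, mode: str) -> Callable[[], List]:
    """Hypothetical helper: wrap twint.run.Search or twint.run.Profile."""
    def run() -> List:
        import twint  # imported lazily so first_nonempty works without twint

        tweets: List = []
        c = twint.Config()
        c.Username = username
        # Assumed twint options: collect tweets in memory, silence console output.
        c.Store_object = True
        c.Store_object_tweets_list = tweets
        c.Hide_output = True
        getattr(twint.run, mode)(c)  # mode is "Search" or "Profile"
        return tweets

    run.__name__ = f"twint.run.{mode}"
    return run


# Example wiring (needs twint installed and network access):
# tweets = first_nonempty([
#     make_twint_runner("dan_valinotti", "Search"),
#     make_twint_runner("dan_valinotti", "Profile"),
# ])
```

Search is tried first because it is not subject to the Profile limit mentioned above; Profile only runs if Search raises or returns nothing.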
Yes, the Profile endpoint is more reliable, but the 3,200-tweet limit is not suitable for training an AI.
Hey there-
I was able to run download_tweets.py successfully yesterday, but today the script is producing an empty CSV. The script always detects the correct number of tweets, and the first couple of times I tried today it even calculated a reasonable duration estimate. Now it shows 00:00<? for the estimate. The progress bar sticks at 0% and eventually the script just ends (most recently after 4 min 4 sec).
Is this an issue with running the script within 36 hours of last successful use? I've increased the sleep time to no effect.
Thanks for any help!