minimaxir / download-tweets-ai-text-gen

Python script to download public Tweets from a given Twitter account into a format suitable for AI text generation.
MIT License
219 stars, 41 forks

No tweets retrieved, empty CSV when running #19

Open aholten opened 4 years ago

aholten commented 4 years ago

Hey there-

I was able to run download_tweets.py successfully yesterday, but today the script is producing an empty CSV. The script always detects the correct number of tweets, and the first couple of times I tried today it even calculated a reasonable duration estimate. Now it shows 00:00<? for the estimate. The progress bar sticks at 0% and eventually the script just ends (most recently after 4min4sec).

Is this an issue with running the script within 36 hours of last successful use? I've increased the sleep time to no effect.

Thanks for any help!

sdelgadoc commented 4 years ago

This sounds like Twitter throttling due to a large volume of requests. Two ways to potentially get around this are 1) stop retrieving tweets for 24+ hours so the throttling eases off, or 2) try collecting tweets from another IP address (e.g., spin up a small AWS instance).

minimaxir commented 4 years ago

tbh I have not been throttled. There are a large number of ways the scraping can fail unfortunately.

aholten commented 4 years ago

It's been a couple of days, so I also don't think it's throttling. I was trying to do it on my own account, which is normally set to private. Perhaps it has to do with my account previously being private. Thanks

dan-valinotti commented 4 years ago

I am also having this problem. Like @antherknee I am running it on my own account which used to be private, but it's been public for some time now.

sdelgadoc commented 4 years ago

Although not exactly a solution, if you run the code below and cannot retrieve tweets, the problem is most likely in the twint library itself rather than in this repo.

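import twint

# Minimal twint configuration: only the account to scrape
c = twint.Config()
c.Username = "dan_valinotti"

# Run twint's search scraper directly, bypassing download_tweets.py
twint.run.Search(c)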

If the code above retrieves tweets but the code in the repo doesn't, post more details and we can look into it.

dan-valinotti commented 4 years ago

Running the code you posted doesn't result in any tweets, but changing twint.run.Search(c) to twint.run.Profile(c) seems to work.

sdelgadoc commented 4 years ago

Well look at that! You might have found a workaround.

I won't be able to get to fixing this for a bit, but if you have the time, try the following.

In the code for the following pull request (https://github.com/minimaxir/download-tweets-ai-text-gen/pull/24), which fixes some issues not yet merged into master, change line 107 from:

twint.run.Search(c)

to

twint.run.Profile(c)

You will be using a different scraping function that I believe is limited to about 3,000 tweets, but you will at least get something.

Not sure if there's any downstream functionality that will break, but if you want to try a potential quick solution, it might be worth your time to try.
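For anyone who wants to try both scrapers without editing the script by hand, the workaround above can be sketched as a small fallback helper: try Search first, and fall back to Profile only if Search comes back empty or raises. This is a rough sketch, not the repo's code. The names fetch_with_fallback, make_twint_runner and scrape_user are hypothetical, and the twint config attributes Hide_output, Store_object and Store_object_tweets_list are assumed to behave as they do in the twint version I have used.

```python
from typing import Callable, Iterable, Tuple

# A runner takes a username and returns the list of tweets it scraped.
Runner = Callable[[str], list]


def fetch_with_fallback(
    username: str, runners: Iterable[Tuple[str, Runner]]
) -> Tuple[str, list]:
    """Try each named runner in order; return the first non-empty result.

    Returns ("", []) if every runner fails or comes back empty.
    """
    for name, run in runners:
        try:
            tweets = run(username)
        except Exception as exc:  # scraping can fail in many ways
            print(f"{name} failed: {exc}")
            continue
        if tweets:
            return name, tweets
        print(f"{name} returned no tweets")
    return "", []


def make_twint_runner(mode: str) -> Runner:
    """Wrap twint.run.Search or twint.run.Profile (mode is the function name)."""

    def run(username: str) -> list:
        import twint  # lazy import, so the helper above works without twint

        collected: list = []
        c = twint.Config()
        c.Username = username
        c.Hide_output = True  # assumption: suppresses per-tweet printing
        c.Store_object = True  # assumption: keep results in memory
        c.Store_object_tweets_list = collected  # assumption: filled by twint
        getattr(twint.run, mode)(c)
        return collected

    return run


def scrape_user(username: str) -> Tuple[str, list]:
    """Hypothetical entry point: Search first, then Profile."""
    runners = [
        ("Search", make_twint_runner("Search")),
        ("Profile", make_twint_runner("Profile")),
    ]
    return fetch_with_fallback(username, runners)
```

Calling scrape_user("dan_valinotti") would return the name of the scraper that worked along with its tweets, so you can tell whether you ended up on the Profile path and its lower tweet cap.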

minimaxir commented 4 years ago

Yes, the Profile endpoint is more reliable but the 3200 limit is not suitable for training an AI.