Open dafunction opened 3 years ago
I'm glad to hear that the code is working for you! As a developer, one always wonders whether all those clones lead to actual use, or to cursing when things don't work.
You are right that the code, as is, can't limit the collected tweets by date. You also correctly identified the part of the code that could be modified to limit collected tweets by date.
However, it will require adding another parameter, toDate, to the tweepy.Cursor call. The code collects tweets from most recent to oldest, so if you change only fromDate, it will still start collecting from the newest tweet rather than from where you left off.
To get the behavior you're looking for, add the toDate parameter as shown below, and set it to the date of the last tweet you collected.
    if limit is not None:
        cursor = tweepy.Cursor(api.search_full_archive,
                               environment_name=environment_name,
                               query="from:" + username,
                               fromDate="200603220000",
                               toDate="[DATE_OF_LAST_TWEET]").items(limit)
    else:
        cursor = tweepy.Cursor(api.search_full_archive,
                               environment_name=environment_name,
                               query="from:" + username,
                               fromDate="200603220000",
                               toDate="[DATE_OF_LAST_TWEET]").items()
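In case it helps with filling in [DATE_OF_LAST_TWEET]: the fromDate value above is in YYYYMMDDhhmm form, so toDate presumably needs the same shape. Below is a minimal sketch of turning the timestamp of your oldest collected tweet into that string. The helper name to_date_param and the pad_minutes argument are made up for illustration and are not part of the tool or of tweepy. It assumes the timestamp is UTC (naive datetimes are treated as UTC), and it pads by one minute on the assumption that the search endpoint treats toDate as exclusive at minute granularity, so you get a little overlap rather than a gap.

```python
from datetime import datetime, timedelta, timezone


def to_date_param(created_at, pad_minutes=1):
    """Format a tweet timestamp as a YYYYMMDDhhmm string in UTC.

    Hypothetical helper, not part of the original tool.
    pad_minutes pushes the cutoff slightly later so that tweets sharing
    the same minute as the last collected tweet are not skipped
    (assumption: toDate is exclusive and has minute granularity).
    """
    if created_at.tzinfo is None:
        # Assumption: naive timestamps are already in UTC.
        created_at = created_at.replace(tzinfo=timezone.utc)
    padded = created_at.astimezone(timezone.utc) + timedelta(minutes=pad_minutes)
    return padded.strftime("%Y%m%d%H%M")


# Example: the oldest tweet collected so far was posted at this time.
oldest_seen = datetime(2020, 5, 17, 23, 59, 30)
to_date = to_date_param(oldest_seen)
```

The resulting string would then replace "[DATE_OF_LAST_TWEET]" in the snippet above.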
Although the answer above should resolve your issue, if you're feeling generous, you could make the toDate value a parameter to the code by passing it up through the download_account_tweets and download_tweets functions. :-) If you do, please submit a pull request; I'd be happy to add it to the code base.
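For anyone picking this up, here is a rough sketch of what making toDate a parameter could look like at the point where the cursor is built. The function names build_search_kwargs and make_cursor, and the to_date and from_date arguments, are hypothetical; the real signatures of download_account_tweets and download_tweets are not shown in this thread, so wiring the value through them is left out. The keyword names passed to api.search_full_archive are the ones used in the snippet above.

```python
def build_search_kwargs(username, environment_name,
                        from_date="200603220000", to_date=None):
    """Collect the keyword arguments for api.search_full_archive.

    Hypothetical helper. toDate is only included when the caller
    supplies one, so the default behavior stays the same as before.
    """
    kwargs = {
        "environment_name": environment_name,
        "query": "from:" + username,
        "fromDate": from_date,
    }
    if to_date is not None:
        kwargs["toDate"] = to_date
    return kwargs


def make_cursor(api, username, environment_name, limit=None, to_date=None):
    """Build the item iterator, mirroring the if/else in the snippet above."""
    import tweepy  # imported here so the kwargs helper works without tweepy

    kwargs = build_search_kwargs(username, environment_name, to_date=to_date)
    cursor = tweepy.Cursor(api.search_full_archive, **kwargs)
    if limit is not None:
        return cursor.items(limit)
    return cursor.items()
```

With something like this in place, the outer functions would only need to accept an optional to_date argument and hand it down unchanged.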
Hello!
Firstly, thanks for sharing the code for this tool! I'm using this as a first project to play with gpt2 and machine learning.
This "issue" is actually more of a question. As you mentioned in the README, Twitter's free tier has a collection limit of 5,000. Rather than paying for the premium tier while I'm doing this project for educational purposes only, I'm hoping to wait until my collection limit resets next month so I can collect more tweets from a particular user to train the model. In my case, I collected 100 tweets as a test, then collected 4,900 after the test was successful.
Getting to the question - is it possible to skip the block of 4,900 tweets I'll have collected and collect the next block of tweets within my collection limit once it's reset using the script as is? Scanning over it there doesn't appear to be any params defined to do so.
Perhaps bumping the fromDate from lines 123-132 up to the date of whatever the last tweet collected from the first block is? There will probably be some overlap but I'd guess that would work okay.
Thanks in advance for your response.
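On the overlap point: if a few tweets do end up in both collection runs, they could be filtered out before training. The sketch below assumes each collected tweet is available as a dict with a unique "id" field, which is an assumption about how the data is stored and not something the tool is known to guarantee. The function name merge_without_duplicates is made up for illustration.

```python
def merge_without_duplicates(first_batch, second_batch):
    """Combine two batches of tweets, keeping one copy of each tweet id.

    Hypothetical helper. Assumes every tweet is a dict with an "id" key.
    Order is preserved: first_batch entries come first, and a tweet from
    second_batch is dropped if its id was already seen.
    """
    seen_ids = set()
    merged = []
    for tweet in list(first_batch) + list(second_batch):
        if tweet["id"] in seen_ids:
            continue
        seen_ids.add(tweet["id"])
        merged.append(tweet)
    return merged


# Example: tweet 3 was collected in both runs.
run_one = [{"id": 5, "text": "newest"}, {"id": 3, "text": "overlap"}]
run_two = [{"id": 3, "text": "overlap"}, {"id": 1, "text": "oldest"}]
combined = merge_without_duplicates(run_one, run_two)
```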