lapp0 closed this 5 years ago
This does a lot of the work for https://github.com/taspinar/twitterscraper/issues/141
What remains is creating a `--resumable` argument which generates a scrapejob file, and a `--resume scrapejob_filename` argument. The scrapejob file will keep track of the pos each process is at, and will be updated each time the pos changes for a query.
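
A minimal sketch of what reading and writing that scrapejob file could look like, assuming a JSON file mapping each query to its latest pos; the file name, structure, and helper names here are my own, not part of this PR:

```python
# Hypothetical sketch: persist the latest `pos` per query so a later
# --resume can pick up where each process left off.
import json
import os
import tempfile

def save_scrapejob(path, positions):
    """Atomically write the {query: pos} mapping so a crash can't leave a torn file."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(positions, f)
    os.replace(tmp, path)  # atomic rename over the old file

def load_scrapejob(path):
    """Return the saved {query: pos} mapping, or {} for a fresh run."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}
```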
The program would also need a CSV that it continuously writes to (https://github.com/taspinar/twitterscraper/issues/136#issuecomment-405702286), which a `--resume` run could then pick up and continue appending to.
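
For the continuous CSV, something along these lines could work; the tweet field names are assumptions about twitterscraper's Tweet class and may not match exactly:

```python
# Hypothetical sketch: append tweets to the CSV as each batch arrives,
# instead of writing everything at exit, so an interrupted run loses
# at most one batch.
import csv

def append_tweets(csv_path, tweets):
    with open(csv_path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for t in tweets:
            writer.writerow([t.id, t.timestamp, t.user, t.text])
        f.flush()  # make sure the batch hits disk before pos is advanced
```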
The program would also need to be sure it exits cleanly: don't advance to the next pos before the tweets from the current pos are persisted, and write an appropriate pos so the file can be reconstructed properly.
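
A sketch of how those two pieces could be ordered so a crash never loses persisted state, reusing the helpers sketched above; `fetch_batch` is an assumed stand-in for whatever function fetches one page of results:

```python
# Hypothetical sketch of the ordering constraint: tweets for the current
# pos must be on disk *before* the scrapejob records the next pos. If the
# process dies between the two writes, the worst case on --resume is
# re-scraping (and deduplicating) one batch, never losing one.
def scrape_resumable(query, csv_path, job_path):
    positions = load_scrapejob(job_path)
    pos = positions.get(query)
    while True:
        tweets, next_pos = fetch_batch(query, pos)  # assumed fetch helper
        if not tweets:
            break
        append_tweets(csv_path, tweets)             # 1) persist tweets first
        positions[query] = next_pos
        save_scrapejob(job_path, positions)         # 2) only then advance pos
        pos = next_pos
```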
After merging PR#140 there is a merge conflict. This should be resolved, and the get_query_url() method should be updated so that it can also be used to generate the URLs for scraping profiles.
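
A hedged sketch of how get_query_url() might branch between search queries and profile timelines once the conflict is resolved; the URL templates and the from_user parameter are assumptions about the codebase, not the actual merged code:

```python
# Assumed URL templates, illustrative only.
INIT_URL = "https://twitter.com/search?f=tweets&vertical=default&q={q}"
RELOAD_URL = ("https://twitter.com/i/search/timeline?f=tweets"
              "&vertical=default&q={q}&max_position={pos}")
INIT_URL_USER = "https://twitter.com/{u}"
RELOAD_URL_USER = ("https://twitter.com/i/profiles/show/{u}/timeline/tweets"
                   "?max_position={pos}")

def get_query_url(query, pos=None, from_user=False):
    """Build either a search URL or a profile-timeline URL."""
    if from_user:
        if pos is None:
            return INIT_URL_USER.format(u=query)
        return RELOAD_URL_USER.format(u=query, pos=pos)
    if pos is None:
        return INIT_URL.format(q=query)
    return RELOAD_URL.format(q=query, pos=pos)
```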
Fixes https://github.com/taspinar/twitterscraper/issues/126
Instead of giving up when there are no tweets, try again using the min_position returned from the JSON.
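
One possible retry loop, assuming a query_single_page()-style helper that returns the parsed timeline JSON alongside the tweets; the names and the retry limit are illustrative:

```python
# Hypothetical sketch: rather than stopping on an empty batch, feed the
# min_position from the timeline JSON back into the next request a few
# times before giving up.
def query_with_retries(query, pos=None, max_retries=3):
    retries = 0
    while retries < max_retries:
        tweets, json_resp = query_single_page(query, pos)  # assumed helper
        if tweets:
            return tweets, json_resp.get("min_position")
        # empty page: advance pos from the JSON and try again
        new_pos = json_resp.get("min_position")
        if not new_pos or new_pos == pos:
            break  # no progress possible, stop retrying
        pos = new_pos
        retries += 1
    return [], pos
```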