taspinar / twitterscraper

Scrape Twitter for Tweets
MIT License
2.4k stars, 581 forks

Fix early halt of tweet fetching #142

Closed lapp0 closed 5 years ago

lapp0 commented 6 years ago

Fixes https://github.com/taspinar/twitterscraper/issues/126

Instead of giving up when a response contains no tweets, try again using the min_position returned in the JSON response.
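A minimal sketch of that retry idea, not the code in this PR. `fetch_page` is a hypothetical callable standing in for whatever performs the request; it is assumed to return the parsed tweets together with the `min_position` from the JSON. `max_empty_retries` is likewise an illustrative parameter.

```python
def collect_tweets(fetch_page, start_pos=None, limit=None, max_empty_retries=3):
    """Collect tweets page by page, continuing from min_position on empty pages.

    fetch_page(pos) is a hypothetical callable returning (tweets, min_position).
    """
    tweets = []
    pos = start_pos
    empty_in_a_row = 0
    while limit is None or len(tweets) < limit:
        page, min_position = fetch_page(pos)
        if page:
            tweets.extend(page)
            empty_in_a_row = 0
        else:
            # An empty page is not treated as the end of the results:
            # keep going from the returned position a bounded number of times.
            empty_in_a_row += 1
            if empty_in_a_row > max_empty_retries:
                break
        # Without a new position there is nothing left to request.
        if min_position is None or min_position == pos:
            break
        pos = min_position
    return tweets if limit is None else tweets[:limit]
```

Bounding the number of consecutive empty pages keeps the loop from spinning forever if the server keeps returning fresh positions with no tweets.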

lapp0 commented 6 years ago

This does a lot of the work for https://github.com/taspinar/twitterscraper/issues/141

What remains is adding a --resumable argument, which generates a scrapejob file, and a --resume scrapejob_filename argument. The scrapejob file keeps track of the pos each process has reached and is updated each time the pos changes for a query.

The program would also need to write continuously to a CSV (https://github.com/taspinar/twitterscraper/issues/136#issuecomment-405702286), which it could keep appending to when resumed with --resume.

The program would also need to exit cleanly: it must not move to the next pos until the tweets from the current pos are persisted, and it must write an appropriate pos so that the output file can be reconstructed properly.
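A rough sketch of the ordering described above, under stated assumptions: the scrapejob file is taken to be a JSON mapping of query to pos, and `load_job`, `save_job`, and `persist_page` are hypothetical helper names, not part of twitterscraper. Tweets are flushed to the CSV before the new pos is recorded, and the scrapejob file is replaced atomically.

```python
import csv
import json
import os
import tempfile


def load_job(job_path):
    """Return the saved {query: pos} mapping, or an empty one for a new job."""
    if not os.path.exists(job_path):
        return {}
    with open(job_path, encoding="utf-8") as f:
        return json.load(f)


def save_job(job_path, positions):
    """Write the mapping to a temp file, then swap it in atomically."""
    directory = os.path.dirname(os.path.abspath(job_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(positions, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, job_path)


def persist_page(csv_path, job_path, positions, query, rows, next_pos):
    """Append rows to the CSV first, and only then record next_pos for query."""
    with open(csv_path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
        f.flush()
        os.fsync(f.fileno())
    positions[query] = next_pos
    save_job(job_path, positions)
```

With this ordering, a crash between the two writes leaves the old pos on disk, so a resumed run may fetch the last page again. The result is possible duplicate rows rather than missing ones, and duplicates can be filtered afterwards.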

taspinar commented 6 years ago

After merging PR #140 there is a merge conflict. It should be resolved, and the get_query_url() method should be updated so that it can also be used to generate the URLs for scraping profiles.