taspinar / twitterscraper

Scrape Twitter for Tweets
MIT License

Weird timestamp patterns #178

Open rss99 opened 5 years ago

rss99 commented 5 years ago

Has anyone noticed that this code only appears to pull tweets from certain times of the day? From my UK timezone, I only ever see tweets from just before midnight and from around 12 hours after that. So in any time-series visualisation of tweet volume, there are big spikes at those specific times and zero tweets at all other timestamps.

This behaviour cannot be correct. Has anyone looked into this?

Thanks much

taspinar commented 5 years ago

@rss99 This should not be the case. Here you can see an example of a timeline for two separate hashtags.

However, what probably happened is that you used a very generic keyword like "bitcoin" or "trump", a large time window, and possibly a maximum number of tweets.

The time window will be split into P different queries (where P is your poolsize), and for each query the tweets will be ordered by datetime, newest first. So the search results for tweets posted at 23:59 will appear first, then 23:58, etc.

twitterscraper stops scraping when the maximum number of tweets has been reached, when you have some kind of network problem, or when you are blocked by Twitter. So with a very generic search term, it could be that you only scraped the last few minutes before midnight before twitterscraper stopped scraping.
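
To illustrate the effect described above, here is a minimal simulation. It does not call twitterscraper; `split_window`, `simulate_scrape`, `per_query_limit` and the constant tweet rate are hypothetical names and assumptions made only for this sketch. It assumes the time window is cut into equal sub-windows, each sub-query reads its sub-window newest first, and each one stops once it hits its share of the tweet limit.

```python
from datetime import datetime, timedelta


def split_window(begin, end, poolsize):
    """Split [begin, end) into `poolsize` equal sub-windows (assumed behaviour)."""
    step = (end - begin) / poolsize
    return [(begin + i * step, begin + (i + 1) * step) for i in range(poolsize)]


def simulate_scrape(begin, end, poolsize, per_query_limit, tweets_per_minute=10):
    """Collect timestamps as if each sub-query read newest-first and stopped at its limit."""
    collected = []
    for start, stop in split_window(begin, end, poolsize):
        t = stop - timedelta(minutes=1)  # newest minute in this sub-window
        taken = 0
        while t >= start and taken < per_query_limit:
            n = min(tweets_per_minute, per_query_limit - taken)
            collected.extend([t] * n)
            taken += n
            t -= timedelta(minutes=1)  # walk backwards in time
    return collected


# Two days, four sub-queries: each sub-window is 12 hours long.
begin = datetime(2019, 1, 1)
end = datetime(2019, 1, 3)
stamps = simulate_scrape(begin, end, poolsize=4, per_query_limit=300)

# Which hours actually contain scraped tweets?
hours = sorted({ts.strftime("%Y-%m-%d %H:00") for ts in stamps})
print(hours)
```

Under these assumptions only the 11:00 and 23:00 hours of each day contain any tweets, which would look like spikes just before midnight and roughly 12 hours later. Raising the limit or narrowing the time window lets each sub-query reach further back before it stops.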

With which query did this problem occur?