taspinar / twitterscraper

Scrape Twitter for Tweets
MIT License
2.39k stars 579 forks

Is it Possible to get Tweets for all hours of the day? #270

Closed: Flyinghans closed 4 years ago

Flyinghans commented 4 years ago

Hello, is it possible to get tweets for all hours of the day? If I use query_tweets, it always starts scraping from 23:59 of each day. I already tried setting the limit high and going day by day to collect enough tweets to reach 00:00, but most of the time it still doesn't get enough tweets to reach the start of the day. The API also stops somewhat randomly at high limits (sometimes it scrapes just 1k, sometimes 16k, etc.). It would be great if anyone has a solution for this problem.

DevAdeola commented 4 years ago

Have you found a solution? I am also experiencing this. It only picks up tweets from hour 23.

Flyinghans commented 4 years ago

Not really. I programmed a loop that scraped one day of tweets every 30 min with limit 56000. It's pretty random whether it manages to get the whole day or not.
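For anyone who wants to try the same day-by-day approach, here is a minimal sketch of such a loop. It is not the code used above. The helper names `daily_windows` and `scrape_by_day` are made up for illustration, and `fetch` is a stand-in for whatever scraping function you use; it is only assumed to accept `begindate` and `enddate` keyword arguments the way `query_tweets` is called elsewhere in this thread.

```python
import datetime as dt


def daily_windows(begin, end):
    """Yield (start, stop) date pairs, one per day, covering [begin, end)."""
    day = begin
    while day < end:
        next_day = day + dt.timedelta(days=1)
        yield day, next_day
        day = next_day


def scrape_by_day(query, begin, end, fetch):
    """Call fetch once per day and collect the results keyed by date.

    `fetch` is a hypothetical stand-in for a scraper function; it must
    accept (query, begindate=..., enddate=...) and return an iterable.
    """
    results = {}
    for start, stop in daily_windows(begin, end):
        results[start] = list(fetch(query, begindate=start, enddate=stop))
    return results
```

With twitterscraper installed, something like `scrape_by_day('adele', dt.date(2020, 4, 4), dt.date(2020, 4, 7), query_tweets)` should issue one request window per day. Keeping the results per day makes it easier to see which days came back short, though it does not by itself fix incomplete days.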

ghost commented 4 years ago

I'm not sure what you think you're missing. If you set your begin and end date to scrape a single day, it scrapes from the beginning of that day (00:00:00) to one second before midnight that same day (23:59:59). Perhaps I've misunderstood. What are you missing?

DevAdeola commented 4 years ago

@jomorrcode, it doesn't spool for the whole 24hrs.

DevAdeola commented 4 years ago

I just tried it with Adele and it returned 3000+ rows from the 4th to the 6th of April.

DevAdeola commented 4 years ago

> Not really. I programmed a loop that scraped one day of tweets every 30 min with limit 56000. It's pretty random whether it manages to get the whole day or not.

Can you share your code?

ghost commented 4 years ago

It must be intermittent or something. I scraped #nfl with a begin date of March 03 and an end date of March 04 and the earliest tweet was March 03 00:00:00 and the last was March 04 23:59:48.

ghost commented 4 years ago

If you scrape a single day for Adele, what are the earliest and latest datestamps in the returned data?

DevAdeola commented 4 years ago

> If you scrape a single day for Adele, what are the earliest and latest datestamps in the returned data?

23:15:49 and 15:46:57, which returned 1670 rows. With the buzz around 'Adele', I expected more.

ghost commented 4 years ago

Just querying for "adele" for April 5th, I got 4749 tweets. The earliest is 2020-04-05 00:00:19 and the last one is 2020-04-05 23:59:55. I have no idea how you verify if any are missing.

import datetime as dt

from twitterscraper import query_tweets

# Scrape a single day: April 5th, 2020
begindate = dt.date(2020, 4, 5)
enddate = dt.date(2020, 4, 6)
query = 'adele'

tweets = query_tweets(query, begindate=begindate, enddate=enddate)
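
One rough way to check coverage is to bucket the returned timestamps by hour and look for empty hours. The sketch below works on plain `datetime` values, so it does not depend on the library. The function names `hourly_coverage` and `missing_hours` are made up for illustration, and an empty hour only proves something for a query busy enough that every hour should have tweets.

```python
import datetime as dt
from collections import Counter


def hourly_coverage(timestamps):
    """Return (earliest, latest, per_hour) for an iterable of datetimes.

    per_hour is a list of 24 counts, indexed by hour of day (0..23).
    """
    stamps = sorted(timestamps)
    if not stamps:
        return None, None, [0] * 24
    counts = Counter(ts.hour for ts in stamps)
    return stamps[0], stamps[-1], [counts.get(hour, 0) for hour in range(24)]


def missing_hours(per_hour):
    """List the hours of the day that have no tweets at all."""
    return [hour for hour, count in enumerate(per_hour) if count == 0]
```

Assuming each returned tweet object exposes a `timestamp` attribute holding a `datetime` (check this against your installed version), usage would look like `first, last, per_hour = hourly_coverage(t.timestamp for t in tweets)` followed by `missing_hours(per_hour)`.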

ghost commented 4 years ago

I do agree I've been seeing some oddly inconsistent behaviour.

lapp0 commented 4 years ago

Investigating here https://github.com/taspinar/twitterscraper/issues/311