taspinar / twitterscraper

Scrape Twitter for Tweets
MIT License

0 results for 1 particular username when using advanced query #331

Open mtnm4tt opened 4 years ago

mtnm4tt commented 4 years ago

When scraping one particular username, TS_SCI_MAJIC12, with the advanced query "from:TS_SCI_MAJIC12", I get 0 results. The exact same queries work fine with any other username. But for TS_SCI_MAJIC12, the query launches and then just counts down through the 50 retry attempts until it eventually returns 0 tweets. Can you please test and see if you get the same results? If so, do you have any idea why this isn't working, and is there a potential solution for getting this particular user's tweet history archive?

For example, this query returns 0 results for this username only, but works fine for any other username I have tested:
twitterscraper "from:TS_SCI_MAJIC12" -bd 2020-04-01 -ed 2020-04-30 -c -o MJ12_tweets_2020040120200430.csv

Twitterscraper example output when querying username TS_SCI_MAJIC12:
INFO:twitterscraper:Retrying... (Attempts left: 50)
...
INFO:twitterscraper:Retrying... (Attempts left: 1)
...
INFO:twitterscraper:Got 0 tweets for from%3ATS_SCI_MAJIC12%20since%3A2020-04-25%20until%3A2020-04-27.
INFO:twitterscraper:Got 0 tweets (0 new).

I tried simplifying the query. This one also returns 0 results for this username only, but works fine for any other username I have tested:
twitterscraper "from:TS_SCI_MAJIC12" -c -o MJ12_tweetsall.csv

The following non-advanced query does work for this particular username, but it only pulls the last ~700 tweets and then stops:
twitterscraper TS_SCI_MAJIC12 --user -c -o MJ12_tweetsuser.csv

Twitterscraper final output with this query:
INFO:twitterscraper:Twitter returned : 'has_more_items'
INFO:twitterscraper:Got 721 tweets from username TS_SCI_MAJIC12

If there were a way to make this last query continue until all tweets were scraped, that would also be a great solution. I have tried the -a switch and the -bd and -ed switches, but it never returns more than the last ~700 tweets.

I am using PyCharm CE on a MacBook:
Python 3.7.3
twitterscraper 1.6.0
coala-utils 0.5.0
bs4 0.0.1
lxml 4.5.2
requests 2.24.0
billiard 3.6.3.0

Thank you so much for this great project!

HolzeHan commented 4 years ago

twitterscraper uses Twitter's advanced search, a service aimed at human users, and scrapes the resulting page, thus avoiding the API. Unfortunately, for some accounts the advanced search only turns up the account itself. This behavior seems to be fairly random. I have a similar problem with POTUS: the search leads to a page listing several accounts containing "POTUS", but only if I specify a time range. If I query across all time, it works just fine (but because the scraper always uses time windows for parallelization, it fails nonetheless). You could try different queries in the advanced search to find one that works for your case.
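
To make the time-window point concrete, here is a rough sketch of how a date range might be split into windows and turned into the kind of encoded query string that appears in the log above. This is not twitterscraper's actual code: the function names `split_into_windows` and `build_queries` are made up for illustration, and the real splitting logic may differ. It only shows why every request ends up carrying a since:/until: pair, which is exactly the case that seems to fail for some accounts.

```python
from datetime import date, timedelta
from urllib.parse import quote


def split_into_windows(begin, end, n_windows):
    """Split [begin, end) into at most n_windows contiguous date ranges.

    Hypothetical helper, not part of twitterscraper.
    """
    total_days = (end - begin).days
    if total_days <= 0:
        return []
    # Never create more windows than there are days to cover.
    n_windows = max(1, min(n_windows, total_days))
    step, extra = divmod(total_days, n_windows)
    windows = []
    start = begin
    for i in range(n_windows):
        # Spread the leftover days over the first `extra` windows.
        stop = start + timedelta(days=step + (1 if i < extra else 0))
        windows.append((start, stop))
        start = stop
    return windows


def build_queries(user, begin, end, n_windows):
    """Build one URL-encoded 'from:USER since:X until:Y' query per window."""
    return [
        quote(f"from:{user} since:{s.isoformat()} until:{e.isoformat()}")
        for s, e in split_into_windows(begin, end, n_windows)
    ]


if __name__ == "__main__":
    for q in build_queries("TS_SCI_MAJIC12", date(2020, 4, 1), date(2020, 4, 30), 20):
        print(q)
```

Pasting the decoded form of one of these strings (for example "from:TS_SCI_MAJIC12 since:2020-04-25 until:2020-04-27") into the advanced search in a browser is a quick way to check whether a given window returns tweets or only the account itself.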