taspinar / twitterscraper

Scrape Twitter for Tweets
MIT License

--user tag only returns 800 tweets #179

Open patrickcmbooth opened 5 years ago

patrickcmbooth commented 5 years ago

I'm using this command

twitterscraper realDonaldTrump --user --output=tweets15.json

but I only get about 800 tweets every time, even though he definitely has more than 800 tweets. I've tried it with another Twitter user and it also returned only 800 or so tweets.

sqeu commented 5 years ago

I am also getting the same result. I hope that is not a twitter restriction.

kanihal commented 5 years ago

I am also facing a similar issue. The number of tweets returned with the --user tag is far lower than the actual number of tweets.

I scrolled through the realDonaldTrump Twitter page for a long time. Twitter only loads ~800 tweets (back to Mar 17) and stops loading older tweets after that. It seems like a restriction from Twitter.

taspinar commented 5 years ago

With this query I got 15000+ tweets and it is still querying... It could be that twitter is indeed blocking some ip addresses or user agents.

Can you try using different user agents HEADERS to see if this allows you to query?
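
For readers unsure what trying a different user agent means in practice, here is a minimal sketch using only Python's standard library. It is not twitterscraper's own code: the `USER_AGENTS` list and the `build_request` helper are hypothetical names made up for illustration, the user agent strings are example values, and the request is only constructed, never sent.

```python
import random
import urllib.request

# Hypothetical pool of browser-like User-Agent strings (illustrative values only).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0",
    "Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0",
]


def build_request(url, user_agent=None):
    """Return a urllib Request that carries an explicit User-Agent header."""
    if user_agent is None:
        # Pick a different identity per request when none is given.
        user_agent = random.choice(USER_AGENTS)
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Build (but do not send) a request with the first user agent in the pool.
req = build_request("https://twitter.com/realDonaldTrump", USER_AGENTS[0])
print(req.get_header("User-agent"))
```

If the number of tweets you get back changes when the header changes, that would point to user-agent-based filtering rather than a hard limit on the profile page.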

RishengP commented 5 years ago

I am having a similar issue, or worse. I entered 'twitterscraper realDonaldTrump --user --output=tweets15.json' on the command line and no tweets were scraped. Changing the user name does not seem to help either.

marquisvictor commented 5 years ago

It would be best to use the GetOldTweets3 library: https://github.com/Mottl/GetOldTweets3

It works perfectly for me. I had issues initialising this taspinar library.

RishengP commented 5 years ago

@marquisvictor Hi there, I tried to load GetOldTweets3. It says that it is not recognized. Would you kindly help me out?

marquisvictor commented 5 years ago

Yeah, I encountered this same issue at first. Here's what worked for me.

  1. Navigate to the GetOldTweets3 directory/folder.

  2. Inside the GetOldTweets3 folder, look for another folder called "Bin". Open it; you should see a __pycache__ folder and a GetOldTweets3 file.

  3. Copy just the GetOldTweets3 file, go back up one level to the GetOldTweets3 folder, paste it there, and rename it to "GetOldTweets3.py".

  4. Open a command prompt in that same folder. If you're not sure how to do that, just let me know and I'll be glad to show you.

  5. In the command prompt, paste the following command:

py -3.6 GetOldTweets3.py --username "barackobama" --since 2015-09-10 --until 2015-09-12 --maxtweets 10

Please be wary of the "py -3.6" argument, which selects Python 3.6 through the Windows py launcher. If you have added Python to your PATH environment variable so that you can call it from cmd, just run

python GetOldTweets3.py --username "barackobama" --since 2015-09-10 --until 2015-09-12 --maxtweets 10

and that should work just fine. I've been piling up backdated tweets for the past two weeks now, trying to gather enough data for my analysis. Please let me know if you have any issues going forward.
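
To collect backdated tweets over a longer period, one option is to run the command above once per date window. The following is a rough sketch of that idea, not part of GetOldTweets3: `date_windows` and `got3_command` are hypothetical helper names, and only the flags already shown above (--username, --since, --until, --maxtweets) are used. The windows are contiguous; whether that produces overlaps or gaps depends on how the tool treats the --until date, which is an assumption to check.

```python
import shlex
from datetime import date, timedelta


def date_windows(start, end, days):
    """Yield (since, until) date pairs covering start..end in steps of `days`."""
    step = timedelta(days=days)
    since = start
    while since < end:
        until = min(since + step, end)  # clamp the last window to `end`
        yield since, until
        since = until


def got3_command(username, since, until, max_tweets):
    """Build the argument list for one GetOldTweets3.py run (hypothetical helper)."""
    return [
        "python", "GetOldTweets3.py",
        "--username", username,
        "--since", since.isoformat(),
        "--until", until.isoformat(),
        "--maxtweets", str(max_tweets),
    ]


# Print one command line per two-day window; nothing is executed here.
for since, until in date_windows(date(2015, 9, 10), date(2015, 9, 16), days=2):
    cmd = got3_command("barackobama", since, until, 10)
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Each printed line can be pasted into the command prompt, or the argument lists could be passed to `subprocess.run` to run the windows one after another.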

marquisvictor commented 5 years ago

@RishengP

RishengP commented 5 years ago

@marquisvictor Yay, it works! Thanks man, I really appreciate it. Hey, real quick: do you by any chance know of any package that can help us pull out info on retweets, such as retweeter IDs or retweeter usernames?

marquisvictor commented 5 years ago

@RishengP Nah, I don't. You would have to code that yourself, in the TweetManager file.

neon-ninja commented 5 years ago

I think the reason GetOldTweets3 returns more in this case is that it searches the timeline using a from: query instead of going to the user's page. See https://github.com/Mottl/GetOldTweets3/blob/master/GetOldTweets3/manager/TweetManager.py#L142. The equivalent with this repo would be to do something like this instead:

twitterscraper from:realDonaldTrump --output=tweets15.json

ZZZzzzyyy commented 4 years ago

With this query I got 15000+ tweets and it is still querying... It could be that twitter is indeed blocking some ip addresses or user agents.

Can you try using different user agents HEADERS to see if this allows you to query?

Hey, could you please explain in detail how to deal with this issue? Thanks!

satyamr1 commented 4 years ago

@marquisvictor Yay, it works! Thanks man, I really appreciate it. Hey, real quick: do you by any chance know of any package that can help us pull out info on retweets, such as retweeter IDs or retweeter usernames?

Hey, were you able to get that?

mawic commented 4 years ago

The profile page only shows the last ~800 tweets. That's why the scraper does not scrape more :-/

neon-ninja commented 4 years ago

@mawic try my suggestion above

mawic commented 4 years ago

@neon-ninja: Thanks. It partially helps me. However, it doesn't return the retweets from the user, which I'm looking for. But I assume that there won't be any other solution besides the official API.

neon-ninja commented 4 years ago

@mawic you can use the include:nativeretweets operator to get retweets in search results. For example:

twitterscraper "from:realDonaldTrump include:nativeretweets" --output=tweets15.json -p 1
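
If you want to assemble these search queries from a script, a small sketch along these lines may help. `build_search_query` and `twitterscraper_command` are hypothetical helpers written for this thread, not part of twitterscraper; they only combine the from: and include:nativeretweets operators and the --output option already shown above (the -p option is left out for brevity), and nothing is executed.

```python
import shlex


def build_search_query(username, include_retweets=False):
    """Combine the search operators discussed in this thread into one query string."""
    parts = ["from:" + username]
    if include_retweets:
        parts.append("include:nativeretweets")
    return " ".join(parts)


def twitterscraper_command(query, output_file):
    """Build the argument list for a twitterscraper search run (hypothetical helper)."""
    return ["twitterscraper", query, "--output=" + output_file]


query = build_search_query("realDonaldTrump", include_retweets=True)
cmd = twitterscraper_command(query, "tweets15.json")

# shlex.quote keeps the multi-word query together as a single shell argument.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Keeping the query as one list element matters: without quoting, the shell would split the two operators into separate arguments.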