taspinar / twitterscraper

Scrape Twitter for Tweets
MIT License
2.39k stars, 581 forks

20 Tweets per sub-period #309

Open romit-actuarial opened 4 years ago

romit-actuarial commented 4 years ago

After multiple runs, the scraper returns only about 20 tweets per sub-period. With 20 sub-periods (which seems to be the default subdivision), that yields roughly 400 tweets, so I can't get more than about 400 tweets for any keyword, no matter what the keyword or the date range is. :(

Here is my code:

from twitterscraper import query_tweets
import datetime as dt
import pandas as pd
from pandas import ExcelWriter

begin_date = dt.date(2019, 3, 31)
end_date = dt.date(2020, 4, 1)
d = {}
handle_list = ["@pnbindia", "@AxisBankSupport", "@AxisBank", "@YESBANK", "@ICICIBank_Care", "ICICI Bank", "@UPI_NPCI", "@NPCI_NPCI", "@NPCI_BHIM", "@RuPay_npci", "@bankofbaroda", "@canarabank", "@UnionBankTweets", "@KotakBankLtd", "@KotakBankLtd", "@HDFC_Bank", "TheOfficialSBI", "@SBICardConnect", "@Paytm", "@PhonePe", "@GooglePay", "@amazonpay", "@PaytmBank", "@phonepe_safety", "@PhonePeSupport"]
for handle in handle_list:
    tweets = query_tweets(handle, begindate=begin_date, enddate=end_date, lang="english")
    d[handle] = pd.DataFrame(t.__dict__ for t in tweets)

And here's what the output looks like for all the keywords:

INFO: queries: ['@pnbindia since:2019-03-31 until:2019-04-18', '@pnbindia since:2019-04-18 until:2019-05-06', '@pnbindia since:2019-05-06 until:2019-05-25', '@pnbindia since:2019-05-25 until:2019-06-12', '@pnbindia since:2019-06-12 until:2019-06-30', '@pnbindia since:2019-06-30 until:2019-07-19', '@pnbindia since:2019-07-19 until:2019-08-06', '@pnbindia since:2019-08-06 until:2019-08-24', '@pnbindia since:2019-08-24 until:2019-09-12', '@pnbindia since:2019-09-12 until:2019-09-30', '@pnbindia since:2019-09-30 until:2019-10-18', '@pnbindia since:2019-10-18 until:2019-11-06', '@pnbindia since:2019-11-06 until:2019-11-24', '@pnbindia since:2019-11-24 until:2019-12-12', '@pnbindia since:2019-12-12 until:2019-12-31', '@pnbindia since:2019-12-31 until:2020-01-18', '@pnbindia since:2020-01-18 until:2020-02-05', '@pnbindia since:2020-02-05 until:2020-02-24', '@pnbindia since:2020-02-24 until:2020-03-13', '@pnbindia since:2020-03-13 until:2020-04-01']
INFO: Got 20 tweets (20 new).
INFO: Got 38 tweets (18 new).
INFO: Got 58 tweets (20 new).
INFO: Got 78 tweets (20 new).
INFO: Got 98 tweets (20 new).
INFO: Got 118 tweets (20 new).
INFO: Got 138 tweets (20 new).
INFO: Got 158 tweets (20 new).
INFO: Got 176 tweets (18 new).
INFO: Got 196 tweets (20 new).
INFO: Got 216 tweets (20 new).
INFO: Got 235 tweets (19 new).
INFO: Got 255 tweets (20 new).
INFO: Got 268 tweets (13 new).
INFO: Got 287 tweets (19 new).
INFO: Got 307 tweets (20 new).
INFO: Got 326 tweets (19 new).
INFO: Got 345 tweets (19 new).
INFO: Got 364 tweets (19 new).
INFO: Got 383 tweets (19 new).
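The arithmetic behind the "about 400" ceiling can be checked directly from the log text. The following is a minimal sketch, not part of twitterscraper: `new_counts` is a hypothetical helper that pulls the per-sub-period "new" counts out of progress lines in the format shown above. The sample string is the first three progress lines of that log.

```python
import re

# Matches progress lines as they appear in the log above,
# e.g. "INFO: Got 38 tweets (18 new)."
PROGRESS = re.compile(r"Got (\d+) tweets \((\d+) new\)")


def new_counts(log_text):
    """Return the per-sub-period 'new' counts found in a log dump (hypothetical helper)."""
    return [int(new) for _total, new in PROGRESS.findall(log_text)]


log_text = (
    "INFO: Got 20 tweets (20 new). INFO: Got 38 tweets (18 new). "
    "INFO: Got 58 tweets (20 new)."
)
counts = new_counts(log_text)
print(counts, sum(counts), max(counts))  # prints: [20, 18, 20] 58 20
```

If no sub-period ever reports more than 20 new tweets, 20 sub-periods can add up to at most 20 × 20 = 400, which is consistent with the 383 total in the log above.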

Ideally there should be thousands of results. PS: I'm using Spyder as the IDE.

Help me!

lapp0 commented 4 years ago

I got 13434 tweets

twitterscraper "@pnbindia" --lang en --output test55.json -bd 2019-03-31 -ed 2020-04-01
INFO: {'User-Agent': 'Mozilla/5.0 (compatible, MSIE 11, Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko', 'X-Requested-With': 'XMLHttpRequest'}
INFO: queries: ['@pnbindia since:2019-03-31 until:2019-04-18', '@pnbindia since:2019-04-18 until:2019-05-06', '@pnbindia since:2019-05-06 until:2019-05-25', '@pnbindia since:2019-05-25 until:2019-06-12', '@pnbindia since:2019-06-12 until:2019-06-30', '@pnbindia since:2019-06-30 until:2019-07-19', '@pnbindia since:2019-07-19 until:2019-08-06', '@pnbindia since:2019-08-06 until:2019-08-24', '@pnbindia since:2019-08-24 until:2019-09-12', '@pnbindia since:2019-09-12 until:2019-09-30', '@pnbindia since:2019-09-30 until:2019-10-18', '@pnbindia since:2019-10-18 until:2019-11-06', '@pnbindia since:2019-11-06 until:2019-11-24', '@pnbindia since:2019-11-24 until:2019-12-12', '@pnbindia since:2019-12-12 until:2019-12-31', '@pnbindia since:2019-12-31 until:2020-01-18', '@pnbindia since:2020-01-18 until:2020-02-05', '@pnbindia since:2020-02-05 until:2020-02-24', '@pnbindia since:2020-02-24 until:2020-03-13', '@pnbindia since:2020-03-13 until:2020-04-01']

...

INFO: Got 686 tweets for @pnbindia%20since%3A2019-12-31%20until%3A2020-01-18.
INFO: Got 12129 tweets (686 new).
ERROR: An unknown error occurred! Returning tweets gathered so far.
Traceback (most recent call last):
  File "/home/andrew/p/twitterscraper/.venv/lib/python3.7/site-packages/twitterscraper-1.4.0-py3.7.egg/twitterscraper/query.py", line 173, in query_tweets_once_generator
    new_tweets, new_pos = query_single_page(query, lang, pos)
  File "/home/andrew/p/twitterscraper/.venv/lib/python3.7/site-packages/twitterscraper-1.4.0-py3.7.egg/twitterscraper/query.py", line 100, in query_single_page
    html = json_resp['items_html'] or ''
KeyError: 'items_html'
INFO: Got 653 tweets for @pnbindia%20since%3A2019-11-24%20until%3A2019-12-12.
INFO: Got 12782 tweets (653 new).
ERROR: An unknown error occurred! Returning tweets gathered so far.
Traceback (most recent call last):
  File "/home/andrew/p/twitterscraper/.venv/lib/python3.7/site-packages/twitterscraper-1.4.0-py3.7.egg/twitterscraper/query.py", line 173, in query_tweets_once_generator
    new_tweets, new_pos = query_single_page(query, lang, pos)
  File "/home/andrew/p/twitterscraper/.venv/lib/python3.7/site-packages/twitterscraper-1.4.0-py3.7.egg/twitterscraper/query.py", line 100, in query_single_page
    html = json_resp['items_html'] or ''
KeyError: 'items_html'
INFO: Got 652 tweets for @pnbindia%20since%3A2019-08-06%20until%3A2019-08-24.
INFO: Got 13434 tweets (652 new).

I'll investigate why users sometimes get a variable or incorrect number of tweets here: https://github.com/taspinar/twitterscraper/issues/311
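The two tracebacks above both come from the lookup `json_resp['items_html']`, which raises `KeyError` when the response has no such key. As an illustration only, a tolerant lookup could look like the sketch below. `items_html_or_empty` is a hypothetical name, not a function in twitterscraper. Whether a response without `items_html` should be treated as "no more results" or as a failure worth retrying is not settled in this thread.

```python
def items_html_or_empty(json_resp):
    """Return the 'items_html' fragment, or '' when the key is absent or falsy.

    Hypothetical helper: dict.get avoids the KeyError seen in the traceback.
    """
    return json_resp.get("items_html") or ""


print(repr(items_html_or_empty({"items_html": "<li>tweet</li>"})))
print(repr(items_html_or_empty({"items_html": None})))
print(repr(items_html_or_empty({})))
```

With this shape, a missing key yields an empty string instead of an exception, so the caller decides how to interpret an empty page.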

romit-actuarial commented 4 years ago

Hi @lapp0, I'm getting a variable number of tweets as well: the first time I run the code I get 14141 tweets, and then every subsequent query yields only about 20 tweets per sub-period.

Look at this output from my loop and notice the difference between the first query and all the subsequent ones.

INFO: {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 6.1; x64; fr; rv:1.9.2.13) Gecko/20101203 Firebird/3.6.13', 'X-Requested-With': 'XMLHttpRequest'}
INFO: queries: ['@pnbindia since:2019-03-31 until:2019-04-18', '@pnbindia since:2019-04-18 until:2019-05-06', '@pnbindia since:2019-05-06 until:2019-05-25', '@pnbindia since:2019-05-25 until:2019-06-12', '@pnbindia since:2019-06-12 until:2019-06-30', '@pnbindia since:2019-06-30 until:2019-07-19', '@pnbindia since:2019-07-19 until:2019-08-06', '@pnbindia since:2019-08-06 until:2019-08-24', '@pnbindia since:2019-08-24 until:2019-09-12', '@pnbindia since:2019-09-12 until:2019-09-30', '@pnbindia since:2019-09-30 until:2019-10-18', '@pnbindia since:2019-10-18 until:2019-11-06', '@pnbindia since:2019-11-06 until:2019-11-24', '@pnbindia since:2019-11-24 until:2019-12-12', '@pnbindia since:2019-12-12 until:2019-12-31', '@pnbindia since:2019-12-31 until:2020-01-18', '@pnbindia since:2020-01-18 until:2020-02-05', '@pnbindia since:2020-02-05 until:2020-02-24', '@pnbindia since:2020-02-24 until:2020-03-13', '@pnbindia since:2020-03-13 until:2020-04-01']
INFO: Got 382 tweets (382 new).
INFO: Got 1155 tweets (773 new).
INFO: Got 1851 tweets (696 new).
INFO: Got 2562 tweets (711 new).
INFO: Got 3272 tweets (710 new).
INFO: Got 4071 tweets (799 new).
INFO: Got 4727 tweets (656 new).
INFO: Got 5510 tweets (783 new).
INFO: Got 6181 tweets (671 new).
INFO: Got 6985 tweets (804 new).
INFO: Got 7671 tweets (686 new).
INFO: Got 8403 tweets (732 new).
INFO: Got 9069 tweets (666 new).
INFO: Got 9825 tweets (756 new).
INFO: Got 10529 tweets (704 new).
INFO: Got 11247 tweets (718 new).
INFO: Got 12010 tweets (763 new).
INFO: Got 12727 tweets (717 new).
INFO: Got 13448 tweets (721 new).
INFO: Got 14141 tweets (693 new).
INFO: queries: ['@AxisBankSupport since:2019-03-31 until:2019-04-18', '@AxisBankSupport since:2019-04-18 until:2019-05-06', '@AxisBankSupport since:2019-05-06 until:2019-05-25', '@AxisBankSupport since:2019-05-25 until:2019-06-12', '@AxisBankSupport since:2019-06-12 until:2019-06-30', '@AxisBankSupport since:2019-06-30 until:2019-07-19', '@AxisBankSupport since:2019-07-19 until:2019-08-06', '@AxisBankSupport since:2019-08-06 until:2019-08-24', '@AxisBankSupport since:2019-08-24 until:2019-09-12', '@AxisBankSupport since:2019-09-12 until:2019-09-30', '@AxisBankSupport since:2019-09-30 until:2019-10-18', '@AxisBankSupport since:2019-10-18 until:2019-11-06', '@AxisBankSupport since:2019-11-06 until:2019-11-24', '@AxisBankSupport since:2019-11-24 until:2019-12-12', '@AxisBankSupport since:2019-12-12 until:2019-12-31', '@AxisBankSupport since:2019-12-31 until:2020-01-18', '@AxisBankSupport since:2020-01-18 until:2020-02-05', '@AxisBankSupport since:2020-02-05 until:2020-02-24', '@AxisBankSupport since:2020-02-24 until:2020-03-13', '@AxisBankSupport since:2020-03-13 until:2020-04-01']
INFO: Got 18 tweets (18 new).
INFO: Got 37 tweets (19 new).
INFO: Got 56 tweets (19 new).
INFO: Got 74 tweets (18 new).
INFO: Got 93 tweets (19 new).
INFO: Got 112 tweets (19 new).
INFO: Got 131 tweets (19 new).
INFO: Got 148 tweets (17 new).
INFO: Got 168 tweets (20 new).
INFO: Got 188 tweets (20 new).
INFO: Got 208 tweets (20 new).
INFO: Got 228 tweets (20 new).
INFO: Got 248 tweets (20 new).
INFO: Got 267 tweets (19 new).
INFO: Got 286 tweets (19 new).
INFO: Got 305 tweets (19 new).
INFO: Got 322 tweets (17 new).
INFO: Got 340 tweets (18 new).
INFO: Got 360 tweets (20 new).
INFO: Got 379 tweets (19 new).
INFO: queries: ['@AxisBank since:2019-03-31 until:2019-04-18', '@AxisBank since:2019-04-18 until:2019-05-06', '@AxisBank since:2019-05-06 until:2019-05-25', '@AxisBank since:2019-05-25 until:2019-06-12', '@AxisBank since:2019-06-12 until:2019-06-30', '@AxisBank since:2019-06-30 until:2019-07-19', '@AxisBank since:2019-07-19 until:2019-08-06', '@AxisBank since:2019-08-06 until:2019-08-24', '@AxisBank since:2019-08-24 until:2019-09-12', '@AxisBank since:2019-09-12 until:2019-09-30', '@AxisBank since:2019-09-30 until:2019-10-18', '@AxisBank since:2019-10-18 until:2019-11-06', '@AxisBank since:2019-11-06 until:2019-11-24', '@AxisBank since:2019-11-24 until:2019-12-12', '@AxisBank since:2019-12-12 until:2019-12-31', '@AxisBank since:2019-12-31 until:2020-01-18', '@AxisBank since:2020-01-18 until:2020-02-05', '@AxisBank since:2020-02-05 until:2020-02-24', '@AxisBank since:2020-02-24 until:2020-03-13', '@AxisBank since:2020-03-13 until:2020-04-01']
INFO: Got 18 tweets (18 new).
INFO: Got 38 tweets (20 new).
INFO: Got 56 tweets (18 new).
INFO: Got 67 tweets (11 new).
INFO: Got 84 tweets (17 new).
INFO: Got 100 tweets (16 new).
INFO: Got 119 tweets (19 new).
INFO: Got 138 tweets (19 new).
INFO: Got 154 tweets (16 new).
INFO: Got 173 tweets (19 new).
INFO: Got 192 tweets (19 new).
INFO: Got 212 tweets (20 new).
INFO: Got 227 tweets (15 new).
INFO: Got 247 tweets (20 new).
INFO: Got 265 tweets (18 new).
INFO: Got 283 tweets (18 new).
INFO: Got 303 tweets (20 new).
INFO: Got 320 tweets (17 new).
INFO: Got 339 tweets (19 new).
INFO: Got 359 tweets (20 new).
INFO: queries: ['@YESBANK since:2019-03-31 until:2019-04-18', '@YESBANK since:2019-04-18 until:2019-05-06', '@YESBANK since:2019-05-06 until:2019-05-25', '@YESBANK since:2019-05-25 until:2019-06-12', '@YESBANK since:2019-06-12 until:2019-06-30', '@YESBANK since:2019-06-30 until:2019-07-19', '@YESBANK since:2019-07-19 until:2019-08-06', '@YESBANK since:2019-08-06 until:2019-08-24', '@YESBANK since:2019-08-24 until:2019-09-12', '@YESBANK since:2019-09-12 until:2019-09-30', '@YESBANK since:2019-09-30 until:2019-10-18', '@YESBANK since:2019-10-18 until:2019-11-06', '@YESBANK since:2019-11-06 until:2019-11-24', '@YESBANK since:2019-11-24 until:2019-12-12', '@YESBANK since:2019-12-12 until:2019-12-31', '@YESBANK since:2019-12-31 until:2020-01-18', '@YESBANK since:2020-01-18 until:2020-02-05', '@YESBANK since:2020-02-05 until:2020-02-24', '@YESBANK since:2020-02-24 until:2020-03-13', '@YESBANK since:2020-03-13 until:2020-04-01']
INFO: Got 19 tweets (19 new).
INFO: Got 38 tweets (19 new).
INFO: Got 57 tweets (19 new).
INFO: Got 77 tweets (20 new).
INFO: Got 96 tweets (19 new).
INFO: Got 116 tweets (20 new).
INFO: Got 136 tweets (20 new).
INFO: Got 154 tweets (18 new).
INFO: Got 174 tweets (20 new).
INFO: Got 194 tweets (20 new).
INFO: Got 214 tweets (20 new).
INFO: Got 233 tweets (19 new).
INFO: Got 253 tweets (20 new).
INFO: Got 273 tweets (20 new).
INFO: Got 293 tweets (20 new).
INFO: Got 312 tweets (19 new).
INFO: Got 328 tweets (16 new).
INFO: Got 347 tweets (19 new).
INFO: Got 367 tweets (20 new).
INFO: Got 386 tweets (19 new).

All the ones after this are also similar.
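Until the cause is found, a loop like the one in the original post could at least record which handles came back with suspiciously few tweets, so they can be re-run later. The sketch below is an assumption-laden illustration, not a fix. `flag_possibly_capped` and its parameters are hypothetical, `fetch` stands in for whatever call actually scrapes one handle, and the 20 × 20 ceiling is taken from the logs in this thread. A handle that genuinely has fewer than 400 matching tweets would be flagged too.

```python
def flag_possibly_capped(handles, fetch, n_periods=20, page_size=20):
    """Run fetch(handle) for each handle and list those at or under the ceiling."""
    ceiling = n_periods * page_size  # 20 * 20 = 400 with the values seen in the logs
    results, suspects = {}, []
    for handle in handles:
        tweets = list(fetch(handle))
        results[handle] = tweets
        if len(tweets) <= ceiling:
            suspects.append(handle)
    return results, suspects


# Stand-in for the real scraper call, sized like the totals reported above.
reported_totals = {
    "@pnbindia": 14141,
    "@AxisBankSupport": 379,
    "@AxisBank": 359,
    "@YESBANK": 386,
}


def fake_fetch(handle):
    return range(reported_totals[handle])


results, suspects = flag_possibly_capped(reported_totals, fake_fetch)
print(suspects)  # prints: ['@AxisBankSupport', '@AxisBank', '@YESBANK']
```

In the original script, `fetch` would presumably be something like `lambda h: query_tweets(h, begindate=begin_date, enddate=end_date, lang="english")`, with the flagged handles kept for a later re-run.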

AbdullaRifai commented 4 years ago

I was just wondering whether there is a way to fix this issue, as I need this data as soon as possible.