taspinar / twitterscraper

Scrape Twitter for Tweets
MIT License

no results... #115

Closed usajameskwon closed 5 years ago

usajameskwon commented 6 years ago

ERROR: Failed to parse JSON "Expecting value: line 1 column 1 (char 0)" while requesting "https://twitter.com/i/search/timeline?f=tweets&vertical=default&include_available_features=1&include_entities=1&reset_error_state=false&src=typd&max_position=TWEET-838177224989753344-838177234682773505&q=trump%20since%3A2016-07-25%20until%3A2017-03-05&l=None"

I don't know why suddenly I'm getting into this problem.
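
For anyone reading the failing request, the query string can be decoded with the standard library to see what was actually asked for. This is only an illustrative sketch using the URL from the error above; it does not reproduce how twitterscraper builds the URL.

```python
from urllib.parse import urlsplit, parse_qs

# URL copied from the error message above, split over several lines for readability.
url = ('https://twitter.com/i/search/timeline?f=tweets&vertical=default'
       '&include_available_features=1&include_entities=1&reset_error_state=false'
       '&src=typd&max_position=TWEET-838177224989753344-838177234682773505'
       '&q=trump%20since%3A2016-07-25%20until%3A2017-03-05&l=None')

# parse_qs percent-decodes each value and returns a dict of lists.
params = parse_qs(urlsplit(url).query)
print(params['q'][0])             # the search string with its since:/until: operators
print(params['max_position'][0])  # the pagination cursor
print(params['l'][0])             # the language parameter as sent
```

The decoded values show the search term with its date operators, the pagination cursor, and the language parameter, which was sent as the literal string None.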

lapp0 commented 6 years ago

I'm getting this problem as well, and I was not getting it previously. I'm currently looking into it; Twitter probably introduced changes that break this. I'm aware of other software that Twitter's recent changes broke as well.

Full traceback:

May 24 03:41:32 harvester harvester[1648]: ERROR:root:Failed to parse JSON "Expecting value: line 1 column 1 (char 0)" while requesting "https://twitter.com/i/search/timeline?f=tweets&vertical=default&include_available_features=1&includ>
May 24 03:41:32 harvester harvester[1648]: Traceback (most recent call last):
May 24 03:41:32 harvester harvester[1648]:   File "/nix/store/mxwfji6aas89jx86ilgplb2pkc68jaxq-python3.6-twitterscraper-nix-0.7.0/lib/python3.6/site-packages/twitterscraper/query.py", line 49, in query_single_page
May 24 03:41:32 harvester harvester[1648]:     json_resp = json.loads(response.text)
May 24 03:41:32 harvester harvester[1648]:   File "/nix/store/ljhgdba6n8ag6f8clpi4m9zizm7b8mx3-python3-3.6.5/lib/python3.6/json/__init__.py", line 354, in loads
May 24 03:41:32 harvester harvester[1648]:     return _default_decoder.decode(s)
May 24 03:41:32 harvester harvester[1648]:   File "/nix/store/ljhgdba6n8ag6f8clpi4m9zizm7b8mx3-python3-3.6.5/lib/python3.6/json/decoder.py", line 339, in decode
May 24 03:41:32 harvester harvester[1648]:     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
May 24 03:41:32 harvester harvester[1648]:   File "/nix/store/ljhgdba6n8ag6f8clpi4m9zizm7b8mx3-python3-3.6.5/lib/python3.6/json/decoder.py", line 357, in raw_decode
May 24 03:41:32 harvester harvester[1648]:     raise JSONDecodeError("Expecting value", s, err.value) from None
May 24 03:41:32 harvester harvester[1648]: json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Worth mentioning that results exist some of the time, and this is a warning, not an error that halts the program.

SpellOnYou commented 6 years ago

@lapp0 Do you mean that Twitter is introducing measures that block crawl bots?

taspinar commented 6 years ago

What I have found out so far is that some of the time, Twitter responds to requests like the one above with an HTML page (some kind of 404 / error page) instead of the JSON they should return.

I have created a new branch in which the separate try / except statement for JSONDecodeError is removed. With this fix, the same request should be retried through a recursive call to query_single_page instead of breaking the process. Usually, Twitter returns the correct response the second time.
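
The retry idea can be sketched roughly as follows. This is a simplified illustration, not the code in the branch: parse_with_retry and fetch_text are hypothetical names, and fetch_text stands in for whatever performs the HTTP request and returns the body as text.

```python
import json


def parse_with_retry(fetch_text, retries=10):
    """Call fetch_text() until its result parses as JSON or retries run out.

    fetch_text is a hypothetical zero-argument callable that returns the
    response body as a string; it stands in for the real HTTP request.
    """
    try:
        return json.loads(fetch_text())
    except json.JSONDecodeError:
        if retries > 0:
            # Issue the same request again with one fewer attempt left.
            return parse_with_retry(fetch_text, retries - 1)
        return None  # out of attempts; the caller decides what to do next
```

With an HTML error page on the first call and valid JSON on the second, the function returns the parsed JSON from the second call; if every attempt yields HTML, it returns None.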

3ruce commented 6 years ago

I got a similar error and I wonder if this is related to the one above?

I ran this search, which did return some tweets:

twitterscraper bread -bd 2018-04-23 -ed 2018-05-23 --output=tweets-01.json

but then running this search afterwards returned zero tweets:

twitterscraper bread -bd 2018-03-23 -ed 2018-05-23 -output=csv-01.csv

ERROR: Failed to parse JSON "Expecting value: line 1 column 1 (char 0)" while requesting "https://twitter.com/i/search/timeline?f=tweets&vertical=default&include_available_features=1&include_entities=1&reset_error_state=false&src=typd&max_position=TWEET-997990030411956226-997990270372274177&q=bread%20since%3A2018-05-18%20until%3A2018-05-20&l=None"
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/twitterscraper/query.py", line 49, in query_single_page
    json_resp = json.loads(response.text)
  File "/usr/lib/python3.5/json/__init__.py", line 319, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.5/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.5/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

3ruce commented 6 years ago

FYI, I downloaded and installed the jsondecodeerror_bugfix branch. I did get some results, but nowhere near the quantity I would have expected. Running the command below returned no results...

twitterscraper trump -bd 2018-05-23 -ed 2018-05-24 --output=tweets-01.json

taspinar commented 6 years ago

Can you, in addition, force the user agent to the one specified in issue #90?

3ruce commented 6 years ago

Sure, how do I do this? Is it something to add to the command, or do I need to edit a file?

bengarvey commented 6 years ago

Here's what you need to do to make this work for now.

First, if you installed this with pip, uninstall it.

pip uninstall twitterscraper

Clone this repo and check out this branch:

git clone git@github.com:taspinar/twitterscraper.git
cd twitterscraper
git checkout jsondecodeerror_bugfix

Modify twitterscraper/twitterscraper/query.py by changing HEADERS_LIST to

HEADERS_LIST = ['Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.93 Safari/537.36']

Then install it

python setup.py install
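
The HEADERS_LIST change above amounts to sending one fixed User-Agent header with each request. A minimal standard-library sketch of that idea follows; build_request is a hypothetical helper for illustration and is not part of twitterscraper, whose own request code may differ.

```python
import random
import urllib.request

# The single user agent from the instructions above (split across two string literals).
HEADERS_LIST = [
    'Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) '
    'Chrome/40.0.2214.93 Safari/537.36',
]


def build_request(url):
    """Return a urllib Request carrying a user agent picked from HEADERS_LIST."""
    user_agent = random.choice(HEADERS_LIST)  # only one entry, so always the same
    return urllib.request.Request(url, headers={'User-Agent': user_agent})
```

Building the request does not contact the network; it only attaches the header, so the value can be inspected before anything is sent.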

jotbasan commented 6 years ago

I had a similar issue: sometimes I wouldn't get any results, and sometimes I'd get only 20 or 60 messages in addition to that JSON error. Forcing headers (as per @bengarvey's instructions) fixed the issue.

bengarvey commented 6 years ago

Well, that worked yesterday. Not today :)

3ruce commented 6 years ago

hmmm.... it's still working for me on low volume searches...

usajameskwon commented 6 years ago

sad news... changing header list doesn't work anymore..

Sorry, it still works very well!!

lapp0 commented 6 years ago

Discussion about using the new user agent upstream: https://github.com/hellysmile/fake-useragent/issues/68

Branch I'm working from, which applies @bengarvey's fix: https://github.com/lapp0/twitterscraper/tree/jsondecodeerror_bugfix_new_chrome_headers

lapp0 commented 6 years ago

Edit: ignore below, it's probably not important. I just realized that taspinar's retry functionality is working and this is a non-deterministic failure which usually is fine

Here is the response.text that twitterscraper is currently attempting to JSON-decode.

<!DOCTYPE html>
<html dir="ltr" lang="en">
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1,user-scalable=0" />
<link rel="preconnect" href="//abs-0.twimg.com" />
<link rel="preconnect" href="//api.twitter.com" />
<link rel="preconnect" href="//o.twimg.com" />
<link rel="preconnect" href="//pbs.twimg.com" />
<link rel="preconnect" href="//t.co" />
<link rel="preconnect" href="//video.twimg.com" />
<link rel="dns-prefetch" href="//abs-0.twimg.com" />
<link rel="dns-prefetch" href="//api.twitter.com" />
<link rel="dns-prefetch" href="//o.twimg.com" />
<link rel="dns-prefetch" href="//pbs.twimg.com" />
<link rel="dns-prefetch" href="//t.co" />
<link rel="dns-prefetch" href="//video.twimg.com" />
<link rel="preload" as="script" crossorigin="anonymous" href="https://abs-0.twimg.com/responsive-web/web/ltr/runtime.2
<link rel="preload" as="script" crossorigin="anonymous" href="https://abs-0.twimg.com/responsive-web/we
<link rel="preload" as="script" crossorigin="anonymous" href="https://abs-0.twimg.com/r>
<link rel="preload" as="script" crossorigin="anonymous" href="https://abs-0.twimg.com/responsive-web/web/ltr/main.d53f
<meta property="fb:app_id" content="2231777543" />
<meta property="og:site_name" content="Twitter" />
<meta name="google-site-verification" content="V0yIS0Ec_o3Ii9KThrCoMCkwTYMMJ_JYx_RSaGhFYvw" />
<link rel="manifest" href="/manifest.json" />
<link rel="icon" sizes="192x192" href="https://abs-0.twimg.com/responsive-web/web/ltr/icon-default.882fa4ccf6539401.png" />
<link rel="apple-touch-icon" sizes="192x192" href="https://abs-0.twimg.com/responsive-web/web/ltr/icon-ios.a9cd885bccbcaf2f.png" />
<meta name="mobile-web-app-capable" content="yes" />
<meta name="apple-mobile-web-app-title" content="Twitter Lite" />
<meta name="apple-mobile-web-app-status-bar-style" content="white" />
<meta name="theme-color" content="#ffffff" />
<body>
  <noscript>
    <form action="https://mobile.twitter.com/i/nojs_router?path=%2Fi%2Fsearch%2Ftimeline%3Ff%3Dtweets%26vertical%3Ddefault%26include_available_features%3D1%26include_entities%3D1%26reset_error_state%3Dfalse%26src%3Dtypd%26max_position%3DTWEET-525762382-605336192%26q%3Dfoobar%2520since%253A2007-06-09%2520until%253A2008-01-17%26l%3D" method="POST" style="background-color: #fff; position: fixed; top: 0; left: 0; right: 0; bottom: 0; z-index: 9999;">
      <div style="font-size: 18px; font-family: Helvetica,sans-serif; line-height: 24px; margin: 10%; width: 80%;">
        <p>We've detected that JavaScript is disabled in your browser. Would you like to proceed to legacy Twitter?</p>
        <p style="margin: 20px 0;">
          <button type="submit" style="background-color: #1da1f2; border-radius: 100px; border: none; box-shadow: none; color: #fff; cursor: pointer; font-size: 14px; font-weight: bold; line-height: 20px; padding: 6px 16px;">Yes</button>
        </p>
      </div>
    </form>
  </noscript>
  <div id="react-root" style="height:100%"><div><div aria-label="Loading…" style="background-color:#fff;top:0;left:0;right:0;bottom:0;position:fixed"><svg style="display:inline-block;fill:currentcolor;height:72px;max-width:100%;position:absolute;user-select:none;vertical-align:text-bottom;color:#1da1f2;width:72px;top:0;left:0;right:0;bottom:0;margin:auto" viewBox="0 0 24 24"><g><path d="M23.643 4.937c-.835.37-1.732.62-2.675.733a4.67 4.67 0 0 0 2.048-2.578 9.3 9.3 0 0 1-2.958 1.13 4.66 4.66 0 0 0-7.938 4.25 13.229 13.229 0 0 1-9.602-4.868c-.4.69-.63 1.49-.63 2.342A4.66 4.66 0 0 0 3.96 9.824a4.647 4.647 0 0 1-2.11-.583v.06a4.66 4.66 0 0 0 3.737 4.568 4.692 4.692 0 0 1-2.104.08 4.661 4.661 0 0 0 4.352 3.234 9.348 9.348 0 0 1-5.786 1.995 9.5 9.5 0 0 1-1.112-.065 13.175 13.175 0 0 0 7.14 2.093c8.57 0 13.255-7.098 13.255-13.254 0-.2-.005-.402-.014-.602a9.47 9.47 0 0 0 2.323-2.41z"></path></g></svg></div><div id="failureMessage" style="background-color:#F5F8FA;top:0;left:0;right:0;bottom:0;position:fixed;display:none;z-index:2"><div style="position:absolute;height:200px;top:0;left:0;right:0;bottom:0;margin:auto;text-align:center;line-height:1.3125;font-size:14px;color:#14171A;font-family:-apple-system, BlinkMacSystemFont, &quot;Segoe UI&quot;, Roboto, Ubuntu, &quot;Helvetica Neue&quot;, sans-serif"><svg style="display:block;fill:currentcolor;height:72px;max-width:100%;position:relative;user-select:none;vertical-align:text-bottom;color:#657786;width:72px;margin:0 auto 24px" viewBox="0 0 24 24"><g><circle cx="12.025" cy="16.437" r="1.281"></circle><path d="M14.39 7.194a.495.495 0 0 0-.4-.2h-3.928a.494.494 0 0 0-.4.2.496.496 0 0 0-.08.442l1.814 6.098a.5.5 0 0 0 .48.357h.298a.501.501 0 0 0 .48-.356l1.813-6.098a.495.495 0 0 0-.077-.442z"></path><path d="M12 22.75C6.072 22.75 1.25 17.928 1.25 12S6.072 1.25 12 1.25 22.75 6.072 22.75 12 17.928 22.75 12 22.75zm0-20C6.9 2.75 2.75 6.9 2.75 12S6.9 21.25 12 21.25s9.25-4.15 9.25-9.25S17.1 2.75 12 2.75z"></path></g></svg><p>A problem 
was encountered trying to load the page.</p><a href="javascript:reloadFromScriptError()" style="background-color:#1DA1F2;border-radius:0.3em;border:0;padding:0.5em 1em;height:1.5em;display:inline-block;margin:1em 0;color:#fff;text-decoration:none;font-weight:bold"><svg style="display:inline-block;fill:currentcolor;height:1.5em;max-width:100%;position:relative;user-select:none;vertical-align:middle" viewBox="0 0 24 24"><g><path d="M12 2C6.486 2 2 6.486 2 12a.75.75 0 0 0 1.5 0c0-4.687 3.813-8.5 8.5-8.5s8.5 3.813 8.5 8.5-3.813 8.5-8.5 8.5c-2.886 0-5.576-1.5-7.13-3.888l2.983.55a.75.75 0 1 0 .274-1.474l-4.663-.86a.746.746 0 0 0-.88.647l-.57 4.706a.749.749 0 1 0 1.488.181l.32-2.63C5.673 20.36 8.728 22 12 22c5.514 0 10-4.486 10-10S17.514 2 12 2z"></path></g></svg> Retry</a></div></div></div></div>
<script>
  window.__INITIAL_STATE__ = {"optimist":[],"toasts":[],"entities":{"users":{"entities":{},"errors":{},"fetchStatus":{}}},"session":{"country":"XX","guestId":"152763591683681529","language":"en","oneFactorLoginEligibility":{"fetchStatus":"none"}},"analytics":{},"featureSwitch":{"config":{"account_country_setting_countries_whitelist":{"value":["ad","ae","af","ag","ai","al","am","ao","ar","as","at","au","aw","ax","az","ba","bb","bd","be","bf","bg","bh","bi","bj","bl","bm","bn","bo","bq","br","bs","bt","bv","bw","by","bz","ca","cc","cd","cf","cg","ch","ci","ck","cl","cm","co","cr","cu","cv","cw","cx","cy","cz","de","dj","dk","dm","do","dz","ec","ee","eg","er","es","et","fi","fj","fk","fm","fo","fr","ga","gb","gd","ge","gf","gg","gh","gi","gl","gm","gn","gp","gq","gr","gs","gt","gu","gw","gy","hk","hn","hr","ht","hu","id","ie","il","im","in","io","iq","ir","is","it","je","jm","jo","jp","ke","kg","kh","ki","km","kn","kr","kw","ky","kz","la","lb","lc","li","lk","lr","ls","lt","lu","lv","ly","ma","mc","md","me","mf","mg","mh","mk","ml","mn","mo","mp","mq","mr","ms","mt","mu","mv","mw","mx","my","mz","na","nc","ne","nf","ng","ni","nl","no","np","nr","nu","nz","om","pa","pe","pf","pg","ph","pk","pl","pm","pn","pr","ps","pt","pw","py","qa","re","ro","rs","ru","rw","sa","sb","sc","se","sg","sh","si","sk","sl","sm","sn","so","sr","st","sv","sx","sz","tc","td","tf","tg","th","tj","tk","tl","tm","tn","to","tr","tt","tv","tw","tz","ua","ug","us","uy","uz","va","vc","ve","vi","vn","vu","wf","ws","xk","ye","yt","za","zm","zw"]},"live_event_hero_description_fields_enabled":{"value":true},"live_event_hero_ugm_attribution_enabled":{"value":false},"live_event_timeline_default_refresh_rate_interval_seconds":{"value":30},"live_event_timeline_minimum_refresh_rate_interval_seconds":{"value":10},"live_event_timeline_server_controlled_refresh_rate_enabled":{"value":true},"moment_annotations_enabled":{"value":false},"responsive_web_allow_switch_to_ms":{"value":false},"responsive_web_birthdays_
enabled":{"value":false},"responsive_web_broadcasts_page_enabled":{"value":false},"responsive_web_composer_v2_enabled":{"value":false},"responsive_web_composer_v2_modal_compose_enabled":{"value":false},"responsive_web_desktop_bookmarks_enabled":{"value":false},"responsive_web_dm_livepipeline_enabled":{"value":false},"responsive_web_dm_reporting_enabled":{"value":false},"responsive_web_dm_typing_indicator_enabled":{"value":false},"responsive_web_eu_countries":{"value":["at","be","bg","ch","cy","cz","de","dk","ee","es","fi","fr","gb","gr","hr","hu","ie","is","it","li","lt","lu","lv","mt","nl","no","pl","pt","ro","se","si","sk"]},"responsive_web_event_card_enabled":{"value":false},"responsive_web_explore_feedback_actions_enabled":{"value":false},"responsive_web_feedback_link":{"value":""},"responsive_web_fetch_hashflags_on_boot":{"value":true},"responsive_web_gdpr_age_gate":{"value":true},"responsive_web_gdpr_twitter_archive":{"value":false},"responsive_web_gdpr_periscope_archive":{"value":false},"responsive_web_gdpr_logged_out_banner":{"value":true},"responsive_web_gdpr_change_country_ocf_flow":{"value":true},"responsive_web_graphql_verify_credentials_enabled":{"value":false},"responsive_web_graphql_verify_credentials_server_enabled":{"value":false},"responsive_web_htl_compose_prompt":{"value":false},"responsive_web_inline_video_player_enabled":{"value":false},"responsive_web_microsoft_jump_links":{"value":false},"responsive_web_ntab_verified_mentions_vit_internal_dogfood":{"value":false},"responsive_web_ocf_enabled":{"value":true},"responsive_web_report_page_not_found":{"value":false},"responsive_web_reported_tweet_tombstones_enabled":{"value":false},"responsive_web_search_filters_enabled":{"value":true},"responsive_web_settings_contacts_dashboard_enabled":{"value":false},"responsive_web_settings_email_notifications_enabled":{"value":true},"responsive_web_settings_facebook_connect_enabled":{"value":false},"responsive_web_settings_login_verification_enabled":{"value":
false},"responsive_web_settings_notif_v2_push":{"value":true},"responsive_web_settings_notif_v2_sms":{"value":true},"responsive_web_settings_nsfw_user_enabled":{"value":true},"responsive_web_settings_password_applications_info_enabled":{"value":false},"responsive_web_settings_sessions_dashboard_enabled":{"value":false},"responsive_web_settings_u2f_security_key_enabled":{"value":false},"responsive_web_settings_trends_enabled":{"value":true},"responsive_web_transform_virtual_scroller":{"value":true},"responsive_web_tweet_source_timeline_enabled":{"value":false},"responsive_web_tweet_source_tweet_detail_enabled":{"value":false},"responsive_web_unified_cards":{"value":"no"},"responsive_web_urt_list_tweets_enabled":{"value":true},"responsive_web_urt_show_cover_enabled":{"value":false},"responsive_web_verification_v2_enabled":{"value":false},"responsive_web_windows_oauth_login":{"value":"auto_login"},"scribe_api_error_sample_size":{"value":0},"scribe_api_sample_size":{"value":100},"scribe_cdn_host_list":{"value":["si0.twimg.com","si1.twimg.com","si2.twimg.com","si3.twimg.com","a0.twimg.com","a1.twimg.com","a2.twimg.com","a3.twimg.com","abs.twimg.com","amp.twimg.com","o.twimg.com","pbs.twimg.com","pbs-eb.twimg.com","pbs-ec.twimg.com","pbs-v6.twimg.com","pbs-h1.twimg.com","pbs-h2.twimg.com","video.twimg.com","platform.twitter.com","cdn.api.twitter.com","ton.twimg.com","v.cdn.vine.co","mtc.cdn.vine.co","edge.vncdn.co","mid.vncdn.co"]},"scribe_cdn_sample_size":{"value":50},"scribe_enabled":{"value":true},"traffic_redirect_5347_hostmap":{"value":[]},"user_display_name_max_limit":{"value":50},"responsive_web_logged_out_homepage_6952":{"value":"treatment"},"responsive_web_night_mode_7836":{"value":"enabled"},"responsive_web_smart_lock_7159":{"value":"all"}},"impressions":{},"featureSetToken":"9dee8851ebe378f146b9ea25f610a73259e043ba","isLoaded":true,"isLoading":false,"keysRead":{},"settingsVersion":"ac9b6bc39ab73f6a95ad5c2c5765b7d5"},"typeaheadUsers":{"fetchStatus":"none","users
":{},"blacklist":{},"lastUpdated":0,"index":{}},"blockedUsers":{"userIds":[],"fetchStatus":"none"},"settings":{"local":{"nextPushCheckin":0,"shouldAutoPlayGif":false},"remote":{"settings":{"display_sensitive_media":false},"fetchStatus":"none"},"dataSaver":{},"transient":{"dtabBarInfo":{"dtabAll":null,"dtabRweb":null,"hide":false},"loginPromptShown":false}},"devices":{"browserPush":{"fetchStatus":"none","pushNotificationsPrompt":{"count":0,"dismissed":false,"fetchStatus":"none"},"subscribed":false,"supported":null},"devices":{"data":{"emails":[],"phone_numbers":[]},"fetchStatus":"none"},"notificationSettings":{"_pushSettings":{},"_pushSettingsTemplate":{},"_smsSettings":{"error":null,"fetchStatus":"none"},"_smsSettingsTemplate":{}}}};
  window.__META_DATA__ = {"env":"prod","isLoggedIn":false,"isRTL":false};
</script>
<script>
  document.cookie = decodeURIComponent("gt=1001603747062124544; Max-Age=10800; Domain=.twitter.com; Path=/");
</script>
<script>
  window.webpackChunkManifest = {"bundle.AccessInterstitial":"bundle.AccessInterstitial.0a886e55d07fd9d1.js","loader.DashMenu":"loader.DashMenu.f41af6a8aed64560.js","loader.SearchBox":"loader.SearchBox.f33c1d8a9d2f0779.js","loader.WideLayout":"loader.WideLayout.0e65294f86f6c39b.js","loader.PushNotificationsPrompt":"loader.PushNotificationsPrompt.c31086d53bc3ae52.js","bundle.Account":"bundle.Account.0d737ba8cea6cc52.js","bundle.Bookmarks":"bundle.Bookmarks.9ddaf670ca48ccc1.js","loader.AppModules":"loader.AppModules.54a034314566236b.js","loader.BroadcastCard":"loader.BroadcastCard.812462a4ae302154.js","loader.BroadcastPlayer":"loader.BroadcastPlayer.ed281794d61fd3ef.js","loader.VideoPlayer":"loader.VideoPlayer.8ef5e67114d25f39.js","hls.js":"hls.js.bf79fb6a401c797b.js","loader.EntryTombstone":"loader.EntryTombstone.0e0a5df360bd3f79.js","loader.FeedbackSheet":"loader.FeedbackSheet.2e826a17f3a963e0.js","loader.SignupModule":"loader.SignupModule.b458c376fac2e64c.js","loader.TimelineGap":"loader.TimelineGap.d8413106cdcee4e3.js","loader.Trends":"loader.Trends.3b820fa3ca99729b.js","loader.TweetCurationActionSheet":"loader.TweetCurationActionSheet.a8ce51a07d0ef679.js","loader.TweetPhotos":"loader.TweetPhotos.0a542d35ee03636c.js","loader.UnifiedCard":"loader.UnifiedCard.faf7b0d5d441aa83.js","ondemand.InlinePlayer":"ondemand.InlinePlayer.8b36fc73ffde364e.js","ondemand.AccessInterstitial":"ondemand.AccessInterstitial.bbe69b7e2f0224f8.js","bundle.Broadcast":"bundle.Broadcast.eeee777949be1b78.js","bundle.Collection":"bundle.Collection.177ed6760829194f.js","bundle.Compose":"bundle.Compose.3513c1ab3c6aca9f.js","ondemand.MicrosoftInterface":"ondemand.MicrosoftInterface.b114bf15c8508df4.js","bundle.ComposeV2":"bundle.ComposeV2.1e157c35b6434dcb.js","bundle.Conversation":"bundle.Conversation.ac83b1b8865b8ef1.js","bundle.ConversationParticipants":"bundle.ConversationParticipants.1dd98c6532c8d394.js","bundle.CredentialsPicker":"bundle.CredentialsPicker.b1e7a33cdc26f1ac.js","bundle.DirectM
essages":"bundle.DirectMessages.f5c056761af48bd2.js","bundle.Download":"bundle.Download.f76c3abf3976b2d6.js","bundle.Explore":"bundle.Explore.357bc5750c112cdb.js","bundle.FollowerRequests":"bundle.FollowerRequests.5639c30d710f4035.js","bundle.FoundMedia":"bundle.FoundMedia.a8dc044ba72a1267.js","bundle.GenericTimeline":"bundle.GenericTimeline.865e036862a39389.js","bundle.Highlights":"bundle.Highlights.dcbcc3a9740565dc.js","bundle.HomeTimeline":"bundle.HomeTimeline.bb1cf156aecd8221.js","bundle.LiveEvent":"bundle.LiveEvent.44b0f8690a218e4b.js","bundle.LoggedOutHome":"bundle.LoggedOutHome.7cebe2d639e6baac.js","bundle.LoggedOutHomeV2":"bundle.LoggedOutHomeV2.aba2615d7d501496.js","bundle.Login":"bundle.Login.c040f59c0b88e5ca.js","bundle.Moment":"bundle.Moment.b5633aa8e8e2dd6c.js","bundle.NetworkInstrument":"bundle.NetworkInstrument.a64aa24912a1ea7d.js","bundle.NotificationDetail":"bundle.NotificationDetail.8626735cc1c716e4.js","bundle.Notifications":"bundle.Notifications.572bd2c0db1af32b.js","bundle.Ocf":"bundle.Ocf.43c617118e85cc40.js","bundle.Report":"bundle.Report.83a2bfca7c7f8d61.js","bundle.RichTextCompose":"bundle.RichTextCompose.33b589c3fbe06d23.js","bundle.Search":"bundle.Search.78ed436e150afbce.js","bundle.Settings":"bundle.Settings.5d9a181a3b3d62fa.js","bundle.SettingsInternals":"bundle.SettingsInternals.d70c97d527dca3b0.js","ondemand.countries-ar":"ondemand.countries-ar.2fb9a1ec172f64d5.js","ondemand.countries-bg":"ondemand.countries-bg.e35c4de062b13441.js","ondemand.countries-bn":"ondemand.countries-bn.fdc22fd7449c8e5e.js","ondemand.countries-ca":"ondemand.countries-ca.0c5f5fd3a6a19cd2.js","ondemand.countries-cs":"ondemand.countries-cs.1f9e5d57c71ac33b.js","ondemand.countries-da":"ondemand.countries-da.ee1520cf42140f35.js","ondemand.countries-de":"ondemand.countries-de.1c92da67cba0f7ef.js","ondemand.countries-el":"ondemand.countries-el.f6fed22f9c4b446a.js","ondemand.countries-en":"ondemand.countries-en.4340fa379bfcd96c.js","ondemand.countries-en-GB":"ondemand.
countries-en-GB.0ddb82ed14613896.js","ondemand.countries-es":"ondemand.countries-es.a81afc755f1b2715.js","ondemand.countries-eu":"ondemand.countries-eu.52e072afc9fd8d81.js","ondemand.countries-fa":"ondemand.countries-fa.d39b25a43b80d4e1.js","ondemand.countries-fi":"ondemand.countries-fi.8619573e4c56f488.js","ondemand.countries-fil":"ondemand.countries-fil.86108f2daf1ac62f.js","ondemand.countries-fr":"ondemand.countries-fr.3ad8cb5c65a08975.js","ondemand.countries-ga":"ondemand.countries-ga.21cabd7b5f5fd189.js","ondemand.countries-gl":"ondemand.countries-gl.4e5f386bddc27df1.js","ondemand.countries-gu":"ondemand.countries-gu.a0ea66135f7a5b0e.js","ondemand.countries-he":"ondemand.countries-he.550a1106d3083b7d.js","ondemand.countries-hi":"ondemand.countries-hi.b65f7e584a50308e.js","ondemand.countries-hr":"ondemand.countries-hr.95b04f3a9334fe47.js","ondemand.countries-hu":"ondemand.countries-hu.80376d4679383ea3.js","ondemand.countries-id":"ondemand.countries-id.8e2c5175cd66aa65.js","ondemand.countries-it":"ondemand.countries-it.e9a06233169f19b2.js","ondemand.countries-ja":"ondemand.countries-ja.f766fe1300c540a4.js","ondemand.countries-kn":"ondemand.countries-kn.c2e2138da8f0261e.js","ondemand.countries-ko":"ondemand.countries-ko.606f5d220279b29c.js","ondemand.countries-mr":"ondemand.countries-mr.2ac9b3be671767b0.js","ondemand.countries-ms":"ondemand.countries-ms.75b244ff50f9cb0a.js","ondemand.countries-nb":"ondemand.countries-nb.968cba5a0112717e.js","ondemand.countries-nl":"ondemand.countries-nl.8928a1c2a78f816d.js","ondemand.countries-pl":"ondemand.countries-pl.b93f0a3117b687a2.js","ondemand.countries-pt":"ondemand.countries-pt.1033685135644792.js","ondemand.countries-ro":"ondemand.countries-ro.5f9a77a9943738d1.js","ondemand.countries-ru":"ondemand.countries-ru.c523c152294ff17a.js","ondemand.countries-sk":"ondemand.countries-sk.e5ba5e5a30d63a26.js","ondemand.countries-sr":"ondemand.countries-sr.8436803b7ca45970.js","ondemand.countries-sv":"ondemand.countries-sv.d3d68607f0
06f364.js","ondemand.countries-ta":"ondemand.countries-ta.9924444018bd6fd6.js","ondemand.countries-th":"ondemand.countries-th.97d27d91f1e8b1f9.js","ondemand.countries-tr":"ondemand.countries-tr.5e21112431856457.js","ondemand.countries-uk":"ondemand.countries-uk.fae009d4afa54dbe.js","ondemand.countries-zh":"ondemand.countries-zh.6ef733aab4c67996.js","ondemand.countries-zh-Hant":"ondemand.countries-zh-Hant.fea4ecfac68c55b0.js","bundle.SettingsProfile":"bundle.SettingsProfile.ab283edb8316db9e.js","bundle.SettingsTransparency":"bundle.SettingsTransparency.bfc5a5b693efdecb.js","bundle.SmsLogin":"bundle.SmsLogin.83386fa187f3751f.js","bundle.Stickers":"bundle.Stickers.44fa69df5bb970b2.js","bundle.Topics":"bundle.Topics.f031b702b5662037.js","bundle.Trends":"bundle.Trends.7d8fb7840bb1695d.js","bundle.TweetActivity":"bundle.TweetActivity.74d0c23f81ca5985.js","bundle.TweetMediaDetail":"bundle.TweetMediaDetail.8a914737d032edde.js","bundle.TweetMediaTags":"bundle.TweetMediaTags.7ac2030e881605c8.js","bundle.Twitterversary":"bundle.Twitterversary.cd6142decdce8223.js","bundle.UserAvatar":"bundle.UserAvatar.4ff8f77f9835ddbf.js","bundle.UserFollowLists":"bundle.UserFollowLists.761911bc2abed421.js","bundle.UserLists":"bundle.UserLists.7d92464eff372ac0.js","bundle.UserMoments":"bundle.UserMoments.2dd22a822fb08c4e.js","bundle.UserOnboarding":"bundle.UserOnboarding.dae2f08bc6ce1e63.js","bundle.UserProfile":"bundle.UserProfile.a7a78f92a1d13554.js","bundle.UserProfileTimelines":"bundle.UserProfileTimelines.44603ddd0d35316a.js","ondemand.ProfileSidebar":"ondemand.ProfileSidebar.59898236d7f06fe0.js","bundle.UserProfileSuspended":"bundle.UserProfileSuspended.d311eb8da64b059a.js","bundle.UserRedirect":"bundle.UserRedirect.870c393f79eed48a.js","bundle.shared.Compose":"bundle.shared.Compose.4cca3cba8d45cd5a.js","ondemand.SettingsInternals":"ondemand.SettingsInternals.7615b372ac35bdbb.js","shared":"shared.18a216f7e10b8cd4.js"};
</script>
<script>
 window.showFailureMessage = function (source) { window.Raven && window.Raven.captureMessage( 'Failed to load source', { level: 'error', extra: { source: source } } ); document.getElementById('failureMessage').style.display = 'block'; }; window.reloadFromScriptError = function () { window.location.reload(); }; </script>
<script crossorigin="anonymous" onerror="showFailureMessage('https://abs-0.twimg.com/responsive-web/web/ltr/runtime.270230779ff8aae3.js');" src="https://abs-0.twimg.com/responsive-web/web/ltr/runtime.270230779ff8aae3.js"></script>
<script crossorigin="anonymous" onerror="showFailureMessage('https://abs-0.twimg.com/responsive-web/web/ltr/vendor.c66168a4671f3557.js');" src="https://abs-0.twimg.com/responsive-web/web/ltr/vendor.c66168a4671f3557.js"></script>
<script crossorigin="anonymous" onerror="showFailureMessage('https://abs-0.twimg.com/responsive-web/web/ltr/i18n/en.eeb53accf85c34c7.js');" src="https://abs-0.twimg.com/responsive-web/web/ltr/i18n/en.eeb53accf85c34c7.js"></script>
<script crossorigin="anonymous" onerror="showFailureMessage('https://abs-0.twimg.com/responsive-web/web/ltr/main.d53f6bdacc4ceca3.js');" src="https://abs-0.twimg.com/responsive-web/web/ltr/main.d53f6bdacc4ceca3.js"></script>

taspinar commented 6 years ago

Does forcing the user agent to be Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36 also help?

bengarvey commented 6 years ago

@taspinar setting that as my HEADERS_LIST seems to have worked.

bengarvey commented 6 years ago

Worked all day yesterday, but stopped working again this morning.

lapp0 commented 6 years ago

@bengarvey still working for me. What's your query? What's your error?

bengarvey commented 6 years ago

No results returned for even the simplest query using the HEADERS_LIST you posted:

twitterscraper Trump --limit 100

lapp0 commented 6 years ago

Seems you've had it stop working for you, then start working again multiple times.

Is there some common factor in those times you've had it not work for you? Can you try running from the cloud?

I'm getting results for your query btw.

bengarvey commented 6 years ago

Not sure. It worked for a few days when I started using Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36 for the HEADERS_LIST, but it stopped suddenly this morning.

bengarvey commented 6 years ago

I made a tiny change to the header and it's working again. I think my header/IP was blocked, maybe?

lapp0 commented 6 years ago

Interesting. Could you post the response.text when you run it with the bad header? Perhaps there's some header-changing logic which can be applied based on detection of this specific failure by looking at response.text to fix this.
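
A rough sketch of what such detection could look like, assuming the bad response resembles the HTML page posted earlier in this thread. Both function names and the fetch callable are hypothetical, and the heuristic only checks how the body starts.

```python
import json


def looks_like_html(text):
    """Heuristic: treat a body starting with a doctype or <html> tag as the error page."""
    head = text.lstrip()[:15].lower()
    return head.startswith('<!doctype html') or head.startswith('<html')


def fetch_json_rotating(fetch, user_agents):
    """Try each user agent in turn until the body parses as JSON.

    fetch is a hypothetical callable that takes a user-agent string and
    returns the response body as text.
    """
    for user_agent in user_agents:
        text = fetch(user_agent)
        if looks_like_html(text):
            continue  # this header got the HTML page; try the next one
        try:
            return user_agent, json.loads(text)
        except json.JSONDecodeError:
            continue  # not HTML, but not valid JSON either
    return None, None  # every user agent failed
```

The function returns the user agent that worked together with the parsed JSON, so a caller could keep using that header for later requests.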

taspinar commented 6 years ago

I believe that most of the 'not being able to get all tweets' issues are caused by the user agent provided by fake_useragent. I have removed fake_useragent as a dependency altogether. See PR https://github.com/taspinar/twitterscraper/pull/119

Once this PR is merged, the newest versions should no longer have these issues.

Can you guys have a look at the PR?

LesmesWeb commented 6 years ago

Good day, I have the following problem. I have now updated to version 0.7.0.1 and executed the following query:

twitterscraper Trump -l 100 -bd 2017-01-01 -ed 2017-06-01 -o tweets2.json

It throws me the following errors:

Traceback (most recent call last):
  File "/webapp/1-ProyectoDjango/ScraperEntorno/Entorno/bin/twitterscraper", line 7, in <module>
    from twitterscraper.main import main
  File "/webapp/1-ProyectoDjango/ScraperEntorno/Entorno/local/lib/python2.7/site-packages/twitterscraper/__init__.py", line 13, in <module>
    from twitterscraper.query import query_tweets
  File "/webapp/1-ProyectoDjango/ScraperEntorno/Entorno/local/lib/python2.7/site-packages/twitterscraper/query.py", line 10, in <module>
    from twitterscraper.logging import logger
  File "/webapp/1-ProyectoDjango/ScraperEntorno/Entorno/local/lib/python2.7/site-packages/twitterscraper/logging.py", line 4, in <module>
    logger = logging.getLogger('twitterscraper')
AttributeError: 'module' object has no attribute 'getLogger'

Yesterday I was retrieving tweets, but today I was not, so I updated to the new version.

lapp0 commented 6 years ago

@CamiloVeloz this is due to my pull request here https://github.com/taspinar/twitterscraper/pull/117. I didn't realize it was incompatible with python2. You can fix it by installing python3, or reverting the update.

@taspinar is this project intended to retain python2 compatibility? If so, I could fix it so it works with python2.

3ruce commented 6 years ago

@CamiloVeloz I fixed the python 3 problem like this...

First I uninstalled the old version

sudo pip uninstall twitterscraper

Then I got the new one going like so...

cd /home/me/myinstalldirectory
git clone https://github.com/taspinar/twitterscraper.git
cd twitterscraper
sudo python3 setup.py install

I ran my test search twitterscraper trump -bd 2018-05-23 -ed 2018-05-24 --output=tweets-01.json, which before this fix only returned 675 tweets. I got 1481 this time, but the process died with this error:

INFO: Querying trump since:2018-05-23 until:2018-05-24
ERROR: Failed to parse JSON "Expecting value: line 1 column 1 (char 0)" while requesting "https://twitter.com/i/search/timeline?f=tweets&vertical=default&include_available_features=1&include_entities=1&reset_error_state=false&src=typd&max_position=TWEET-999439049721970688-999439834686087168&q=trump%20since%3A2018-05-23%20until%3A2018-05-24&l=None"
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/twitterscraper-0.7.1-py3.5.egg/twitterscraper/query.py", line 53, in query_single_page
    json_resp = json.loads(response.text)
  File "/usr/lib/python3.5/json/__init__.py", line 319, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.5/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.5/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
INFO: Got 1481 tweets for trump%20since%3A2018-05-23%20until%3A2018-05-24.

lapp0 commented 6 years ago

Hmm, you should have gotten a log line saying Retrying... (Attempts left <attempts left here>) as per https://github.com/taspinar/twitterscraper/blob/master/twitterscraper/query.py#L81

It should retry 10 times by default. I'm assuming you didn't omit any log lines.
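For illustration, a retry loop of the kind described here could look roughly like the sketch below. It is not the code in query.py: `parse_with_retry` and `fetch_text` are hypothetical names, and the log wording is only modeled on the messages quoted in this thread. The assumption is that a failed request yields a body that is not valid JSON (for example an empty string), which makes `json.loads` raise.

```python
import json
import logging

logger = logging.getLogger("retry_sketch")


def parse_with_retry(fetch_text, retries=10):
    """Fetch a response body and parse it as JSON, retrying on parse errors.

    fetch_text is a hypothetical zero-argument callable that returns the
    response body as a string. One initial attempt is made, followed by up
    to `retries` further attempts. Returns the parsed object, or None if
    every attempt fails.
    """
    for attempts_left in range(retries, -1, -1):
        text = fetch_text()
        try:
            return json.loads(text)
        except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
            logger.error('Failed to parse JSON "%s"', exc)
            if attempts_left > 0:
                logger.info("Retrying... (Attempts left: %d)", attempts_left)
    return None


# Usage: two bad bodies followed by a good one.
bodies = iter(["", "<html>blocked</html>", '{"items_html": "x"}'])
result = parse_with_retry(lambda: next(bodies))
```

If no "Retrying..." line shows up after a parse error, the installed version probably predates the retry logic, which is why checking against latest master matters.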

Can you check whether you are on latest master?

3ruce commented 6 years ago

I may not have upgraded successfully on my test machine but on my live machines, it's working much much better...

taspinar commented 6 years ago

@lapp0, if it is not too much trouble...

lapp0 commented 6 years ago

@taspinar done https://github.com/taspinar/twitterscraper/pull/123

lapp0 commented 6 years ago

@taspinar this should be closed

PulpyJuice commented 5 years ago

Hi @taspinar,

I don't know if you have received any messages regarding this error lately, but I have been having nearly the same issues as described above. I was going to try to set the user agent manually, but as far as I understand, the variable has been changed in a later version.

I have created a pastebin with the results of running a command from the documentation. I have removed duplicates, as the original paste exceeded the free-tier limits of PasteBin.

Link https://pastebin.com/raw/bkyAgGsK

Best regards

PulpyJuice commented 5 years ago

Update: I've tried changing my IP, knowing that Twitter has some odd ways of blocking. This seemed to change the number of tweets I received; however, after a short while I still end up getting 0 new tweets.

INFO: Got 120 tweets (120 new).
INFO: Got 240 tweets (120 new).
INFO: Got 360 tweets (120 new).
INFO: Got 480 tweets (120 new).
INFO: Got 600 tweets (120 new).
INFO: Got 720 tweets (120 new).
INFO: Got 720 tweets (0 new).
INFO: Got 720 tweets (0 new).
INFO: Got 720 tweets (0 new).
INFO: Got 720 tweets (0 new).
INFO: Got 720 tweets (0 new).
INFO: Got 720 tweets (0 new).
INFO: Got 720 tweets (0 new).
INFO: Got 720 tweets (0 new).
INFO: Got 720 tweets (0 new).
INFO: Got 720 tweets (0 new).
INFO: Got 720 tweets (0 new).

Update: Tested on two devices running Windows 10 version 1809. No difference. I also tried running with and without limits, and with a larger pool size and without setting one. My main issue is not so much that I don't get a large enough data pool, but that the data is distributed extremely unevenly across dates. For instance, given a dataset of 100.000 tweets over 100 days, I will have 3 days of 33.000 tweets each and 97 days of nothing.
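To make the uneven distribution easier to report, the scraped records can be counted per calendar day. The following is only a sketch: `tweets_per_day` is a hypothetical helper, and it assumes each record is a dict whose `timestamp` value starts with an ISO date such as 2018-05-23; the key name may differ in the actual output file.

```python
from collections import Counter
from datetime import date, timedelta


def tweets_per_day(tweets, begin, end, key="timestamp"):
    """Count records per calendar day in [begin, end), keeping empty days.

    Assumes each record's `key` value begins with YYYY-MM-DD.
    """
    counts = Counter(str(tweet[key])[:10] for tweet in tweets)
    per_day = {}
    current = begin
    while current < end:
        day = current.isoformat()
        per_day[day] = counts.get(day, 0)  # days with no tweets show up as 0
        current += timedelta(days=1)
    return per_day


# Usage with a few made-up records.
sample = [
    {"timestamp": "2018-05-23T10:00:00"},
    {"timestamp": "2018-05-23T11:30:00"},
    {"timestamp": "2018-05-25T01:00:00"},
]
summary = tweets_per_day(sample, date(2018, 5, 23), date(2018, 5, 26))
```

Days that come back with a count of 0 are the ones worth re-querying individually.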

lubhaniagarwal commented 4 years ago

@bengarvey still working for me. What's your query? What's your error?

Hello, I am facing an issue where I get 0 tweets. I cloned https://github.com/lapp0/twitterscraper.git and ran python setup.py install, but it still returns 0 tweets. Please help! Thanks.


lapp0 commented 4 years ago

@lubhaniagarwal I think you should use taspinar's branch; mine was last updated in 2018, and the changes were already merged into taspinar's.

lubhaniagarwal commented 4 years ago

Hey, can you please help me with the steps on how to proceed? It would be really helpful for me. Thank you in advance.

On Thu, Jun 4, 2020, 23:42 lapp0 notifications@github.com wrote:

@lubhaniagarwal https://github.com/lubhaniagarwal I think you should use taspinars branch, mine has last been updated in 2018.


lapp0 commented 4 years ago

@lubhaniagarwal git clone https://github.com/taspinar/twitterscraper.git