Closed usajameskwon closed 5 years ago
I'm getting this problem as well, and I was not getting it previously. Currently looking into it; Twitter probably introduced changes that break this. I'm aware of other software that Twitter's recent changes broke as well.
Full traceback:
May 24 03:41:32 harvester harvester[1648]: ERROR:root:Failed to parse JSON "Expecting value: line 1 column 1 (char 0)" while requesting "https://twitter.com/i/search/timeline?f=tweets&vertical=default&include_available_features=1&includ>
May 24 03:41:32 harvester harvester[1648]: Traceback (most recent call last):
May 24 03:41:32 harvester harvester[1648]: File "/nix/store/mxwfji6aas89jx86ilgplb2pkc68jaxq-python3.6-twitterscraper-nix-0.7.0/lib/python3.6/site-packages/twitterscraper/query.py", line 49, in query_single_page
May 24 03:41:32 harvester harvester[1648]: json_resp = json.loads(response.text)
May 24 03:41:32 harvester harvester[1648]: File "/nix/store/ljhgdba6n8ag6f8clpi4m9zizm7b8mx3-python3-3.6.5/lib/python3.6/json/__init__.py", line 354, in loads
May 24 03:41:32 harvester harvester[1648]: return _default_decoder.decode(s)
May 24 03:41:32 harvester harvester[1648]: File "/nix/store/ljhgdba6n8ag6f8clpi4m9zizm7b8mx3-python3-3.6.5/lib/python3.6/json/decoder.py", line 339, in decode
May 24 03:41:32 harvester harvester[1648]: obj, end = self.raw_decode(s, idx=_w(s, 0).end())
May 24 03:41:32 harvester harvester[1648]: File "/nix/store/ljhgdba6n8ag6f8clpi4m9zizm7b8mx3-python3-3.6.5/lib/python3.6/json/decoder.py", line 357, in raw_decode
May 24 03:41:32 harvester harvester[1648]: raise JSONDecodeError("Expecting value", s, err.value) from None
May 24 03:41:32 harvester harvester[1648]: json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Worth mentioning that results exist some of the time, and this is a warning, not an error that halts the program.
@lapp0 Do you mean that Twitter is deploying mechanisms that block crawler bots?
What I found out so far is that some of the time, Twitter responds with an HTML page (some kind of 404 / error page) to requests like the one above (which should only return JSON).
I have created a new branch where the separate try / except statement for JSONDecodeError is removed. With this fix, the same request should be retried through a recursive call to query_single_page instead of breaking the process. Usually, Twitter returns the correct response the second time.
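The retry behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code from the branch: `query_with_retry`, `fetch_text`, and the `retries` default are hypothetical names, and the fetch function is passed in so the sketch runs without network access.

```python
import json


def query_with_retry(fetch_text, retries=3):
    """Parse the text returned by fetch_text() as JSON, retrying on failure.

    fetch_text is a hypothetical zero-argument callable returning the
    response body; in a real scraper it would perform the HTTP request.
    """
    try:
        return json.loads(fetch_text())
    except json.JSONDecodeError:
        # An HTML error page is not valid JSON and ends up here.
        if retries > 0:
            # Recursive call: issue the same request again.
            return query_with_retry(fetch_text, retries - 1)
        return None  # give up once no attempts are left


# Simulated server: an HTML page first, then a JSON body.
responses = iter(['<!DOCTYPE html><html></html>', '{"ok": true}'])
result = query_with_retry(lambda: next(responses))
print(result)
```

In this simulation the first body fails to decode and the second one succeeds, which mirrors the "second time is usually fine" observation.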
I got a similar error, and I wonder if it is related to the one above.
I ran this search: twitterscraper bread -bd 2018-04-23 -ed 2018-05-23 --output=tweets-01.json
It did return some tweets, but running this search afterwards returned zero tweets: twitterscraper bread -bd 2018-03-23 -ed 2018-05-23 -output=csv-01.csv
ERROR: Failed to parse JSON "Expecting value: line 1 column 1 (char 0)" while requesting "https://twitter.com/i/search/timeline?f=tweets&vertical=default&include_available_features=1&include_entities=1&reset_error_state=false&src=typd&max_position=TWEET-997990030411956226-997990270372274177&q=bread%20since%3A2018-05-18%20until%3A2018-05-20&l=None"
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/twitterscraper/query.py", line 49, in query_single_page
json_resp = json.loads(response.text)
File "/usr/lib/python3.5/json/__init__.py", line 319, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.5/json/decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.5/json/decoder.py", line 357, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
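To see which slice of the date range a failing request covered, the query string in the logged URL can be decoded with the standard library. A small sketch using the URL from the error message above; the variable names are only for illustration.

```python
from urllib.parse import parse_qs, urlsplit

# URL copied from the error message above.
url = (
    "https://twitter.com/i/search/timeline?f=tweets&vertical=default"
    "&include_available_features=1&include_entities=1"
    "&reset_error_state=false&src=typd"
    "&max_position=TWEET-997990030411956226-997990270372274177"
    "&q=bread%20since%3A2018-05-18%20until%3A2018-05-20&l=None"
)

# parse_qs percent-decodes the values and returns a dict of lists.
params = parse_qs(urlsplit(url).query)
search_query = params["q"][0]
language = params["l"][0]

print(search_query)
print(language)
```

The decoded query covers only 2018-05-18 to 2018-05-20, a sub-range of the requested 2018-03-23 to 2018-05-23, and the `l` parameter carries the literal string `None`.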
FYI, I downloaded and installed the jsondecodeerror_bugfix branch. I did get some results, but nowhere near the quantity I would have expected. Running the command below returned no results...
twitterscraper trump -bd 2018-05-23 -ed 2018-05-24 --output=tweets-01.json
Can you, in addition, force the user agent to the one specified in issue #90?
Sure, how do I do this? Is it something to add to the command, or do I need to edit a file?
Here's what you need to do to make this work for now.
First, if you installed this with pip, uninstall it.
pip uninstall twitterscraper
Clone this repo and check out this branch
git clone git@github.com:taspinar/twitterscraper.git
cd twitterscraper
git checkout jsondecodeerror_bugfix
Modify twitterscraper/twitterscraper/query.py by changing the HEADERS_LIST to
HEADERS_LIST = ['Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.93 Safari/537.36']
Then install it
python setup.py install
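For context, a list like HEADERS_LIST is typically turned into a request header by picking one entry from it. The sketch below assumes that pattern; `build_headers` is a hypothetical helper, not a function from twitterscraper. With a single-entry list, every request ends up with the same user agent.

```python
import random

# The single forced user agent from the instructions above.
HEADERS_LIST = [
    "Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/40.0.2214.93 Safari/537.36"
]


def build_headers(user_agents):
    """Return a headers dict with one user agent chosen from the list."""
    return {"User-Agent": random.choice(user_agents)}


headers = build_headers(HEADERS_LIST)
print(headers["User-Agent"])
```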
I had a similar issue - sometimes I wouldn't get any results, sometimes I'd get only 20 or 60 messages in addition to that JSON error. Forcing headers (as per @bengarvey's instructions) fixed the issue.
Well, that worked yesterday. Not today :)
hmmm.... it's still working for me on low volume searches...
sad news... changing header list doesn't work anymore..
Sorry, it still works very well!!
discussion about using the new useragent upstream: https://github.com/hellysmile/fake-useragent/issues/68
Branch I'm working from, with @bengarvey's fix applied: https://github.com/lapp0/twitterscraper/tree/jsondecodeerror_bugfix_new_chrome_headers
Edit: ignore below, it's probably not important. I just realized that taspinar's retry functionality is working, and this is a non-deterministic failure which is usually fine.
Here is the response.text that twitterscraper is currently attempting to JSON-decode:
<!DOCTYPE html>
<html dir="ltr" lang="en">
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1,user-scalable=0" />
<link rel="preconnect" href="//abs-0.twimg.com" />
<link rel="preconnect" href="//api.twitter.com" />
<link rel="preconnect" href="//o.twimg.com" />
<link rel="preconnect" href="//pbs.twimg.com" />
<link rel="preconnect" href="//t.co" />
<link rel="preconnect" href="//video.twimg.com" />
<link rel="dns-prefetch" href="//abs-0.twimg.com" />
<link rel="dns-prefetch" href="//api.twitter.com" />
<link rel="dns-prefetch" href="//o.twimg.com" />
<link rel="dns-prefetch" href="//pbs.twimg.com" />
<link rel="dns-prefetch" href="//t.co" />
<link rel="dns-prefetch" href="//video.twimg.com" />
<link rel="preload" as="script" crossorigin="anonymous" href="https://abs-0.twimg.com/responsive-web/web/ltr/runtime.2…
<link rel="preload" as="script" crossorigin="anonymous" href="https://abs-0.twimg.com/responsive-web/we…
<link rel="preload" as="script" crossorigin="anonymous" href="https://abs-0.twimg.com/r…
<link rel="preload" as="script" crossorigin="anonymous" href="https://abs-0.twimg.com/responsive-web/web/ltr/main.d53f…
<meta property="fb:app_id" content="2231777543" />
<meta property="og:site_name" content="Twitter" />
<meta name="google-site-verification" content="V0yIS0Ec_o3Ii9KThrCoMCkwTYMMJ_JYx_RSaGhFYvw" />
<link rel="manifest" href="/manifest.json" />
<link rel="icon" sizes="192x192" href="https://abs-0.twimg.com/responsive-web/web/ltr/icon-default.882fa4ccf6539401.png" />
<link rel="apple-touch-icon" sizes="192x192" href="https://abs-0.twimg.com/responsive-web/web/ltr/icon-ios.a9cd885bccbcaf2f.png" />
<meta name="mobile-web-app-capable" content="yes" />
<meta name="apple-mobile-web-app-title" content="Twitter Lite" />
<meta name="apple-mobile-web-app-status-bar-style" content="white" />
<meta name="theme-color" content="#ffffff" />
<body>
<noscript>
<form action="https://mobile.twitter.com/i/nojs_router?path=%2Fi%2Fsearch%2Ftimeline%3Ff%3Dtweets%26vertical%3Ddefault%26include_available_features%3D1%26include_entities%3D1%26reset_error_state%3Dfalse%26src%3Dtypd%26max_position%3DTWEET-525762382-605336192%26q%3Dfoobar%2520since%253A2007-06-09%2520until%253A2008-01-17%26l%3D" method="POST" style="background-color: #fff; position: fixed; top: 0; left: 0; right: 0; bottom: 0; z-index: 9999;">
<div style="font-size: 18px; font-family: Helvetica,sans-serif; line-height: 24px; margin: 10%; width: 80%;">
<p>We've detected that JavaScript is disabled in your browser. Would you like to proceed to legacy Twitter?</p>
<p style="margin: 20px 0;">
<button type="submit" style="background-color: #1da1f2; border-radius: 100px; border: none; box-shadow: none; color: #fff; cursor: pointer; font-size: 14px; font-weight: bold; line-height: 20px; padding: 6px 16px;">Yes</button>
</p>
</div>
</form>
</noscript>
<div id="react-root" style="height:100%"><div><div aria-label="Loading…" style="background-color:#fff;top:0;left:0;right:0;bottom:0;position:fixed"><svg style="display:inline-block;fill:currentcolor;height:72px;max-width:100%;position:absolute;user-select:none;vertical-align:text-bottom;color:#1da1f2;width:72px;top:0;left:0;right:0;bottom:0;margin:auto" viewBox="0 0 24 24"><g><path d="M23.643 4.937c-.835.37-1.732.62-2.675.733a4.67 4.67 0 0 0 2.048-2.578 9.3 9.3 0 0 1-2.958 1.13 4.66 4.66 0 0 0-7.938 4.25 13.229 13.229 0 0 1-9.602-4.868c-.4.69-.63 1.49-.63 2.342A4.66 4.66 0 0 0 3.96 9.824a4.647 4.647 0 0 1-2.11-.583v.06a4.66 4.66 0 0 0 3.737 4.568 4.692 4.692 0 0 1-2.104.08 4.661 4.661 0 0 0 4.352 3.234 9.348 9.348 0 0 1-5.786 1.995 9.5 9.5 0 0 1-1.112-.065 13.175 13.175 0 0 0 7.14 2.093c8.57 0 13.255-7.098 13.255-13.254 0-.2-.005-.402-.014-.602a9.47 9.47 0 0 0 2.323-2.41z"></path></g></svg></div><div id="failureMessage" style="background-color:#F5F8FA;top:0;left:0;right:0;bottom:0;position:fixed;display:none;z-index:2"><div style="position:absolute;height:200px;top:0;left:0;right:0;bottom:0;margin:auto;text-align:center;line-height:1.3125;font-size:14px;color:#14171A;font-family:-apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, Ubuntu, "Helvetica Neue", sans-serif"><svg style="display:block;fill:currentcolor;height:72px;max-width:100%;position:relative;user-select:none;vertical-align:text-bottom;color:#657786;width:72px;margin:0 auto 24px" viewBox="0 0 24 24"><g><circle cx="12.025" cy="16.437" r="1.281"></circle><path d="M14.39 7.194a.495.495 0 0 0-.4-.2h-3.928a.494.494 0 0 0-.4.2.496.496 0 0 0-.08.442l1.814 6.098a.5.5 0 0 0 .48.357h.298a.501.501 0 0 0 .48-.356l1.813-6.098a.495.495 0 0 0-.077-.442z"></path><path d="M12 22.75C6.072 22.75 1.25 17.928 1.25 12S6.072 1.25 12 1.25 22.75 6.072 22.75 12 17.928 22.75 12 22.75zm0-20C6.9 2.75 2.75 6.9 2.75 12S6.9 21.25 12 21.25s9.25-4.15 9.25-9.25S17.1 2.75 12 2.75z"></path></g></svg><p>A problem was encountered trying 
to load the page.</p><a href="javascript:reloadFromScriptError()" style="background-color:#1DA1F2;border-radius:0.3em;border:0;padding:0.5em 1em;height:1.5em;display:inline-block;margin:1em 0;color:#fff;text-decoration:none;font-weight:bold"><svg style="display:inline-block;fill:currentcolor;height:1.5em;max-width:100%;position:relative;user-select:none;vertical-align:middle" viewBox="0 0 24 24"><g><path d="M12 2C6.486 2 2 6.486 2 12a.75.75 0 0 0 1.5 0c0-4.687 3.813-8.5 8.5-8.5s8.5 3.813 8.5 8.5-3.813 8.5-8.5 8.5c-2.886 0-5.576-1.5-7.13-3.888l2.983.55a.75.75 0 1 0 .274-1.474l-4.663-.86a.746.746 0 0 0-.88.647l-.57 4.706a.749.749 0 1 0 1.488.181l.32-2.63C5.673 20.36 8.728 22 12 22c5.514 0 10-4.486 10-10S17.514 2 12 2z"></path></g></svg> Retry</a></div></div></div></div>
<script>
window.__INITIAL_STATE__ = {"optimist":[],"toasts":[],"entities":{"users":{"entities":{},"errors":{},"fetchStatus":{}}},"session":{"country":"XX","guestId":"152763591683681529","language":"en","oneFactorLoginEligibility":{"fetchStatus":"none"}},"analytics":{},"featureSwitch":{"config":{"account_country_setting_countries_whitelist":{"value":["ad","ae","af","ag","ai","al","am","ao","ar","as","at","au","aw","ax","az","ba","bb","bd","be","bf","bg","bh","bi","bj","bl","bm","bn","bo","bq","br","bs","bt","bv","bw","by","bz","ca","cc","cd","cf","cg","ch","ci","ck","cl","cm","co","cr","cu","cv","cw","cx","cy","cz","de","dj","dk","dm","do","dz","ec","ee","eg","er","es","et","fi","fj","fk","fm","fo","fr","ga","gb","gd","ge","gf","gg","gh","gi","gl","gm","gn","gp","gq","gr","gs","gt","gu","gw","gy","hk","hn","hr","ht","hu","id","ie","il","im","in","io","iq","ir","is","it","je","jm","jo","jp","ke","kg","kh","ki","km","kn","kr","kw","ky","kz","la","lb","lc","li","lk","lr","ls","lt","lu","lv","ly","ma","mc","md","me","mf","mg","mh","mk","ml","mn","mo","mp","mq","mr","ms","mt","mu","mv","mw","mx","my","mz","na","nc","ne","nf","ng","ni","nl","no","np","nr","nu","nz","om","pa","pe","pf","pg","ph","pk","pl","pm","pn","pr","ps","pt","pw","py","qa","re","ro","rs","ru","rw","sa","sb","sc","se","sg","sh","si","sk","sl","sm","sn","so","sr","st","sv","sx","sz","tc","td","tf","tg","th","tj","tk","tl","tm","tn","to","tr","tt","tv","tw","tz","ua","ug","us","uy","uz","va","vc","ve","vi","vn","vu","wf","ws","xk","ye","yt","za","zm","zw"]},"live_event_hero_description_fields_enabled":{"value":true},"live_event_hero_ugm_attribution_enabled":{"value":false},"live_event_timeline_default_refresh_rate_interval_seconds":{"value":30},"live_event_timeline_minimum_refresh_rate_interval_seconds":{"value":10},"live_event_timeline_server_controlled_refresh_rate_enabled":{"value":true},"moment_annotations_enabled":{"value":false},"responsive_web_allow_switch_to_ms":{"value":false},"responsive_web_birthdays_en
abled":{"value":false},"responsive_web_broadcasts_page_enabled":{"value":false},"responsive_web_composer_v2_enabled":{"value":false},"responsive_web_composer_v2_modal_compose_enabled":{"value":false},"responsive_web_desktop_bookmarks_enabled":{"value":false},"responsive_web_dm_livepipeline_enabled":{"value":false},"responsive_web_dm_reporting_enabled":{"value":false},"responsive_web_dm_typing_indicator_enabled":{"value":false},"responsive_web_eu_countries":{"value":["at","be","bg","ch","cy","cz","de","dk","ee","es","fi","fr","gb","gr","hr","hu","ie","is","it","li","lt","lu","lv","mt","nl","no","pl","pt","ro","se","si","sk"]},"responsive_web_event_card_enabled":{"value":false},"responsive_web_explore_feedback_actions_enabled":{"value":false},"responsive_web_feedback_link":{"value":""},"responsive_web_fetch_hashflags_on_boot":{"value":true},"responsive_web_gdpr_age_gate":{"value":true},"responsive_web_gdpr_twitter_archive":{"value":false},"responsive_web_gdpr_periscope_archive":{"value":false},"responsive_web_gdpr_logged_out_banner":{"value":true},"responsive_web_gdpr_change_country_ocf_flow":{"value":true},"responsive_web_graphql_verify_credentials_enabled":{"value":false},"responsive_web_graphql_verify_credentials_server_enabled":{"value":false},"responsive_web_htl_compose_prompt":{"value":false},"responsive_web_inline_video_player_enabled":{"value":false},"responsive_web_microsoft_jump_links":{"value":false},"responsive_web_ntab_verified_mentions_vit_internal_dogfood":{"value":false},"responsive_web_ocf_enabled":{"value":true},"responsive_web_report_page_not_found":{"value":false},"responsive_web_reported_tweet_tombstones_enabled":{"value":false},"responsive_web_search_filters_enabled":{"value":true},"responsive_web_settings_contacts_dashboard_enabled":{"value":false},"responsive_web_settings_email_notifications_enabled":{"value":true},"responsive_web_settings_facebook_connect_enabled":{"value":false},"responsive_web_settings_login_verification_enabled":{"value":fa
lse},"responsive_web_settings_notif_v2_push":{"value":true},"responsive_web_settings_notif_v2_sms":{"value":true},"responsive_web_settings_nsfw_user_enabled":{"value":true},"responsive_web_settings_password_applications_info_enabled":{"value":false},"responsive_web_settings_sessions_dashboard_enabled":{"value":false},"responsive_web_settings_u2f_security_key_enabled":{"value":false},"responsive_web_settings_trends_enabled":{"value":true},"responsive_web_transform_virtual_scroller":{"value":true},"responsive_web_tweet_source_timeline_enabled":{"value":false},"responsive_web_tweet_source_tweet_detail_enabled":{"value":false},"responsive_web_unified_cards":{"value":"no"},"responsive_web_urt_list_tweets_enabled":{"value":true},"responsive_web_urt_show_cover_enabled":{"value":false},"responsive_web_verification_v2_enabled":{"value":false},"responsive_web_windows_oauth_login":{"value":"auto_login"},"scribe_api_error_sample_size":{"value":0},"scribe_api_sample_size":{"value":100},"scribe_cdn_host_list":{"value":["si0.twimg.com","si1.twimg.com","si2.twimg.com","si3.twimg.com","a0.twimg.com","a1.twimg.com","a2.twimg.com","a3.twimg.com","abs.twimg.com","amp.twimg.com","o.twimg.com","pbs.twimg.com","pbs-eb.twimg.com","pbs-ec.twimg.com","pbs-v6.twimg.com","pbs-h1.twimg.com","pbs-h2.twimg.com","video.twimg.com","platform.twitter.com","cdn.api.twitter.com","ton.twimg.com","v.cdn.vine.co","mtc.cdn.vine.co","edge.vncdn.co","mid.vncdn.co"]},"scribe_cdn_sample_size":{"value":50},"scribe_enabled":{"value":true},"traffic_redirect_5347_hostmap":{"value":[]},"user_display_name_max_limit":{"value":50},"responsive_web_logged_out_homepage_6952":{"value":"treatment"},"responsive_web_night_mode_7836":{"value":"enabled"},"responsive_web_smart_lock_7159":{"value":"all"}},"impressions":{},"featureSetToken":"9dee8851ebe378f146b9ea25f610a73259e043ba","isLoaded":true,"isLoading":false,"keysRead":{},"settingsVersion":"ac9b6bc39ab73f6a95ad5c2c5765b7d5"},"typeaheadUsers":{"fetchStatus":"none","users":
{},"blacklist":{},"lastUpdated":0,"index":{}},"blockedUsers":{"userIds":[],"fetchStatus":"none"},"settings":{"local":{"nextPushCheckin":0,"shouldAutoPlayGif":false},"remote":{"settings":{"display_sensitive_media":false},"fetchStatus":"none"},"dataSaver":{},"transient":{"dtabBarInfo":{"dtabAll":null,"dtabRweb":null,"hide":false},"loginPromptShown":false}},"devices":{"browserPush":{"fetchStatus":"none","pushNotificationsPrompt":{"count":0,"dismissed":false,"fetchStatus":"none"},"subscribed":false,"supported":null},"devices":{"data":{"emails":[],"phone_numbers":[]},"fetchStatus":"none"},"notificationSettings":{"_pushSettings":{},"_pushSettingsTemplate":{},"_smsSettings":{"error":null,"fetchStatus":"none"},"_smsSettingsTemplate":{}}}};
window.__META_DATA__ = {"env":"prod","isLoggedIn":false,"isRTL":false};
</script>
<script>
document.cookie = decodeURIComponent("gt=1001603747062124544; Max-Age=10800; Domain=.twitter.com; Path=/");
</script>
<script>
window.webpackChunkManifest = {"bundle.AccessInterstitial":"bundle.AccessInterstitial.0a886e55d07fd9d1.js","loader.DashMenu":"loader.DashMenu.f41af6a8aed64560.js","loader.SearchBox":"loader.SearchBox.f33c1d8a9d2f0779.js","loader.WideLayout":"loader.WideLayout.0e65294f86f6c39b.js","loader.PushNotificationsPrompt":"loader.PushNotificationsPrompt.c31086d53bc3ae52.js","bundle.Account":"bundle.Account.0d737ba8cea6cc52.js","bundle.Bookmarks":"bundle.Bookmarks.9ddaf670ca48ccc1.js","loader.AppModules":"loader.AppModules.54a034314566236b.js","loader.BroadcastCard":"loader.BroadcastCard.812462a4ae302154.js","loader.BroadcastPlayer":"loader.BroadcastPlayer.ed281794d61fd3ef.js","loader.VideoPlayer":"loader.VideoPlayer.8ef5e67114d25f39.js","hls.js":"hls.js.bf79fb6a401c797b.js","loader.EntryTombstone":"loader.EntryTombstone.0e0a5df360bd3f79.js","loader.FeedbackSheet":"loader.FeedbackSheet.2e826a17f3a963e0.js","loader.SignupModule":"loader.SignupModule.b458c376fac2e64c.js","loader.TimelineGap":"loader.TimelineGap.d8413106cdcee4e3.js","loader.Trends":"loader.Trends.3b820fa3ca99729b.js","loader.TweetCurationActionSheet":"loader.TweetCurationActionSheet.a8ce51a07d0ef679.js","loader.TweetPhotos":"loader.TweetPhotos.0a542d35ee03636c.js","loader.UnifiedCard":"loader.UnifiedCard.faf7b0d5d441aa83.js","ondemand.InlinePlayer":"ondemand.InlinePlayer.8b36fc73ffde364e.js","ondemand.AccessInterstitial":"ondemand.AccessInterstitial.bbe69b7e2f0224f8.js","bundle.Broadcast":"bundle.Broadcast.eeee777949be1b78.js","bundle.Collection":"bundle.Collection.177ed6760829194f.js","bundle.Compose":"bundle.Compose.3513c1ab3c6aca9f.js","ondemand.MicrosoftInterface":"ondemand.MicrosoftInterface.b114bf15c8508df4.js","bundle.ComposeV2":"bundle.ComposeV2.1e157c35b6434dcb.js","bundle.Conversation":"bundle.Conversation.ac83b1b8865b8ef1.js","bundle.ConversationParticipants":"bundle.ConversationParticipants.1dd98c6532c8d394.js","bundle.CredentialsPicker":"bundle.CredentialsPicker.b1e7a33cdc26f1ac.js","bundle.DirectMes
sages":"bundle.DirectMessages.f5c056761af48bd2.js","bundle.Download":"bundle.Download.f76c3abf3976b2d6.js","bundle.Explore":"bundle.Explore.357bc5750c112cdb.js","bundle.FollowerRequests":"bundle.FollowerRequests.5639c30d710f4035.js","bundle.FoundMedia":"bundle.FoundMedia.a8dc044ba72a1267.js","bundle.GenericTimeline":"bundle.GenericTimeline.865e036862a39389.js","bundle.Highlights":"bundle.Highlights.dcbcc3a9740565dc.js","bundle.HomeTimeline":"bundle.HomeTimeline.bb1cf156aecd8221.js","bundle.LiveEvent":"bundle.LiveEvent.44b0f8690a218e4b.js","bundle.LoggedOutHome":"bundle.LoggedOutHome.7cebe2d639e6baac.js","bundle.LoggedOutHomeV2":"bundle.LoggedOutHomeV2.aba2615d7d501496.js","bundle.Login":"bundle.Login.c040f59c0b88e5ca.js","bundle.Moment":"bundle.Moment.b5633aa8e8e2dd6c.js","bundle.NetworkInstrument":"bundle.NetworkInstrument.a64aa24912a1ea7d.js","bundle.NotificationDetail":"bundle.NotificationDetail.8626735cc1c716e4.js","bundle.Notifications":"bundle.Notifications.572bd2c0db1af32b.js","bundle.Ocf":"bundle.Ocf.43c617118e85cc40.js","bundle.Report":"bundle.Report.83a2bfca7c7f8d61.js","bundle.RichTextCompose":"bundle.RichTextCompose.33b589c3fbe06d23.js","bundle.Search":"bundle.Search.78ed436e150afbce.js","bundle.Settings":"bundle.Settings.5d9a181a3b3d62fa.js","bundle.SettingsInternals":"bundle.SettingsInternals.d70c97d527dca3b0.js","ondemand.countries-ar":"ondemand.countries-ar.2fb9a1ec172f64d5.js","ondemand.countries-bg":"ondemand.countries-bg.e35c4de062b13441.js","ondemand.countries-bn":"ondemand.countries-bn.fdc22fd7449c8e5e.js","ondemand.countries-ca":"ondemand.countries-ca.0c5f5fd3a6a19cd2.js","ondemand.countries-cs":"ondemand.countries-cs.1f9e5d57c71ac33b.js","ondemand.countries-da":"ondemand.countries-da.ee1520cf42140f35.js","ondemand.countries-de":"ondemand.countries-de.1c92da67cba0f7ef.js","ondemand.countries-el":"ondemand.countries-el.f6fed22f9c4b446a.js","ondemand.countries-en":"ondemand.countries-en.4340fa379bfcd96c.js","ondemand.countries-en-GB":"ondemand.co
untries-en-GB.0ddb82ed14613896.js","ondemand.countries-es":"ondemand.countries-es.a81afc755f1b2715.js","ondemand.countries-eu":"ondemand.countries-eu.52e072afc9fd8d81.js","ondemand.countries-fa":"ondemand.countries-fa.d39b25a43b80d4e1.js","ondemand.countries-fi":"ondemand.countries-fi.8619573e4c56f488.js","ondemand.countries-fil":"ondemand.countries-fil.86108f2daf1ac62f.js","ondemand.countries-fr":"ondemand.countries-fr.3ad8cb5c65a08975.js","ondemand.countries-ga":"ondemand.countries-ga.21cabd7b5f5fd189.js","ondemand.countries-gl":"ondemand.countries-gl.4e5f386bddc27df1.js","ondemand.countries-gu":"ondemand.countries-gu.a0ea66135f7a5b0e.js","ondemand.countries-he":"ondemand.countries-he.550a1106d3083b7d.js","ondemand.countries-hi":"ondemand.countries-hi.b65f7e584a50308e.js","ondemand.countries-hr":"ondemand.countries-hr.95b04f3a9334fe47.js","ondemand.countries-hu":"ondemand.countries-hu.80376d4679383ea3.js","ondemand.countries-id":"ondemand.countries-id.8e2c5175cd66aa65.js","ondemand.countries-it":"ondemand.countries-it.e9a06233169f19b2.js","ondemand.countries-ja":"ondemand.countries-ja.f766fe1300c540a4.js","ondemand.countries-kn":"ondemand.countries-kn.c2e2138da8f0261e.js","ondemand.countries-ko":"ondemand.countries-ko.606f5d220279b29c.js","ondemand.countries-mr":"ondemand.countries-mr.2ac9b3be671767b0.js","ondemand.countries-ms":"ondemand.countries-ms.75b244ff50f9cb0a.js","ondemand.countries-nb":"ondemand.countries-nb.968cba5a0112717e.js","ondemand.countries-nl":"ondemand.countries-nl.8928a1c2a78f816d.js","ondemand.countries-pl":"ondemand.countries-pl.b93f0a3117b687a2.js","ondemand.countries-pt":"ondemand.countries-pt.1033685135644792.js","ondemand.countries-ro":"ondemand.countries-ro.5f9a77a9943738d1.js","ondemand.countries-ru":"ondemand.countries-ru.c523c152294ff17a.js","ondemand.countries-sk":"ondemand.countries-sk.e5ba5e5a30d63a26.js","ondemand.countries-sr":"ondemand.countries-sr.8436803b7ca45970.js","ondemand.countries-sv":"ondemand.countries-sv.d3d68607f006
f364.js","ondemand.countries-ta":"ondemand.countries-ta.9924444018bd6fd6.js","ondemand.countries-th":"ondemand.countries-th.97d27d91f1e8b1f9.js","ondemand.countries-tr":"ondemand.countries-tr.5e21112431856457.js","ondemand.countries-uk":"ondemand.countries-uk.fae009d4afa54dbe.js","ondemand.countries-zh":"ondemand.countries-zh.6ef733aab4c67996.js","ondemand.countries-zh-Hant":"ondemand.countries-zh-Hant.fea4ecfac68c55b0.js","bundle.SettingsProfile":"bundle.SettingsProfile.ab283edb8316db9e.js","bundle.SettingsTransparency":"bundle.SettingsTransparency.bfc5a5b693efdecb.js","bundle.SmsLogin":"bundle.SmsLogin.83386fa187f3751f.js","bundle.Stickers":"bundle.Stickers.44fa69df5bb970b2.js","bundle.Topics":"bundle.Topics.f031b702b5662037.js","bundle.Trends":"bundle.Trends.7d8fb7840bb1695d.js","bundle.TweetActivity":"bundle.TweetActivity.74d0c23f81ca5985.js","bundle.TweetMediaDetail":"bundle.TweetMediaDetail.8a914737d032edde.js","bundle.TweetMediaTags":"bundle.TweetMediaTags.7ac2030e881605c8.js","bundle.Twitterversary":"bundle.Twitterversary.cd6142decdce8223.js","bundle.UserAvatar":"bundle.UserAvatar.4ff8f77f9835ddbf.js","bundle.UserFollowLists":"bundle.UserFollowLists.761911bc2abed421.js","bundle.UserLists":"bundle.UserLists.7d92464eff372ac0.js","bundle.UserMoments":"bundle.UserMoments.2dd22a822fb08c4e.js","bundle.UserOnboarding":"bundle.UserOnboarding.dae2f08bc6ce1e63.js","bundle.UserProfile":"bundle.UserProfile.a7a78f92a1d13554.js","bundle.UserProfileTimelines":"bundle.UserProfileTimelines.44603ddd0d35316a.js","ondemand.ProfileSidebar":"ondemand.ProfileSidebar.59898236d7f06fe0.js","bundle.UserProfileSuspended":"bundle.UserProfileSuspended.d311eb8da64b059a.js","bundle.UserRedirect":"bundle.UserRedirect.870c393f79eed48a.js","bundle.shared.Compose":"bundle.shared.Compose.4cca3cba8d45cd5a.js","ondemand.SettingsInternals":"ondemand.SettingsInternals.7615b372ac35bdbb.js","shared":"shared.18a216f7e10b8cd4.js"};
</script>
<script>
window.showFailureMessage = function (source) { window.Raven && window.Raven.captureMessage( 'Failed to load source', { level: 'error', extra: { source: source } } ); document.getElementById('failureMessage').style.display = 'block'; }; window.reloadFromScriptError = function () { window.location.reload(); }; </script>
<script crossorigin="anonymous" onerror="showFailureMessage('https://abs-0.twimg.com/responsive-web/web/ltr/runtime.270230779ff8aae3.js');" src="https://abs-0.twimg.com/responsive-web/web/ltr/runtime.270230779ff8aae3.js"></script>
<script crossorigin="anonymous" onerror="showFailureMessage('https://abs-0.twimg.com/responsive-web/web/ltr/vendor.c66168a4671f3557.js');" src="https://abs-0.twimg.com/responsive-web/web/ltr/vendor.c66168a4671f3557.js"></script>
<script crossorigin="anonymous" onerror="showFailureMessage('https://abs-0.twimg.com/responsive-web/web/ltr/i18n/en.eeb53accf85c34c7.js');" src="https://abs-0.twimg.com/responsive-web/web/ltr/i18n/en.eeb53accf85c34c7.js"></script>
<script crossorigin="anonymous" onerror="showFailureMessage('https://abs-0.twimg.com/responsive-web/web/ltr/main.d53f6bdacc4ceca3.js');" src="https://abs-0.twimg.com/responsive-web/web/ltr/main.d53f6bdacc4ceca3.js"></script>
Does forcing the UA to be Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36 also help?
@taspinar setting that as my HEADER_LIST seems to have worked.
Worked all day yesterday, but stopped working again this morning.
@bengarvey still working for me. What's your query? What's your error?
No results returned for even the simplest query using the HEADER_LIST you posted
twitterscraper Trump --limit 100
Seems you've had it stop working for you, then start working again multiple times.
Is there some common factor in those times you've had it not work for you? Can you try running from the cloud?
I'm getting results for your query btw.
Not sure. It worked for a few days when I started using Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36 for the HEADER_LIST, but stopped suddenly this morning.
I made a tiny change to the header and it's working again. I think my header/IP was blocked, maybe?
Interesting. Could you post the response.text you get when you run it with the bad header? Perhaps some header-changing logic could be applied whenever this specific failure is detected in response.text.
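One way such detection might look, as a rough sketch only: check whether the body starts like an HTML document and, if so, move on to the next user agent instead of parsing. `looks_like_html` and `parse_or_rotate` are hypothetical helpers, and the prefix check is a heuristic based on the HTML page posted earlier in this thread, not on documented Twitter behaviour.

```python
import json


def looks_like_html(text):
    """Heuristic: True if the body starts like an HTML document."""
    start = text.lstrip().lower()
    return start.startswith("<!doctype html") or start.startswith("<html")


def parse_or_rotate(text, user_agents, current_index):
    """Return (parsed_json, index_of_user_agent_to_use_next).

    If the body is HTML instead of JSON, nothing is parsed and the
    index advances to the next user agent, wrapping around the list.
    """
    if looks_like_html(text):
        return None, (current_index + 1) % len(user_agents)
    return json.loads(text), current_index


agents = ["agent-a", "agent-b"]  # placeholder user agent strings
bad = parse_or_rotate('<!DOCTYPE html>\n<html dir="ltr" lang="en">', agents, 0)
good = parse_or_rotate('{"ok": true}', agents, 1)
print(bad, good)
```

The caller would then repeat the request with `agents[next_index]`; whether rotating the user agent actually avoids the block is exactly the open question in this thread.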
I believe that most of the 'not being able to get all tweets' issues are caused by the user agent provided by fake_useragent. I have removed fake_useragent as a dependency altogether. See PR https://github.com/taspinar/twitterscraper/pull/119
Once this PR is merged, the newest versions should no longer have these issues.
Can you guys have a look at the PR?
Good day, I have the following problem. I just updated to version 0.7.0.1 and executed the following query:
twitterscraper Trump -l 100 -bd 2017-01-01 -ed 2017-06-01 -o tweets2.json
It throws me the following errors:
Traceback (most recent call last):
File "/webapp/1-ProyectoDjango/ScraperEntorno/Entorno/bin/twitterscraper", line 7, in
Yesterday I was retrieving tweets and today I'm not, so I updated to the new version.
@CamiloVeloz this is due to my pull request here https://github.com/taspinar/twitterscraper/pull/117. I didn't realize it was incompatible with python2. You can fix it by installing python3, or reverting the update.
@taspinar is this project intended to retain python2 compatibility? If so, I could fix it so it works with python2.
@CamiloVeloz I fixed the python 3 problem like this...
First I uninstalled the old version
sudo pip uninstall twitterscraper
Then I got the new one going like so...
cd /home/me/myinstalldirectory
git clone https://github.com/taspinar/twitterscraper.git
cd twitterscraper
sudo python3 setup.py install
Having run my test search twitterscraper trump -bd 2018-05-23 -ed 2018-05-24 --output=tweets-01.json
which before this fix only returned 675 tweets, I got 1481 this time but the process died with this error
INFO: Querying trump since:2018-05-23 until:2018-05-24
ERROR: Failed to parse JSON "Expecting value: line 1 column 1 (char 0)" while requesting "https://twitter.com/i/search/timeline?f=tweets&vertical=default&include_available_features=1&include_entities=1&reset_error_state=false&src=typd&max_position=TWEET-999439049721970688-999439834686087168&q=trump%20since%3A2018-05-23%20until%3A2018-05-24&l=None"
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/twitterscraper-0.7.1-py3.5.egg/twitterscraper/query.py", line 53, in query_single_page
json_resp = json.loads(response.text)
File "/usr/lib/python3.5/json/__init__.py", line 319, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.5/json/decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.5/json/decoder.py", line 357, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
INFO: Got 1481 tweets for trump%20since%3A2018-05-23%20until%3A2018-05-24.
Hmm, you should have gotten a log line saying Retrying... (Attempts left <attempts left here>) as per https://github.com/taspinar/twitterscraper/blob/master/twitterscraper/query.py#L81
It should retry 10 times by default. I'm assuming you didn't omit any log lines.
Can you check whether you are on latest master?
I may not have upgraded successfully on my test machine but on my live machines, it's working much much better...
@lapp0 , If it is not too much trouble...
@taspinar done https://github.com/taspinar/twitterscraper/pull/123
@taspinar this should be closed
Hi @taspinar,
I don't know if you have received any messages regarding this error lately, but I have been having nearly the same issues as described above. I was going to try to set the user agent manually, but as far as I understand, the variable has been changed in a later version.
I have created a pastebin with the results of running a command from the documentation. I have removed duplicates, as the original paste exceeded the free trial limits of PasteBin.
Link https://pastebin.com/raw/bkyAgGsK
Best regards
Update: I've tried changing my IP, knowing that Twitter has some odd ways of blocking. That seemed to change the number of tweets I receive; however, after a short while I still end up getting 0.
INFO: Got 120 tweets (120 new).
INFO: Got 240 tweets (120 new).
INFO: Got 360 tweets (120 new).
INFO: Got 480 tweets (120 new).
INFO: Got 600 tweets (120 new).
INFO: Got 720 tweets (120 new).
INFO: Got 720 tweets (0 new).
INFO: Got 720 tweets (0 new).
INFO: Got 720 tweets (0 new).
INFO: Got 720 tweets (0 new).
INFO: Got 720 tweets (0 new).
INFO: Got 720 tweets (0 new).
INFO: Got 720 tweets (0 new).
INFO: Got 720 tweets (0 new).
INFO: Got 720 tweets (0 new).
INFO: Got 720 tweets (0 new).
INFO: Got 720 tweets (0 new).
Update: Tested on two devices running Windows 10 version 1809. No difference. I also tried running with and without limits, and with an additional pool size as well as without a set pool size. My main issue is not so much that I don't get a large enough data pool, but that the data is spread extremely unevenly across dates. For instance, given a dataset of 100,000 tweets over 100 days, I will have 3 days of 33,000 tweets and 97 days of nothing.
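A quick way to check how tweets are spread over dates is to count them per day. The sketch below is only an illustration: `tweets_per_day` is a hypothetical helper, and it assumes each tweet is a dict with a `timestamp` field holding an ISO 8601 string, which may not match the actual output format.

```python
from collections import Counter
from datetime import datetime


def tweets_per_day(tweets, field="timestamp"):
    """Count tweets per calendar day.

    Assumes each tweet is a dict whose `field` holds an ISO 8601
    timestamp such as "2018-05-23T14:05:00"; adjust to the real output.
    """
    days = Counter()
    for tweet in tweets:
        # The first 10 characters of an ISO 8601 timestamp are the date.
        day = datetime.strptime(tweet[field][:10], "%Y-%m-%d").date()
        days[day.isoformat()] += 1
    return dict(days)


# Made-up sample data, only to show the shape of the result.
sample = [
    {"timestamp": "2018-05-23T14:05:00"},
    {"timestamp": "2018-05-23T18:30:12"},
    {"timestamp": "2018-05-25T09:00:00"},
]
counts = tweets_per_day(sample)
print(counts)
```

Days that are missing from the result are the ones with no tweets at all, which makes gaps like the one described above easy to spot.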
Hello, I am facing an issue: I'm getting 0 tweets. I cloned https://github.com/lapp0/twitterscraper.git and ran python setup.py install, but it still shows 0 tweets. Please help! Thanks.
@lubhaniagarwal I think you should use taspinar's repo; mine was last updated in 2018, and the changes were already merged into taspinar's.
Hey, can you please help me with the steps on how to proceed? It would be really helpful for me. Thank you in advance.
@lubhaniagarwal git clone https://github.com/taspinar/twitterscraper.git
ERROR: Failed to parse JSON "Expecting value: line 1 column 1 (char 0)" while requesting "https://twitter.com/i/search/timeline?f=tweets&vertical=default&include_available_features=1&include_entities=1&reset_error_state=false&src=typd&max_position=TWEET-838177224989753344-838177234682773505&q=trump%20since%3A2016-07-25%20until%3A2017-03-05&l=None"
I don't know why I'm suddenly running into this problem.