Open belisards opened 6 years ago
I am also getting this issue:
invalid json response body at https://twitter.com/jchook/likes/timeline?include_available_features=1&include_entities=1 reason: Unexpected token < in JSON at position 0
FetchError: invalid json response body at https://twitter.com/jchook/likes/timeline?include_available_features=1&include_entities=1 reason: Unexpected token < in JSON at position 0
at /usr/local/lib/node_modules/scrape-twitter/node_modules/node-fetch/lib/body.js:48:31
at <anonymous>
at process._tickCallback (internal/process/next_tick.js:188:7)
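The "Unexpected token <" part of this trace means the body starts with markup, so an HTML page arrived where JSON was expected. A minimal sketch of a guard that makes this explicit; `parseTimelineBody` is a hypothetical helper for illustration, not something scrape-twitter provides:

```javascript
// Hypothetical helper, not part of scrape-twitter: parse a response body as
// JSON, but report an HTML body explicitly instead of a bare SyntaxError.
function parseTimelineBody(text) {
  const trimmed = text.trim();
  if (trimmed.startsWith("<")) {
    // An HTML page came back where JSON was expected.
    throw new Error(
      "expected JSON but received HTML (" + trimmed.length + " characters)"
    );
  }
  return JSON.parse(trimmed);
}
```

Reading the body as text first and parsing it afterwards keeps the raw HTML available for inspection when the guard trips.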
When looking at the JSON and the code, it seems the Twitter internal JSON structure may have changed from { html, _minPosition } to { has_more_items, items_html, new_latent_count }.
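If the structure did change as suspected, a parser could accept both shapes. A rough sketch follows; `normalizeTimeline` is a hypothetical name, and the meaning of each field is assumed from its key name rather than confirmed against real responses:

```javascript
// Hypothetical normalizer: accepts either payload shape mentioned above and
// returns a single { html, hasMore } object. Field semantics are assumed
// from the key names, not confirmed against Twitter's responses.
function normalizeTimeline(payload) {
  if (typeof payload.items_html === "string") {
    // Newer shape: { has_more_items, items_html, new_latent_count }
    return {
      html: payload.items_html,
      hasMore: Boolean(payload.has_more_items),
    };
  }
  if (typeof payload.html === "string") {
    // Older shape: { html, _minPosition }. Assumption: a non-empty
    // _minPosition means there are more pages to fetch.
    return { html: payload.html, hasMore: Boolean(payload._minPosition) };
  }
  throw new Error(
    "unrecognized timeline payload: " + Object.keys(payload).join(", ")
  );
}
```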
tl;dr
The endpoint sends HTML instead of the expected JSON and asks the client to set a cookie, app_shell_visited. Setting the cookie only yields yet another page with a link for the user to click. This is fixed by setting the Referer header of the request to the requested resource (i.e. the URL), fully simulating the behavior requested by the first response.
The endpoint most likely performs a sanity check on incoming requests, since they are expected to originate from within the Twitter frontend (i.e. a Progressive Web App).
AppShell is a PWA concept. More on that in the always great MDN: https://developer.mozilla.org/en-US/docs/Web/Apps/Progressive/App_structure#App_shell
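As a sketch of the fix described in the tl;dr, the request options could be built like this. `buildTimelineRequest` is a hypothetical helper, and whether the cookie has to be sent alongside the Referer header is an assumption, so it is left optional:

```javascript
// Hypothetical helper: build options for a fetch-compatible client so that
// the Referer header equals the URL being requested.
function buildTimelineRequest(url, cookie) {
  const headers = { Referer: url };
  if (cookie) {
    // Assumption: pass e.g. "app_shell_visited=1" only if the HTTP client
    // does not already track cookies on its own.
    headers.Cookie = cookie;
  }
  return { url, options: { headers } };
}

// Usage with a fetch-compatible client (assumed to be in scope):
// const { url, options } = buildTimelineRequest(
//   "https://twitter.com/jchook/likes/timeline?include_available_features=1&include_entities=1"
// );
// const response = await fetch(url, options);
```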
Here's the first response:
<!DOCTYPE html>
<html lang="en">
<head>
... Just some styles, nothing important ...
</head>
<body>
<noscript>
<center>If you’re not redirected soon, please <a href="/nouswavesle/likes/timeline?include_available_features=1&include_entities=1">use this link</a>.</center>
</noscript>
<script nonce="CrxHUbZqtQTnttFBuO8J6A==">
document.cookie = "app_shell_visited=1;path=/;max-age=5";
location.replace(location.href.split("#")[0]);
</script>
</body>
</html>
The <script> sets the app_shell_visited=1 cookie and immediately reloads the page (location.replace with the same URL).
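A client that does not execute JavaScript has to pick the cookie out of that page itself. A minimal sketch, assuming the assignment always has the `document.cookie = "..."` form shown in the response above; `extractShellCookie` is a hypothetical helper:

```javascript
// Hypothetical helper: pull the cookie that the inline script would set out
// of the interstitial HTML. Assumes a double-quoted document.cookie
// assignment like the one in the response above.
function extractShellCookie(html) {
  const match = /document\.cookie\s*=\s*"([^"]+)"/.exec(html);
  if (!match) {
    return null; // no cookie assignment found in this page
  }
  // Keep only the name=value pair; attributes such as path and max-age are
  // instructions to the browser, not part of the Cookie request header.
  return match[1].split(";")[0].trim();
}
```

Note the max-age=5 in the original script: the cookie is meant to live for only five seconds, so the follow-up request would have to be sent promptly.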
Setting it doesn't cut it, though. All we're getting then is yet another HTML page:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<title>Twitter</title>
... again some styles ...
</style>
</head>
<body>
<center>
<svg viewBox="0 0 24 24"><g><path d="...actual coordinates that draw the Twitter birdy, I guess..."></path></g></svg>
If you’re not redirected soon, please <a href="/nouswavesle/likes/timeline?include_available_features=1&include_entities=1">use this link</a>.
</center>
</body>
</html>
This time it's not asking for the cookie, so that part seems to work. But we're still asked to click a link with the same URL as the one requested. Since cookies that might have been set by the response are tracked automatically, there wasn't much left to try other than the Referer header, which is set to the current URL when you click a link.
Well, let's try that:
09:03:54 scrape-twitter:query query on resource: https://twitter.com/nouswaves/likes/timeline?include_available_features=1&include_entities=1
09:03:54 scrape-twitter:query response was ok
09:03:54 scrape-twitter:query received html of length: 217039
[
{"screenName":"anfiyj","id":"1058154950004523008","time":"2018-11-02T00:32:54.000Z","isRetweet":false,"isPinned":false,"isReplyTo":false,"text":"A SQUID is love","userMentions":[],"hashtags":[],"images":[],"urls":[],"replyCount":0,"retweetCount":4,"favoriteCount":4},
{"screenName":"recborg","id":"1054791420475772928","time":"2018-10-23T17:47:26.000Z","isRetweet":false,"isPinned":false,"isReplyTo":true,"text":"I’m actually in NY. Was hoping you and Nabeel would meet but I don’t think his shedule permits.","userMentions":[],"hashtags":[],"images":[],"urls":[],"replyCount":0,"retweetCount":0,"favoriteCount":1},
... and so on ...
]
...
:tada: :grin:
PR will be out in a minute.
I've configured my environment variables, but I'm getting this error. Any idea?
invalid json response body at https://twitter.com/pmerj/likes/timeline?include_available_features=1&include_entities=1 reason: Unexpected token <
FetchError: invalid json response body at https://twitter.com/pmerj/likes/timeline?include_available_features=1&include_entities=1 reason: Unexpected token <
at /usr/local/lib/node_modules/scrape-twitter/node_modules/node-fetch/lib/body.js:48:31
at process._tickCallback (node.js:368:9)