sebinsua / scrape-twitter

🐩 Access Twitter data without an API key. [DEPRECATED]
GNU General Public License v3.0

Invalid JSON when using "likes" function #18

Open belisards opened 6 years ago

belisards commented 6 years ago

I've configured my environment variables, but I'm getting this error. Any idea?

invalid json response body at https://twitter.com/pmerj/likes/timeline?include_available_features=1&include_entities=1 reason: Unexpected token <
FetchError: invalid json response body at https://twitter.com/pmerj/likes/timeline?include_available_features=1&include_entities=1 reason: Unexpected token <
    at /usr/local/lib/node_modules/scrape-twitter/node_modules/node-fetch/lib/body.js:48:31
    at process._tickCallback (node.js:368:9)

jchook commented 5 years ago

I am also getting this issue:

invalid json response body at https://twitter.com/jchook/likes/timeline?include_available_features=1&include_entities=1 reason: Unexpected token < in JSON at position 0
FetchError: invalid json response body at https://twitter.com/jchook/likes/timeline?include_available_features=1&include_entities=1 reason: Unexpected token < in JSON at position 0
    at /usr/local/lib/node_modules/scrape-twitter/node_modules/node-fetch/lib/body.js:48:31
    at <anonymous>
    at process._tickCallback (internal/process/next_tick.js:188:7)


Looking at the JSON and the code, it seems Twitter's internal JSON structure may have changed from { html, _minPosition } to { has_more_items, items_html, new_latent_count }.
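If the shape really did change, a parser could accept both forms. Here is a minimal sketch, assuming only the field names quoted above; `normalizeTimelineResponse` is a made-up helper, not something from scrape-twitter, and whether the new shape carries an equivalent of `_minPosition` is unknown, so it is left undefined there.

```javascript
// Hypothetical helper: accept either response shape mentioned above.
// Only the field names come from the observed JSON; everything else is assumed.
function normalizeTimelineResponse(body) {
  if (body === null || typeof body !== 'object') {
    throw new TypeError('expected a parsed JSON object');
  }
  // Old shape: { html, _minPosition }
  if (typeof body.html === 'string') {
    return { html: body.html, minPosition: body._minPosition, hasMore: undefined };
  }
  // New shape: { has_more_items, items_html, new_latent_count }
  if (typeof body.items_html === 'string') {
    return {
      html: body.items_html,
      minPosition: undefined, // no known equivalent in the new shape
      hasMore: Boolean(body.has_more_items),
    };
  }
  throw new Error('unrecognised timeline response shape');
}
```

Callers would then read `html` from the returned object regardless of which shape the endpoint sent.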

rbeer commented 5 years ago

tl;dr The endpoint sends HTML instead of the expected JSON, asking the client to set an app_shell_visited cookie. Setting it only yields yet another page, this one with a link for the user to click. The fix is to set the Referer header of the request to the requested resource (i.e. the URL itself), fully simulating the behavior the first response asks for.


The endpoint most likely performs a sanity check on incoming requests, since they are expected to originate from within the Twitter frontend (i.e. a Progressive Web App).

AppShell is a PWA concept. More on that in the always great MDN: https://developer.mozilla.org/en-US/docs/Web/Apps/Progressive/App_structure#App_shell
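Since the `Unexpected token <` error is just `JSON.parse` choking on an HTML page, a caller could check the body before parsing and fail with a clearer message. A rough sketch, where `parseTimelineBody` is a hypothetical name and the "starts with `<`" test is only a heuristic:

```javascript
// Hypothetical helper: parse a response body as JSON, but fail with a clearer
// message when the server sent an HTML page instead.
function parseTimelineBody(text) {
  const trimmed = text.trimStart();
  // Heuristic: valid JSON never starts with "<", while an HTML page does.
  if (trimmed.startsWith('<')) {
    throw new Error('expected JSON but received HTML (possibly an interstitial page)');
  }
  return JSON.parse(trimmed);
}
```

This does not fix the request. It only makes the failure easier to diagnose than a bare parse error.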

Here's the first response:

<!DOCTYPE html>
<html lang="en">
<head>
... Just some styles, nothing important ...
</head>
<body>
    <noscript>
      <center>If you’re not redirected soon, please <a href="/nouswavesle/likes/timeline?include_available_features=1&amp;include_entities=1">use this link</a>.</center>
    </noscript>
    <script nonce="CrxHUbZqtQTnttFBuO8J6A==">
      document.cookie = "app_shell_visited=1;path=/;max-age=5";
      location.replace(location.href.split("#")[0]);
    </script>
</body>
</html>
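For anyone wanting to detect this page programmatically, here is a rough sketch that pulls the cookie assignment out of the inline script. `extractInterstitialCookie` is a made-up name, and the regular expression assumes the script keeps the exact `document.cookie = "..."` form shown in this response.

```javascript
// Hypothetical helper: find the name=value pair that the page's inline script
// assigns to document.cookie. Returns null if no such assignment is present.
function extractInterstitialCookie(html) {
  // Capture everything after the opening quote up to the first ";" or closing quote,
  // i.e. the name=value part without attributes such as path or max-age.
  const match = /document\.cookie\s*=\s*"([^";]+)/.exec(html);
  return match ? match[1] : null;
}

const sample = '<script>document.cookie = "app_shell_visited=1;path=/;max-age=5";</script>';
console.log(extractInterstitialCookie(sample)); // prints "app_shell_visited=1"
```

A `null` result means the body is some other page (or real data), so the caller should not retry with the cookie.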

The <script> sets the app_shell_visited=1 cookie and immediately reloads the page (location.replace with same URL). Setting it doesn't cut it, though. All we're getting then is yet another HTML page:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8" />
  <title>Twitter</title>
... again some styles ...
  </style>
</head>
<body>

    <center>
      <svg viewBox="0 0 24 24"><g><path d="...actual coordinates that draw the Twitter birdy, I guess..."></path></g></svg>
      If you’re not redirected soon, please <a href="/nouswavesle/likes/timeline?include_available_features=1&amp;include_entities=1">use this link</a>.
    </center>
</body>
</html>

This time, it's not asking for the cookie, so that part seems to work. But we're still asked to click a link pointing at the same URL we requested. Since cookies that might have been set by the response are tracked automatically, there wasn't much left to try other than the Referer header, which a browser sets to the current URL when you click a link.
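A minimal sketch of that idea, assuming node-fetch-style options. `buildTimelineRequestOptions` is a hypothetical helper, and passing the cookie explicitly is an assumption for setups without a cookie jar:

```javascript
// Hypothetical helper: build fetch options that mimic a link click on the
// interstitial page by sending the requested URL as its own Referer.
function buildTimelineRequestOptions(url, cookie) {
  const headers = { Referer: url };
  if (cookie) {
    // Assumed: only needed when cookies are not tracked automatically,
    // e.g. "app_shell_visited=1".
    headers.Cookie = cookie;
  }
  return { method: 'GET', headers };
}

// Usage (not executed here; `fetch` would come from node-fetch):
// const url = 'https://twitter.com/<screenName>/likes/timeline?include_available_features=1&include_entities=1';
// fetch(url, buildTimelineRequestOptions(url)).then(res => res.json());
```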

Well, let's try that:

09:03:54 scrape-twitter:query query on resource: https://twitter.com/nouswaves/likes/timeline?include_available_features=1&include_entities=1
09:03:54 scrape-twitter:query response was ok
09:03:54 scrape-twitter:query received html of length: 217039
[
{"screenName":"anfiyj","id":"1058154950004523008","time":"2018-11-02T00:32:54.000Z","isRetweet":false,"isPinned":false,"isReplyTo":false,"text":"A SQUID is love","userMentions":[],"hashtags":[],"images":[],"urls":[],"replyCount":0,"retweetCount":4,"favoriteCount":4},
{"screenName":"recborg","id":"1054791420475772928","time":"2018-10-23T17:47:26.000Z","isRetweet":false,"isPinned":false,"isReplyTo":true,"text":"I’m actually in NY. Was hoping you and Nabeel would meet but I don’t think his shedule permits.","userMentions":[],"hashtags":[],"images":[],"urls":[],"replyCount":0,"retweetCount":0,"favoriteCount":1},
... and so on ...
]
...

:tada: :grin:

PR will be out in a minute.