snarfed / bridgy

📣 Connects your web site to social media. Likes, retweets, mentions, cross-posting, and more...
https://brid.gy
Creative Commons Zero v1.0 Universal

twitter html changes broke scraping for likes #823

Closed: snarfed closed this issue 6 years ago

snarfed commented 6 years ago

whee! haven't investigated yet, just seeing that the URLs we currently scrape to get likes/favorites, eg https://twitter.com/i/activity/favorited_popup?id=998777664868532224 , now 404. ARGH.

snarfed commented 6 years ago

IRC discussion today, but focused on losing rel-me links on profile pages.

armingrewe commented 6 years ago

Ah, that's why none of my Twitter likes are coming through. Why can't they just leave things as they are?

snarfed commented 6 years ago

more research, based on viewing https://twitter.com/schnarfed/status/627581493137637376 while logged out:

facepile in the html includes both retweets and likes, and doesn't distinguish between them:

<li class="avatar-row js-face-pile-container">
<a class="js-profile-popup-actionable js-tooltip" href="/kn" data-user-id="260959249" original-title="Katsuya ..." title="Katsuya ..." rel="noopener">
  <img class="avatar size24 js-user-profile-link" src="https://pbs.twimg.com/profile_images/878856067354050560/l_Plr-B4_normal.jpg" alt="Katsuya ...">
</a>
<a class="js-profile-popup-actionable js-tooltip" href="/energyovertime" data-user-id="2946733108" original-title="Bradley ..." title="Bradley ..." rel="noopener">
  <img class="avatar size24 js-user-profile-link" src="https://pbs.twimg.com/profile_images/997266095101759489/Lhiigdp1_normal.jpg" alt="Bradley ...">
</a>
...
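A minimal sketch of pulling users out of a facepile like the one above, using only the Python standard library. The class and function names are hypothetical, not Bridgy's actual code, and it assumes the markup keeps the `js-profile-popup-actionable` class, `href`, and `data-user-id` attributes shown in the snippet. It returns one combined list, since the HTML gives no way to tell retweets from likes.

```python
from html.parser import HTMLParser


class FacepileParser(HTMLParser):
    """Collects (username, user_id) pairs from facepile profile links."""

    def __init__(self):
        super().__init__()
        self.users = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        # Only profile links carry this class in the snippet above.
        if "js-profile-popup-actionable" in classes and attrs.get("href"):
            username = attrs["href"].lstrip("/")
            self.users.append((username, attrs.get("data-user-id")))


def parse_facepile(html):
    """Returns a list of (username, user_id) tuples found in the HTML."""
    parser = FacepileParser()
    parser.feed(html)
    return parser.users
```

Feeding it the two links shown above would yield the `kn` and `energyovertime` entries with their user ids.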

when i click on "N retweets" or "M likes," it shows a "please log in" popup. it also fetches https://twitter.com/i/schnarfed/conversation/627581493137637376?include_available_features=1&include_entities=1&reset_error_state=false , which returns nothing:

{
  "has_more_items": false,
  "items_html": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n",
  "new_latent_count": 0
}
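A scraper could detect this kind of "empty" response by checking whether `items_html` contains anything besides whitespace. This is only a sketch: the function name is hypothetical, and it assumes the response keeps the `has_more_items` and `items_html` fields shown above.

```python
import json


def has_conversation_items(body):
    """Returns True if the JSON response body carries any rendered items."""
    data = json.loads(body)
    html = data.get("items_html") or ""
    # The response above contains only newlines and spaces, so strip() empties it.
    return bool(data.get("has_more_items")) or bool(html.strip())
```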

no requests to https://twitter.com/i/activity/favorited_popup?id=998777664868532224 or anything else similar.

snarfed commented 6 years ago

looking at https://mobile.twitter.com/schnarfed/status/627581493137637376 , it's JS;DR, nothing in the html.

there's a bunch of data from the tweet and reply(ies) in this fetch, but nothing about likes: https://api.twitter.com/2/timeline/conversation/627581493137637376.json?include_profile_interstitial_type=1&include_blocking=1&include_blocked_by=1&include_followed_by=1&include_want_retweets=1&include_mute_edge=1&include_can_dm=1&skip_status=1&cards_platform=Web-12&include_cards=1&include_ext_alt_text=true&include_reply_count=1&tweet_mode=extended&include_entities=true&include_user_entities=true&include_ext_media_color=true&send_error_codes=true&count=20&ext=mediaStats

clicking on "N likes" gives the same "please log in" popup. it also posts to https://api.twitter.com/1.1/jot/client_event.json with this body, and gets an empty response.

[
  {
    "_category_": "client_event",
    "format_version": 2,
    "triggered_on": 1527083527236,
    "items": [],
    "event_namespace": {
      "page": "permalink",
      "section": "permalink",
      "component": "login_signup_sheet",
      "action": "impression",
      "client": "m5"
    },
    "client_app_id": "3033300"
  }
]

snarfed commented 6 years ago

under the JS, the mobile page HTML says "We've detected that JavaScript is disabled in your browser. Would you like to proceed to legacy Twitter?", with a form that POSTs to https://mobile.twitter.com/i/nojs_router?path=%2Fschnarfed%2Fstatus%2F627581493137637376 .

that POST 302 redirects back to https://mobile.twitter.com/schnarfed/status/627581493137637376 , but sets something persistent in the backend session, probably identified by the _mobile_sess or _twitter_sess cookie, so that legacy twitter loads instead of JS mobile twitter.
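For reference, the `nojs_router` URL is just the tweet's path, percent-encoded, in a `path` query parameter. A small sketch of building it (the function name is hypothetical; the URL pattern is taken from the form target quoted above):

```python
from urllib.parse import quote, urlsplit


def nojs_router_url(tweet_url):
    """Builds the legacy-Twitter opt-in URL for a mobile tweet permalink."""
    path = urlsplit(tweet_url).path
    # safe="" makes quote() encode the slashes too, matching the form target.
    return "https://mobile.twitter.com/i/nojs_router?path=" + quote(path, safe="")
```

Actually switching to legacy Twitter would also mean POSTing to that URL and keeping the returned session cookies for later requests, which this sketch leaves out.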

the legacy tweet HTML doesn't show any retweets or likes at all, and makes no AJAX fetches for anything else.

snarfed commented 6 years ago

https://bsidesoftware.uservoice.com/forums/222301-general/suggestions/6209460-add-a-way-to-view-who-rt-d-and-favorited-tweets led me to https://twitter.com/i/tweet/html?id=998777664868532224&modal=favorited_activity , which returns json with html in a tweet_html field...which similarly only shows counts of retweets and likes, not individual users.

snarfed commented 6 years ago

found a fix! i have to spoof a browser by sending an appropriate User-Agent, eg Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0. whee.
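A rough sketch of what the spoofing looks like with the Python standard library. The function and constant names are hypothetical, and it assumes the request goes to the same `favorited_popup` URL mentioned at the top of this issue; it is not necessarily how Bridgy itself sends the header.

```python
import urllib.request

# Browser User-Agent string quoted in the comment above.
FIREFOX_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:60.0) "
              "Gecko/20100101 Firefox/60.0")


def build_likes_request(tweet_id):
    """Builds (but does not send) a likes request with a browser User-Agent."""
    url = "https://twitter.com/i/activity/favorited_popup?id=%s" % tweet_id
    return urllib.request.Request(url, headers={"User-Agent": FIREFOX_UA})
```

Passing the result to `urllib.request.urlopen()` would send it; whether Twitter still answers at that URL is not something this sketch can promise.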

huge thanks to https://reext.ru/blog/?p=214 (google translate; maybe @CityAceE?), who discovered this.

CityAceE commented 6 years ago

I'm glad my explanation was helpful. Based on my experience, though, I think it would be better to send the full set of browser headers rather than User-Agent alone.

snarfed commented 6 years ago

it's working again, without the User-Agent header! oh Twitter, never change. guess we'll see how long it lasts.

snarfed commented 6 years ago

example log: https://brid.gy/log?start_time=1527132332&key=aglzfmJyaWQtZ3lyFAsSB1R3aXR0ZXIiB2Fhcm9ucGsM

snarfed commented 6 years ago

still working now, as is, without User-Agent. tentatively closing.