itteco / iframely

oEmbed proxy. Supports over 1800 domains via custom parsers, oEmbed, Twitter Cards and Open Graph
https://iframely.com

Tweets with media are failing in self-hosted version #498

Closed j0k3r closed 1 year ago

j0k3r commented 1 year ago

I can't tell when it started to fail (maybe a change on Twitter's side), but when a tweet contains media, the fetch fails.

I've tested locally with http://0.0.0.0:8061/iframely?uri=https://twitter.com/RockoPeppe/status/582323285825736704 (using the tweet from inside twitter.status.js).

Here is the log from the server:

$ node server
Using cache engine: no-cache
No local domains file detected...
-- [23-05-09 11:26:19]:16574 Loading domains list from https://iframely.com/qa/domains.json
-- [23-05-09 11:26:20]:16574 Domains list activated. 1950 domains, including disabled ones
Invalid mixin "ld-newsarticle-logo" in plugin "sverigesradio.se"
Invalid mixin "ld-author" in plugin "sverigesradio.se"
Invalid mixin "ld-date" in plugin "sverigesradio.se"
Iframely plugins loaded:
   - custom domains: 146
   - generic & meta: 95

Starting Iframely...
Base URL for embeds that require hosted renders: http://localhost:8061

 - support@iframely.com - if you need help
 - twitter.com/iframely - news & updates
 - github.com/itteco/iframely - star & contribute

iframely is running on 0.0.0.0:8061
API endpoints: /oembed and /iframely; Debugger UI: /debug

-- [23-05-09 11:26:22]:16574 127.0.0.1 - Loading /iframely for https://twitter.com/RockoPeppe/status/582323285825736704
-- [23-05-09 11:26:22]:16574    -- plugin redirect (by "htmlparser") /
-- [23-05-09 11:26:23]:16574    -- plugin response: {"plugin":"htmlparser","response":"maximum redirect reached at: https://twitter.com/RockoPeppe/status/582323285825736704","uri":"https://twitter.com/RockoPeppe/status/582323285825736704"}
Requested page error: FetchError: maximum redirect reached at: https://twitter.com/RockoPeppe/status/58232328582573670

And the response is:

{
  "error": {
    "source": "iframely",
    "code": 417,
    "message": "Requested page error: FetchError: maximum redirect reached at: https://twitter.com/RockoPeppe/status/582323285825736704"
  }
}

I tried to find the root cause but failed. The only thing that fixed the problem was commenting out this line: https://github.com/itteco/iframely/blob/c38710d632f9d61136d675d3a5cf35026b1c590f/plugins/domains/twitter.com/twitter.status.js#L72

iparamonau commented 1 year ago

The plugin is trying to get a thumbnail image if it sees that there is one in the tweet:

https://github.com/itteco/iframely/blob/c38710d632f9d61136d675d3a5cf35026b1c590f/plugins/domains/twitter.com/twitter.status.js#L70-L74

Around the time Twitter announced their monetization attempt for their developer APIs a couple of months ago, they also added stricter bot restrictions, so you are probably getting blocked. The options we set when we fetch the URL are actually there to prevent the plugin from failing on bot rejections.

When you remove options.followHTTPRedirect, the second option kicks in (exposeStatusCode), and basically you get a 3xx redirect code instead. You probably don't get a picture either.

The proper fix would be to disable that fetch altogether via a feature flag. But before we do, it would be interesting to understand what Twitter gives you. Could you perhaps try to cURL that URL and follow the redirection chain manually and see what happens?
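
For anyone who prefers to do this check from Node rather than cURL, here is a minimal sketch of following a redirect chain hop by hop. It assumes Node 18+ (global fetch) and is not taken from the Iframely code base; the name traceRedirects and the fetchImpl parameter are made up for illustration.

```javascript
// Trace a redirect chain one hop at a time instead of letting fetch follow it.
// `fetchImpl` is a hypothetical injection point so the function can be run
// offline with a stub; it defaults to the global fetch of Node 18+.
async function traceRedirects(startUrl, { maxHops = 5, fetchImpl = fetch } = {}) {
  const hops = [];
  let url = startUrl;
  for (let i = 0; i <= maxHops; i++) {
    // redirect: 'manual' asks fetch to return the 3xx response itself.
    const res = await fetchImpl(url, { redirect: 'manual' });
    const location = res.headers.get('location');
    hops.push({ url, status: res.status, location });
    const isRedirect = res.status >= 300 && res.status < 400 && location;
    if (!isRedirect) return hops;
    // Location may be relative, so resolve it against the current URL.
    url = new URL(location, url).toString();
  }
  throw new Error(`more than ${maxHops} redirects, starting at ${startUrl}`);
}
```

Calling `traceRedirects(url)` returns one `{ url, status, location }` entry per hop, which makes it easy to see whether the chain ends in a 200 or keeps redirecting.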

j0k3r commented 1 year ago

Here is what I get when I fetch the pic.twitter URL:

* Preparing request to https://pic.twitter.com/3SGK87Xz2j
* Current time is 2023-05-11T07:16:30.317Z
* Enable automatic URL encoding
* Using default HTTP version
* Disable SSL validation
* Enable cookie sending with jar of 16 cookies
*   Trying 104.244.42.131:443...
* Connected to pic.twitter.com (104.244.42.131) port 443 (#10)
* ALPN, offering h2
* ALPN, offering http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: C=US; ST=California; L=San Francisco; O=Twitter, Inc.; CN=*.twitter.com
*  start date: Oct  1 00:00:00 2022 GMT
*  expire date: Oct  1 23:59:59 2023 GMT
*  issuer: C=US; O=DigiCert Inc; CN=DigiCert TLS RSA SHA256 2020 CA1
*  SSL certificate verify ok.
* Using HTTP2, server supports multiplexing
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x7fbf5211dc00)

> GET /3SGK87Xz2j HTTP/2
> Host: pic.twitter.com
> user-agent: insomnia/2023.1.0
> accept: */*

* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing

< HTTP/2 301 
< date: Thu, 11 May 2023 07:16:30 GMT
< perf: 7626143928
< vary: Origin
< server: tsa_f
< expires: Thu, 11 May 2023 07:21:30 GMT
< location: https://twitter.com/RockoPeppe/status/582323285825736704/photo/1

* skipped cookie with bad tailmatch domain: t.co

< set-cookie: muc=39a24e40-5d91-4fcc-8e81-f8e911f565ee; Max-Age=34214400; Expires=Mon, 10 Jun 2024 07:16:30 GMT; Domain=t.co; Secure; SameSite=None
< set-cookie: guest_id=v1%3A168378939055769957; Max-Age=34214400; Expires=Mon, 10 Jun 2024 07:16:30 GMT; Path=/; Domain=.twitter.com; Secure; SameSite=None
< cache-control: private,max-age=300
< content-length: 0
< x-transaction-id: 926e5921c18b3c41
< strict-transport-security: max-age=631138519
< x-response-time: 108
< x-connection-hash: b65db43e615e4f76b1d0b6a728f7e0e41b8f668dcd243d8bc59518453e796b05

* Connection #10 to host pic.twitter.com left intact
* Issue another request to this URL: 'https://twitter.com/RockoPeppe/status/582323285825736704/photo/1'
*   Trying 104.244.42.193:443...
* Connected to twitter.com (104.244.42.193) port 443 (#11)
* ALPN, offering h2
* ALPN, offering http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: C=US; ST=California; L=San Francisco; O=Twitter, Inc.; CN=twitter.com
*  start date: Dec 25 00:00:00 2022 GMT
*  expire date: Dec 25 23:59:59 2023 GMT
*  issuer: C=US; O=DigiCert Inc; CN=DigiCert TLS RSA SHA256 2020 CA1
*  SSL certificate verify ok.
* Using HTTP2, server supports multiplexing
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x7fbf5211dc00)

> GET /RockoPeppe/status/582323285825736704/photo/1 HTTP/2
> Host: twitter.com
> user-agent: insomnia/2023.1.0
> cookie: guest_id=v1%3A168378939055769957
> accept: */*

* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing

< HTTP/2 200 
< date: Thu, 11 May 2023 07:16:30 GMT
< perf: 7626143928
< expiry: Tue, 31 Mar 1981 05:00:00 GMT
< pragma: no-cache
< server: tsa_f
< set-cookie: ct0=; Max-Age=-1683789389; Expires=Thu, 01 Jan 1970 00:00:01 GMT; Path=/; Domain=.twitter.com; Secure; SameSite=Lax
< content-type: text/html; charset=utf-8
< x-powered-by: Express
< cache-control: no-cache, no-store, must-revalidate, pre-check=0, post-check=0
< last-modified: Thu, 11 May 2023 07:16:30 GMT
< x-frame-options: DENY
< x-transaction-id: c711532431cb9e9b
< x-xss-protection: 0
< x-content-type-options: nosniff
< content-security-policy: connect-src 'self' blob: https://*.pscp.tv https://*.video.pscp.tv https://*.twimg.com https://api.twitter.com https://api-stream.twitter.com https://ads-api.twitter.com https://aa.twitter.com https://caps.twitter.com https://pay.twitter.com https://sentry.io https://ton.twitter.com https://twitter.com https://upload.twitter.com https://www.google-analytics.com https://accounts.google.com/gsi/status https://accounts.google.com/gsi/log https://app.link https://api2.branch.io https://bnc.lt https://checkoutshopper-live.adyen.com wss://*.pscp.tv https://vmap.snappytv.com https://vmapstage.snappytv.com https://vmaprel.snappytv.com https://vmap.grabyo.com https://dhdsnappytv-vh.akamaihd.net https://pdhdsnappytv-vh.akamaihd.net https://mdhdsnappytv-vh.akamaihd.net https://mdhdsnappytv-vh.akamaihd.net https://mpdhdsnappytv-vh.akamaihd.net https://mmdhdsnappytv-vh.akamaihd.net https://mdhdsnappytv-vh.akamaihd.net https://mpdhdsnappytv-vh.akamaihd.net https://mmdhdsnappytv-vh.akamaihd.net https://dwo3ckksxlb0v.cloudfront.net https://media.riffsy.com https://*.giphy.com https://media.tenor.com https://c.tenor.com ; default-src 'self'; form-action 'self' https://twitter.com https://*.twitter.com; font-src 'self' https://*.twimg.com; frame-src 'self' https://twitter.com https://mobile.twitter.com https://pay.twitter.com https://cards-frame.twitter.com https://accounts.google.com/ https://client-api.arkoselabs.com/ https://iframe.arkoselabs.com/  https://recaptcha.net/recaptcha/ https://www.google.com/recaptcha/ https://www.gstatic.com/recaptcha/; img-src 'self' blob: data: https://*.cdn.twitter.com https://ton.twitter.com https://*.twimg.com https://analytics.twitter.com https://cm.g.doubleclick.net https://www.google-analytics.com https://maps.googleapis.com https://www.periscope.tv https://www.pscp.tv https://media.riffsy.com https://*.giphy.com https://media.tenor.com https://c.tenor.com https://*.pscp.tv https://*.periscope.tv 
https://prod-periscope-profile.s3-us-west-2.amazonaws.com https://platform-lookaside.fbsbx.com https://scontent.xx.fbcdn.net https://scontent-sea1-1.xx.fbcdn.net https://*.googleusercontent.com; manifest-src 'self'; media-src 'self' blob: https://twitter.com https://*.twimg.com https://*.vine.co https://*.pscp.tv https://*.video.pscp.tv https://dhdsnappytv-vh.akamaihd.net https://pdhdsnappytv-vh.akamaihd.net https://mdhdsnappytv-vh.akamaihd.net https://mdhdsnappytv-vh.akamaihd.net https://mpdhdsnappytv-vh.akamaihd.net https://mmdhdsnappytv-vh.akamaihd.net https://mdhdsnappytv-vh.akamaihd.net https://mpdhdsnappytv-vh.akamaihd.net https://mmdhdsnappytv-vh.akamaihd.net https://dwo3ckksxlb0v.cloudfront.net; object-src 'none'; script-src 'self' 'unsafe-inline' https://*.twimg.com https://recaptcha.net/recaptcha/ https://www.google.com/recaptcha/ https://www.gstatic.com/recaptcha/ https://client-api.arkoselabs.com/ https://www.google-analytics.com https://twitter.com https://app.link https://accounts.google.com/gsi/client https://appleid.cdn-apple.com/appleauth/static/jsapi/appleid/1/en_US/appleid.auth.js  'nonce-Y2I5ZTA0NWItZjVkOC00ODBjLWIyNTItOWMwMDVlN2I1N2Ew'; style-src 'self' 'unsafe-inline' https://accounts.google.com/gsi/style https://*.twimg.com; worker-src 'self' blob:; report-uri https://twitter.com/i/csp_report?a=O5RXE%3D%3D%3D&ro=false
< strict-transport-security: max-age=631138519
< cross-origin-opener-policy: same-origin-allow-popups
< cross-origin-embedder-policy: unsafe-none
< x-response-time: 118
< x-connection-hash: 061e7c5cf8e2dfb8ef30f41b88b4636218828b118415f7c422ae1341433a9460

* Received 874 B chunk
* Received 15.7 KB chunk
* Received 2.2 KB chunk
* Received 16 KB chunk
* Received 9 B chunk
* Received 16 KB chunk
* Received 9 B chunk
* Received 16 KB chunk
* Received 16 KB chunk
* Received 18 B chunk
* Received 16 KB chunk
* Received 16 KB chunk
* Received 16 KB chunk
* Received 16 KB chunk
* Received 42 B chunk
* Received 16 KB chunk
* Received 3.8 KB chunk
* Connection #11 to host twitter.com left intact

I only get one 301 and then a 200. I checked again without commenting out options.followHTTPRedirect = true; and it seems to fail after the first redirect. It doesn't go through endless redirects before failing (i.e. the "maximum redirect reached" from the error message).

iparamonau commented 1 year ago

Thank you for the details. We could actually reproduce a similar issue with Twitter users and timelines. It is likely because the Fetch module that we currently use does not re-use cookies between redirects by default. We have used our cookie jars elsewhere in the code, but not for Twitter (due to the followHTTPRedirect bypass used).
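
To illustrate what re-using cookies between redirects means, here is a rough sketch that follows redirects manually and replays any Set-Cookie values on the next hop. It is not the change made in Iframely. The function name fetchWithCookies and the fetchImpl parameter are hypothetical, the cookie handling deliberately ignores Domain, Path and expiry, and it assumes a runtime where Headers.getSetCookie() is available.

```javascript
// Follow redirects manually while replaying cookies set along the way.
// Simplification (assumption): cookies live in one flat Map with no
// Domain/Path/expiry matching, which a real cookie jar would enforce.
async function fetchWithCookies(startUrl, { maxHops = 5, fetchImpl = fetch } = {}) {
  const jar = new Map();
  let url = startUrl;
  for (let i = 0; i <= maxHops; i++) {
    const headers = {};
    if (jar.size > 0) {
      headers.cookie = [...jar].map(([name, value]) => `${name}=${value}`).join('; ');
    }
    const res = await fetchImpl(url, { redirect: 'manual', headers });
    // Keep only the leading name=value pair of each Set-Cookie header.
    for (const line of res.headers.getSetCookie()) {
      const pair = line.split(';')[0];
      const eq = pair.indexOf('=');
      if (eq > 0) jar.set(pair.slice(0, eq).trim(), pair.slice(eq + 1).trim());
    }
    const location = res.headers.get('location');
    const isRedirect = res.status >= 300 && res.status < 400 && location;
    if (!isRedirect) return { response: res, url, cookies: Object.fromEntries(jar) };
    url = new URL(location, url).toString();
  }
  throw new Error(`more than ${maxHops} redirects, starting at ${startUrl}`);
}
```

In the cURL trace above, the second request carries the guest_id cookie set by the 301 response. A client that drops it on each hop could plausibly keep being redirected until it hits its redirect limit.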

The issue is resolved in #502. Also, we have now disabled the Twitter thumbnail by default to avoid issues in the future. To enable it, please configure twitter: {thumbnail: true, ...} in your providerOptions config.
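
As a rough illustration of that setting, the relevant part of a config object might look like the excerpt below. Only the providerOptions.twitter.thumbnail key comes from the comment above. The surrounding `config` variable is an assumption, and how the object is exported depends on your local config file and Iframely version.

```javascript
// Hypothetical excerpt of a self-hosted Iframely config object.
// Only providerOptions.twitter.thumbnail is taken from the thread;
// any other Twitter options you already have stay alongside it.
const config = {
  providerOptions: {
    twitter: {
      thumbnail: true // opt back in to fetching the tweet thumbnail
    }
  }
};
```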