stevenvachon / broken-link-checker

Find broken links, missing images, etc within your HTML.
MIT License
1.95k stars, 302 forks

Twitter sniffs the user-agent and returns 400 #210

Closed (jasikpark closed this 3 years ago)

jasikpark commented 3 years ago

Describe the bug: I want to detect whether the Twitter links on my site are still valid, but Twitter returns a 400 error for them because it sniffs the User-Agent header and only responds with 200 to an allowlist of browsers.

To Reproduce:

blc -r https://jasik.xyz, but you can honestly just run curl -v https://twitter.com/calebjasik:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 104.244.42.193...
* TCP_NODELAY set
* Connected to twitter.com (104.244.42.193) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/cert.pem
  CApath: none
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
} [225 bytes data]
* TLSv1.2 (IN), TLS handshake, Server hello (2):
{ [66 bytes data]
* TLSv1.2 (IN), TLS handshake, Certificate (11):
{ [2871 bytes data]
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
{ [333 bytes data]
* TLSv1.2 (IN), TLS handshake, Server finished (14):
{ [4 bytes data]
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
} [70 bytes data]
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
} [1 bytes data]
* TLSv1.2 (OUT), TLS handshake, Finished (20):
} [16 bytes data]
* TLSv1.2 (IN), TLS change cipher, Change cipher spec (1):
{ [1 bytes data]
* TLSv1.2 (IN), TLS handshake, Finished (20):
{ [16 bytes data]
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use h2
* Server certificate:
*  subject: C=US; ST=California; L=San Francisco; O=Twitter, Inc.; OU=atla; CN=twitter.com
*  start date: Feb  6 00:00:00 2020 GMT
*  expire date: Feb  5 12:00:00 2021 GMT
*  subjectAltName: host "twitter.com" matched cert's "twitter.com"
*  issuer: C=US; O=DigiCert Inc; OU=www.digicert.com; CN=DigiCert SHA2 High Assurance Server CA
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x7f9c4500ee00)
> GET /calebjasik HTTP/2
> Host: twitter.com
> User-Agent: curl/7.64.1
> Accept: */*
> 
* Connection state changed (MAX_CONCURRENT_STREAMS == 4294967295)!
< HTTP/2 400 
< cache-control: no-cache, no-store, must-revalidate, pre-check=0, post-check=0
< content-security-policy: connect-src 'self' blob: https://*.giphy.com https://*.pscp.tv https://*.video.pscp.tv https://*.twimg.com https://api.twitter.com https://api-stream.twitter.com https://ads-api.twitter.com https://caps.twitter.com https://media.riffsy.com https://pay.twitter.com https://sentry.io https://ton.twitter.com https://twitter.com https://upload.twitter.com https://www.google-analytics.com https://app.link https://api2.branch.io https://bnc.lt https://vmap.snappytv.com https://vmapstage.snappytv.com https://vmaprel.snappytv.com https://vmap.grabyo.com https://dhdsnappytv-vh.akamaihd.net https://pdhdsnappytv-vh.akamaihd.net https://mdhdsnappytv-vh.akamaihd.net https://mdhdsnappytv-vh.akamaihd.net https://mpdhdsnappytv-vh.akamaihd.net https://mmdhdsnappytv-vh.akamaihd.net https://mdhdsnappytv-vh.akamaihd.net https://mpdhdsnappytv-vh.akamaihd.net https://mmdhdsnappytv-vh.akamaihd.net https://dwo3ckksxlb0v.cloudfront.net ; default-src 'self'; form-action 'self' https://twitter.com https://*.twitter.com; font-src 'self' https://*.twimg.com; frame-src 'self' https://twitter.com https://mobile.twitter.com https://pay.twitter.com https://cards-frame.twitter.com ; img-src 'self' blob: data: https://*.cdn.twitter.com https://ton.twitter.com https://*.twimg.com https://analytics.twitter.com https://cm.g.doubleclick.net https://www.google-analytics.com https://www.periscope.tv https://www.pscp.tv https://media.riffsy.com https://*.giphy.com https://*.pscp.tv https://prod-periscope-profile.*.amazonaws.com https://platform-lookaside.fbsbx.com https://scontent.xx.fbcdn.net https://*.googleusercontent.com; manifest-src 'self'; media-src 'self' blob: https://twitter.com https://*.twimg.com https://*.vine.co https://*.pscp.tv https://*.video.pscp.tv https://*.giphy.com https://media.riffsy.com https://dhdsnappytv-vh.akamaihd.net https://pdhdsnappytv-vh.akamaihd.net https://mdhdsnappytv-vh.akamaihd.net https://mdhdsnappytv-vh.akamaihd.net 
https://mpdhdsnappytv-vh.akamaihd.net https://mmdhdsnappytv-vh.akamaihd.net https://mdhdsnappytv-vh.akamaihd.net https://mpdhdsnappytv-vh.akamaihd.net https://mmdhdsnappytv-vh.akamaihd.net https://dwo3ckksxlb0v.cloudfront.net; object-src 'none'; script-src 'self' 'unsafe-inline' https://*.twimg.com   https://www.google-analytics.com https://twitter.com https://app.link  'nonce-NTI1MjExODMtMWM3Yy00NzBjLTliOWItMTY0ZmM3ZTYwODU4'; style-src 'self' 'unsafe-inline' https://*.twimg.com; worker-src 'self' blob:; report-uri https://twitter.com/i/csp_report?a=O5RXE%3D%3D%3D&ro=false
< content-type: text/html; charset=utf-8
< cross-origin-opener-policy: same-origin
< date: Sat, 09 Jan 2021 19:44:58 GMT
< expiry: Tue, 31 Mar 1981 05:00:00 GMT
< last-modified: Sat, 09 Jan 2021 19:44:58 GMT
< pragma: no-cache
< server: tsa_b
< set-cookie: personalization_id="v1_U7XbBBfv3A+UQsocar+nmw=="; Max-Age=63072000; Expires=Mon, 09 Jan 2023 19:44:58 GMT; Path=/; Domain=.twitter.com; Secure; SameSite=None
< set-cookie: guest_id=v1%3A161022149800528314; Max-Age=63072000; Expires=Mon, 09 Jan 2023 19:44:58 GMT; Path=/; Domain=.twitter.com; Secure; SameSite=None
< strict-transport-security: max-age=631138519
< x-connection-hash: 46b8b2bce18af5e0b64bba68ccf3ba50
< x-content-type-options: nosniff
< x-frame-options: DENY
< x-powered-by: Express
< x-response-time: 21
< x-xss-protection: 0
< 
{ [2251 bytes data]
100  2251    0  2251    0     0   8526      0 --:--:-- --:--:-- --:--:--  8526
* Connection #0 to host twitter.com left intact
* Closing connection 0
calebjasik@Calebs-MacBook-Pro ~ % >....                                                                                                                                
    .errorFooter {
      color: #657786;
      font-size: 80%;
      line-height: 1.5;
      padding: 1em 0;
    }

    .errorFooter a,
    .errorFooter a:visited {
      color: #657786;
      text-decoration: none;
      padding-right: 1em;
    }

    .errorFooter a:hover,
    .errorFooter a:active {
      text-decoration: underline;
    }
  </style>
</head>
<body>
  <div class="errorContainer">
    <img width="46" height="38"
      srcset="https://abs.twimg.com/errors/logo46x38.png 1x, https://abs.twimg.com/errors/logo46x38@2x.png 2x"
      src="https://abs.twimg.com/errors/logo46x38.png" alt="Twitter" />
    <h1>This browser is no longer supported.</h1>
    <p>
      Please switch to a supported browser to continue using twitter.com. You can see a list of supported browsers in our Help Center.
    </p>
    <p class="errorButton"><a href="https://help.twitter.com/using-twitter/twitter-supported-browsers">Help Center</a>
    </p>
    <p class="errorFooter">
      <a href="https://twitter.com/tos">Terms of Service</a>
      <a href="https://twitter.com/privacy">Privacy Policy</a>
      <a href="https://support.twitter.com/articles/20170514">Cookie Policy</a>
      <a href="https://legal.twitter.com/imprint">Imprint</a>
      <a href="https://business.twitter.com/en/help/troubleshooting/how-twitter-ads-work.html">Ads info</a>
      © 2021 Twitter, Inc.
    </p>
  </div>
</body>
</html>

Expected behavior:

I would expect the link checker to detect whether the link actually works. I don't know whether this is out of scope for the command-line tool. I guess it's something I would have to do with the library, by manually setting a valid user-agent?

Environment:

stevenvachon commented 3 years ago

Use the --user-agent CLI option.
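For anyone landing here later, a minimal sketch of what that could look like. The User-Agent string and the two helper function names are assumptions for illustration, not part of blc, and whether Twitter accepts this particular string is not confirmed in this thread. The only things taken from above are blc's -r and --user-agent options and the two URLs; curl's -A and -w flags are standard.

```shell
# Assumption: a browser-like User-Agent string. Any value on Twitter's
# allowlist should work; this exact string is only an example.
UA='Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36'

# Hypothetical helper: print only the HTTP status code that curl receives
# for a URL when it sends the User-Agent above.
# -A sets the User-Agent header, -o /dev/null discards the body,
# -w '%{http_code}' prints the status code.
status_with_ua() {
  curl -s -o /dev/null -A "$UA" -w '%{http_code}\n' "$1"
}

# Hypothetical helper: run the link checker recursively on a site,
# passing the same User-Agent through the --user-agent CLI option.
blc_with_ua() {
  blc -r "$1" --user-agent "$UA"
}
```

Calling status_with_ua https://twitter.com/calebjasik is a quick way to check whether a given string gets past the sniffing before running blc_with_ua https://jasik.xyz on the whole site.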

jasikpark commented 3 years ago

thanks! 😅 didn't see that