mjackson / unpkg

The CDN for everything on npm
https://unpkg.com

Googlebot user agent blocked by Unpkg #314

Closed: daniellockyer closed this issue 1 year ago

daniellockyer commented 2 years ago

Hi @mjackson! 👋🏻

We've been seeing reports that requests to our JS library, served by Unpkg, are being blocked when using the Googlebot user agent. Specifically, Cloudflare seems to be blocking it.

I can repro the reports by:

  1. installing User-Agent Switcher on Firefox
  2. switching to the Googlebot user agent
  3. visiting the link - https://unpkg.com/@tryghost/portal@1.12.9/umd/portal.min.js. This seems to happen for all libraries hosted by unpkg (see the React one too).
[Screenshot, 2022-01-07]
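If you'd rather check from the command line than through a browser extension, the same request can be sketched with Python's standard library. This is a hypothetical helper, not part of unpkg or Ghost. `GOOGLEBOT_UA` is one of the Googlebot user-agent strings Google documents. As mjackson explains in the thread, a request sent from a local machine is expected to be blocked whatever the header says, because the source IP isn't Google's.

```python
import urllib.error
import urllib.request

# One of the user-agent strings Google documents for its desktop Googlebot crawler.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# The file from the report.
URL = "https://unpkg.com/@tryghost/portal@1.12.9/umd/portal.min.js"


def build_googlebot_request(url: str) -> urllib.request.Request:
    """Build a GET request that claims to be Googlebot via the User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": GOOGLEBOT_UA})


def fetch_status(url: str) -> int:
    """Send the request and return the HTTP status code, including error codes like 403."""
    req = build_googlebot_request(url)
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Blocked or otherwise failed requests surface as HTTPError; report their code.
        return err.code
```

Usage: `print(fetch_status(URL))`. A 403 from your own machine only shows that a spoofed user agent is rejected. It says nothing about whether real Googlebot traffic gets through.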

We believe this is why Google Search Console is reporting errors saying that pages containing this JS don't fully load for the crawler and therefore won't be indexed.

Is there a specific reason why the user agent is being blocked, and would it be possible to review the security rules if not?

mjackson commented 2 years ago

Unfortunately this isn't something that you'll be able to reproduce locally because Cloudflare's rulesets check the client's IP address in addition to the user agent, so any request coming from your laptop with the Googlebot user agent string will (rightly) be blocked as fake.
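The thread doesn't say how Cloudflare implements its IP check. As an illustration only, here is a sketch of the reverse-then-forward DNS verification that Google documents for site operators who want to confirm a request really comes from Googlebot. It shows why a Googlebot user agent sent from a laptop fails. The function name and the injectable lookup parameters are hypothetical, added so the logic can be tested without network access. `socket.gethostbyname_ex` only returns IPv4 addresses, so this sketch doesn't handle IPv6 clients.

```python
import socket
from typing import Callable, Tuple

# Reverse-DNS domains Google documents for Googlebot hosts.
GOOGLEBOT_DOMAINS = (".googlebot.com", ".google.com")

HostLookup = Callable[[str], Tuple[str, list, list]]


def is_verified_googlebot(
    ip: str,
    reverse_lookup: HostLookup = socket.gethostbyaddr,
    forward_lookup: HostLookup = socket.gethostbyname_ex,
) -> bool:
    """Return True only if `ip` reverse-resolves to a Google host that resolves back to `ip`."""
    # Step 1: reverse (PTR) lookup of the client IP.
    try:
        hostname, _aliases, _addrs = reverse_lookup(ip)
    except OSError:  # socket.herror / socket.gaierror are OSError subclasses
        return False

    # Step 2: the hostname must sit under one of Google's crawler domains.
    if not hostname.endswith(GOOGLEBOT_DOMAINS):
        return False

    # Step 3: forward lookup of that hostname must include the original IP,
    # which defeats spoofed PTR records pointing at a googlebot.com name.
    try:
        _name, _aliases, addrs = forward_lookup(hostname)
    except OSError:
        return False
    return ip in addrs
```

In production you'd call `is_verified_googlebot(client_ip)` with the default resolvers and cache the result per IP, since each check costs two DNS round trips.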

I found an issue on Cloudflare's community forum where a few other users seemed to suggest that rules 100201 and 100201_2 are responsible for blocking legitimate requests from Googlebot. I'll try disabling them and see if it helps.

daniellockyer commented 2 years ago

@mjackson Ah I didn't know that! Thanks for the info and looking into the issue 🙂

daniellockyer commented 2 years ago

Hey @mjackson - it still seems like this is an issue 😕 We've had another report that the JS is being blocked and causing problems with Google Search Console.

daniellockyer commented 1 year ago

Closing as this no longer affects us 🙂