rails / rails

Ruby on Rails
https://rubyonrails.org
MIT License
55.85k stars 21.61k forks source link

allow_browser may prevent your site from being crawled #50981

Closed nickjj closed 8 months ago

nickjj commented 8 months ago

https://github.com/rails/rails/pull/50505 was merged a few weeks ago which blocks user agents that don't match a specific set of browser versions. Based on the PR it looks like this is the default since it's included in the generator's output.

This runs the risk of preventing your site from being crawled by various search engines. Google has a number of specific crawlers that won't match the agents supplied in that PR (images, videos, etc.).

It will also prevent anyone from accessing your site with curl or other popular HTTP clients unless they manually define an agent that mimics a modern browser. This could be an undesired effect as a default, especially without documentation stating that.

rafaelfranca commented 8 months ago

This is a generated default. It always documents that Only allow modern browsers supporting webp images, web push, badges, import maps, CSS nesting, and CSS :has. If you want web crawlers, or curl access, you can just remove the generated line

nickjj commented 8 months ago

Fair enough, although I think folks will accidentally deny their sites from being partially crawled because they might not think about web crawlers when thinking about browsers. They might think to themselves "oh this sounds great, I wasn't supporting IE 11 anyways" and keep it on because in that limited context it's a no brainer.

For a framework who has a tagline of hello world to IPO, getting search engine traction helps and the current default blocks that.

Maybe a 1 line comment bringing awareness about crawlers being partially blocked would help?

Earlopain commented 6 months ago

This blocks clients by default than don't conform to the format of a popular browsers useragent. That's why useragents look like they do today (this doesn't even mention users who provide distinct and recognizable useragents for their non-browser tools that will now need to impersonate a browser).

Sites can of course choose to block anything and everyone they want to but having this as a default with all the implications that come with it just seems bad. This really should be feature detection instead.

imme5150 commented 3 months ago

I just started a new Rails 7.2 project and was really surprised when Firefox 118 was blocked "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/118.0" this was released just over 9 months ago!

I think it's really cool that we have granular control over blocking user agents, but it should not be turned on by default. Also, unless you're running something very locked down, I think it's a better practice to block specific user agents that cause issues and not just whitelist a small number of them.

What is the concern here? If it's JS compatibility, then that should be dealt w/ in the JS code. This doesn't seem like a sane default to me.

n-studio commented 2 months ago

I think that the general use case is that the website should be blocked for users who won't properly display css or have functional javascript, but should allow crawlers that don't care about javascript and css.

Instead of whitelisting the allowed browsers, we should blacklist the old browsers we don't want to support anymore by default.

mcary commented 1 month ago

This seems to be a breaking feature in Rails 7.2, a non-major version release. Sure, a developer can remove the generated line easily enough, as @rafaelfranca mentions, but this is adding a manual upgrade step for people who become aware of the need and it risks insidious breakage for people who do not. Could it work to hold the "on by default" part of the feature until Rails 8?

skipkayhil commented 1 month ago

but this is adding a manual upgrade step for people who become aware of the need and it risks insidious breakage for people who do not

Is the line being added to upgrading applications? It should only be present in new applications generated with 7.2

mcary commented 1 month ago

Is the line being added to upgrading applications? It should only be present in new applications generated with 7.2

Ah, you're right, @skipkayhil, it is not being added while upgrading. I'm hitting issues because I'm scripting/documenting the creation of new apps, and that process fails because I can no longer make an automated HTTP request to check that an example scaffold was successfully created.

The upgrade docs say, "Major and Minor versions are allowed to make changes to the public API," so this breakage to my use-case is within documented expectations. And my use case (documentation) is admittedly not exactly the core use-case of versioning policies anyway. (The core use-case is application upgrades, as you suggest.)