sveltejs / kit

web development, streamlined
https://svelte.dev/docs/kit
MIT License
18.72k stars 1.94k forks

Separate trailingSlash and prerender build style/characteristic #5334

Closed mprather closed 2 years ago

mprather commented 2 years ago

Describe the problem

My team is building our first production site using sveltekit/svelte. It's a small, relatively simple site that is a perfect candidate for ssr. Today we realized that in order to use our storage provider (azure), we would need to use the trailingSlash=always option. Per the documentation, this will allow us to build the correct page/index.html files required by the underlying system.

The problem is that setting the option to 'always' results in a clunky user experience in the browser.

If the user manually goes to http://fqdn/myPage, the page is rendered and the user sees the same url. However, once they are on the site, if they use any link that takes them to myPage, they are sent to myPage/.

The result is that we are in the exact "bad" scenario called out in the documentation, where the user (and search engines) sees different urls for the same page.

Describe the proposed solution

I realize we're new to the svelte world and if I missed some configuration setting, please let me know. If not, then I would like to suggest that prerendering build style (page.html vs page/index.html) becomes a distinct option from the trailingSlash property.

It seems we would like to use trailingSlash=never (arguably there are fewer quirks when no slashes are artificially introduced) and have the static adapter use the "page/index.html" build style.

Alternatives considered

We need the page/index.html build artifacts. Otherwise, only the root page is directly fetchable (which is clearly a no-go). If separating trailingSlash from the build style is not a realistic option for sveltekit, then perhaps the underlying index.html files should force a 301 to the slashed version whenever a non-slashed page is rendered?

Importance

would make my life easier

Additional Information

No response

dwsmart commented 2 years ago

If you are pre-rendering to static pages, your web server is handling the serving of page/index.html and page/, so sveltekit couldn't do a 301 at that point. It could only do a JavaScript or meta refresh; you can't change an HTTP status code via client-side JavaScript.

So perhaps you might be able to handle this with your web server; there are ways to do that in nginx / Apache etc.

But if all else fails, in real world terms, a Meta canonical tag is going to work just fine for the duplication problem here. Search engines are familiar with the pattern, as it's pretty much the default of how webservers traditionally have worked.
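As a rough illustration of the canonical-tag approach mentioned above, here is a minimal sketch that builds a `<link rel="canonical">` element pointing at the slash-less form of a URL. The helper name `canonicalTag`, the example host, and the choice of "no trailing slash" as the canonical form are assumptions made for this example; they are not part of SvelteKit.

```typescript
// Hypothetical helper: returns a canonical <link> tag for the given URL,
// treating the form WITHOUT a trailing slash as canonical (an assumption).
function canonicalTag(requestUrl: string): string {
  const url = new URL(requestUrl);

  // Query string and fragment are dropped so every variant maps to one URL.
  url.search = '';
  url.hash = '';

  // Strip trailing slashes from everything except the root path.
  if (url.pathname !== '/') {
    url.pathname = url.pathname.replace(/\/+$/, '');
  }

  return `<link rel="canonical" href="${url.href}">`;
}

// Both variants of the page produce the same tag.
console.log(canonicalTag('https://example.com/myPage'));
console.log(canonicalTag('https://example.com/myPage/'));
```

The tag only tells crawlers which URL to prefer; it does not redirect the browser, so it addresses the duplicate-URL concern but nothing else.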

mprather commented 2 years ago

Thanks for the reference for the canonical tag. I'm familiar with it. The issue that I'm trying to highlight is that it seems "page build style" is an orthogonal feature to the concept of "trailing slashes", rather than something that depends on it.

I find it somewhat ironic that the trailingSlash=ignore option takes special note of the conflict caused by two different urls. Yet even if you don't use the ignore option, you can still run into that very same situation. One possible takeaway from this discussion is that the documentation just needs to be updated to lessen the "danger" of using ignore and point the reader to a possible solution, since always and ignore can produce the same undesirable results.

In looking through the list of issues, it does seem that trailing slashes introduce other quirks. At this time, I don't think our project will run into those quirks, but I would prefer to never introduce the opportunity. All we really need is a prerendering property that allows us to specify the page type.

Rich-Harris commented 2 years ago

It sounds like this is really an Azure configuration issue. The fact that your pages are accessible both with and without a trailing slash is a bug; your webserver should be redirecting appropriately. I found this documentation which suggests that the "trailingSlash": "auto" Azure configuration would give you the correct behaviour whether your SvelteKit app has "always" or "never", though I'm unclear on whether it's possible to apply that configuration in your case or if it's specific to Azure Static Web Apps (I have never touched Azure so I have no idea).

mprather commented 2 years ago

I wish I had not offered a suggestion for a fix because I think that is muddying the argument.

First, I want to clear up that I am not using Static Web Apps. SWA is a slightly different beast that effectively uses some of the same infra as Azure Static Web Sites. The former is the new jazzy thing and the latter has been around for years. I know if you are not familiar with the service offering, then "apps" and "sites" services can be easily confused.

We are using static web sites and the configuration is pretty simplistic - it will render the specific file by name (i.e. mypage.html) or render the file implicitly via mypage/index.html, where "index.html" can be globally changed to a different value. The static web site functionality will look for "index.html" for any requested folder. This is where sveltekit does (or does not) work well with azure static sites.

Here's what we've observed:

My suggestion is focused on recommending a unique property that allows the dev to specify the exact type of built pages. This would be better than the current methodology that cascades one feature into the next. All we really need is the opportunity to create <page>/index.html and we're good. We don't need the other features that may come as part of the trailingSlash feature.

For example, it would really be nice to configure sveltekit in the following manner...

  kit: {
    adapter: adapter(),
    prerender: {
      default: true,
      pageBuildStyle: 'index'         // <<--- specify how pages are built, remove dependency on trailingSlash=always to get the right set of pages
    },
    trailingSlash: 'never'
  }

where pageBuildStyle might have two options: 'html' and 'index'
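To make the proposal concrete, the following is a minimal sketch of how a route could map to an output file under each of the two suggested values. The type `BuildStyle` and the function `outputFileFor` are hypothetical names invented for this illustration; `pageBuildStyle` is the option proposed in this comment, not an existing SvelteKit setting, and this is not how SvelteKit's prerenderer is implemented.

```typescript
// Hypothetical: mirrors the two values suggested for pageBuildStyle.
type BuildStyle = 'html' | 'index';

// Hypothetical helper: maps a route such as '/myPage' to the file that
// would be written for it under the given build style.
function outputFileFor(route: string, style: BuildStyle): string {
  // Strip leading and trailing slashes so '/myPage' and '/myPage/' match.
  const clean = route.replace(/^\/+|\/+$/g, '');

  // The root page is index.html under either style.
  if (clean === '') return 'index.html';

  return style === 'index' ? `${clean}/index.html` : `${clean}.html`;
}

console.log(outputFileFor('/myPage', 'html'));  // myPage.html
console.log(outputFileFor('/myPage', 'index')); // myPage/index.html
```

The point of the sketch is that the file layout depends only on the build style, while the form of the links shown to the user would be decided separately by trailingSlash.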

I hope this offers some clarification on what we're seeing and how perhaps two characteristics should be broken into 2 properties.

Rich-Harris commented 2 years ago

Help me understand: if you output <site>/report/index.html, is it only available at <site>/report, or is it also available at <site>/report/? Because if it serves the index.html file in both cases, that's dangerously buggy behaviour, and no amount of SvelteKit config can ameliorate it.

mprather commented 2 years ago

I'm not sure the rendering platform is buggy if in fact it's designed to operate in a simple fashion. Here is the exact behavior I see with static pages rendered from Azure storage blobs (sans svelte anything).

Actual file: /report/index.html

1) Request url: /report/index.html --> Response 200: /report/index.html
2) Request url: /report/ --> Response 200: /report/ (with rewritten content from index.html)
3) Request url: /report --> Response 200: /report (with rewritten content from index.html)
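The three cases above can be modelled with a small lookup function. This is only a sketch of the observed behaviour, not Azure's actual implementation; the name `resolveBlob` and the in-memory `Set` of file names are assumptions made for the example.

```typescript
// Hypothetical model of the observed lookup: an exact file match wins,
// otherwise the request is treated as a folder and the index document is tried.
function resolveBlob(
  path: string,
  files: Set<string>,
  indexDoc: string = 'index.html'
): string | null {
  const key = path.replace(/^\/+/, '');

  // Case 1: the request names a file directly.
  if (key !== '' && !key.endsWith('/') && files.has(key)) return key;

  // Cases 2 and 3: with or without a trailing slash, look for the
  // index document inside the folder of that name.
  const dir = key.replace(/\/+$/, '');
  const candidate = dir === '' ? indexDoc : `${dir}/${indexDoc}`;
  return files.has(candidate) ? candidate : null;
}

const files = new Set(['index.html', 'report/index.html']);
console.log(resolveBlob('/report/index.html', files)); // report/index.html
console.log(resolveBlob('/report/', files));           // report/index.html
console.log(resolveBlob('/report', files));            // report/index.html
```

All three requests resolve to the same file and none of them triggers a redirect, which is the behaviour under discussion in this thread.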

Basic rendering services provided by common web servers are often times coupled with additional modules or advanced oob properties (or even waps, uags, rewriters) to provide enhanced url management. Even without the advanced mechanisms, I can easily get the above characteristics with at least 2 of the most common web servers. Most systems try to default to a 301 redirect from non-slash to a slashed rendering (site/report -> 301 -> site/report/) straight out of the box but it is fairly easy to move away from that configuration.

I just noticed that a change back in Feb (#3801) removed what seems to be the same idea I'm championing here. I also see in the referenced links several systems that have the exact same behavior I've described above - Netlify (pretty urls off), Vercel (default), Render (default), and SWA (default).

As a newcomer to the svelte world, it seems the trailingSlash feature is actually doing 2 things: normalizing urls so that from a client-side perspective the correct links are created and, when it is set to always, forcing the underlying files to be built differently. These two behaviors are orthogonal features rather than co-dependent ones.

Rich-Harris commented 2 years ago

It is buggy - for one thing, /report, /report/ and /report/index.html are different URLs, and that's harmful for SEO (though it can be mitigated with rel=canonical). But much worse is the fact that relative links mean different things if there's a trailing slash. (Even if you scrupulously avoid relative links in your own app, SvelteKit itself uses relative links for maximum portability, so if you visit the wrong URL the JS and CSS will simply fail to load.)
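The point about relative links can be checked with the standard `URL` constructor, which resolves a relative reference against a base URL the same way a browser does. The host `example.com` and the asset name `./app.js` are placeholders chosen for this example.

```typescript
// Resolve the same relative reference against the two forms of the page URL.
const withoutSlash = new URL('./app.js', 'https://example.com/report').href;
const withSlash = new URL('./app.js', 'https://example.com/report/').href;

console.log(withoutSlash); // https://example.com/app.js
console.log(withSlash);    // https://example.com/report/app.js
```

Without the trailing slash, `report` is treated as a file name and the reference resolves against its parent folder; with the slash, it resolves inside `report/`. A page built for one form therefore requests its assets from the wrong location when served at the other.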

It's essential that someone visiting the wrong URL is redirected. Whether we're describing the app or the platform as buggy is a matter of semantics, but the bugginess is there regardless. Is there no way to configure Azure so that it creates those redirects?
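A redirect of the kind described here could be computed along the following lines. This is a minimal sketch under stated assumptions: the names `TrailingSlash` and `redirectTarget` are hypothetical, it is not SvelteKit's or Azure's implementation, and it ignores details such as query strings and paths with file extensions.

```typescript
// Hypothetical policy type covering the two strict settings discussed above.
type TrailingSlash = 'always' | 'never';

// Hypothetical helper: returns the path the visitor should be redirected to,
// or null when the requested path already matches the policy.
function redirectTarget(pathname: string, policy: TrailingSlash): string | null {
  // The root path is valid under either policy.
  if (pathname === '/') return null;

  const hasSlash = pathname.endsWith('/');

  if (policy === 'always' && !hasSlash) return pathname + '/';
  if (policy === 'never' && hasSlash) {
    // Fall back to '/' if stripping slashes leaves nothing.
    return pathname.replace(/\/+$/, '') || '/';
  }
  return null;
}

console.log(redirectTarget('/report', 'always')); // /report/
console.log(redirectTarget('/report/', 'never')); // /report
console.log(redirectTarget('/report', 'never'));  // null
```

Whatever serves the site would then answer with a redirect status and the returned path in the `Location` header, which requires a server or proxy that can issue redirects rather than a plain file host.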

mprather commented 2 years ago

I can agree and disagree with the leading statement. I've asked quite a few devs in the past few days about the urls in question, and not a single person felt that /report and /report/ were truly different urls. All felt that it should deliver the same content. Almost all said it might influence SEO and if so, they'd manage it on the perimeter outside the app. The /report/index.html variation definitely raised more eyebrows (i.e. it was a different url) but eventually those persons said they would manage that with advanced settings (if available), on the perimeter, etc.

The fact that SEO is brought into the conversation tells me that this team might be optimizing for a scenario that a sveltekit user might not be concerned about. For my use case, SEO is not a consideration at all. In fact, in the last 4 deployments, none of our customers listed SEO as a requirement. I understand that for some, SEO is a golden ticket to happiness and they are willing to spend time/energy to make some crawler happy.

What is essential is getting the desired behavior the customer wants. They do not want slashes but they need files that follow the report/index.html page generation style in order to have a site that works on their platform.

I spent an hour yesterday looking at a variety of static sites generated by a variety of different generators. Some folks clearly like the non-trailing and others like trailing. Then I took the time to look at how they respond to specific requests that are the opposite of what their links used. Some didn't care to redirect from one format to the other, some did. Some only redirected one variation but left others untouched. Some blocked requests for the non-preferred (this was a new twist for me - I don't quite understand why you'd want that config but the undesired urls were either dead ends or a 404). In short, everyone has a preference, and some may choose to handle it with either custom settings or additional hardware/software handlers in front of the server.

Back to the kit... It seems the branching logic that in one case says build report.html and in the other build report/index.html doesn't need to be hard-tied to how client-side urls are created/maintained. It really feels like the kit is so very close to providing the desired output as the logic is already branching based on a particular property. I'm advocating for exposing that property and separating it from the client-side management aspect.

As for your final question: yes, I could add more services to the overall configuration to adjust how content for this new site will be presented to the end users. That is added cost and complexity, which may not be desired.

I appreciate the time in discussing this nuance of the kit configuration. I've certainly learned more about the kit and how it works since I first opened the topic (and I would have probably stated the problem differently). It seems my argument hasn't hit home, and this is probably a won't fix.

Rich-Harris commented 2 years ago

Circling back: yes, I'm afraid this is a wontfix. I appreciate the SEO argument isn't persuasive for everyone, but like I said:

much worse is the fact that relative links mean different things if there's a trailing slash

I'm afraid the developers that argue /report and /report/ aren't truly different URLs are simply mistaken. To human eyes they may seem like the same thing, but to a computer they are not — one will work and one won't (depending on the trailingSlash configuration), and for the one that doesn't the server must respond with a redirect or error response. Delivering 'the same content' but for a different URL is the worst course of action, since relative links will be broken (think missing styles, no JS).