iterative / dvc.org

📖 DVC website and documentation
https://dvc.org
Apache License 2.0
337 stars 393 forks source link

server: better way to handle HTTP redirects #757

Closed shcheklein closed 4 years ago

shcheklein commented 5 years ago

Every time we move a page to a new URL we need to create a proper redirect.

The only way to this now is to create a 4-5 lines entry in our custom server.js. This is def. not scalable. We need to find an easier way to handle this.

The first solution that comes to my mind is a separate file with a list of simple expressions, may be converter functions.

jorgeorpinel commented 5 years ago

separate file with a list of simple expressions...

And maybe add a note with the date the redirect was added (can also be checked with GitHub Blame but it's not always straightforward) for periodic cleaning up of this list.

Another option is to use DNS for this, not the app.

Suor commented 5 years ago

@jorgeorpinel I don't think those should be cleaned up unless they clash with something.

Another thing - do not forget to use HTTP code 301 for moved things, not 302.

jorgeorpinel commented 5 years ago

Actually @Suor we are having some trouble with some 301 "permanent" redirects that we now want to change again... They're cached by the browser and can't be changed again by us until that cache expires a long time later. 302 Found may be better for us since we are not so sure they are so permanent. See https://github.com/iterative/dvc.org/issues/768#issuecomment-549456704

shcheklein commented 5 years ago

I think they are still 301s @jorgeorpinel . Even though we change the end location we don't change the fact of the redirect. So, unless we don't find a good solution for caching, I would keep it 301. But I haven't googled best practices around dealing with cache yet.

shcheklein commented 5 years ago

Ok, reading the link now in that comment :)

shcheklein commented 5 years ago

If no-cache policy does not work probably we should indeed use 302s, but I would google about the consequences of this for search engines and whatnot.

jorgeorpinel commented 5 years ago

Well, they both have an impact. 301 is better in general but we need to be sure the redirect is actually permanent and since we're in an active development phase... I think we should not use 301s as much. Many of our redirects are temporary, which is what 302 is for. As for SEO effects:

302 redirect informs search engines to continue indexing old pages. It may impact importantly your SEO results. When you are optimizing the new page URL, make certain to be vigilant about not having duplicate content on both old and new pages at a similar time. The temporary redirect is very useful when you bring your website visitors back to the old page after some time.

https://www.firstrankseoservices.com/blog/effects-of-301-and-302-redirect-on-seo/

shcheklein commented 5 years ago

Almost all our redirects are 301s in nature. We do not plan to bring your website visitors back to the old page after some time.. The only thing that is potentially not permanent is the URL we redirect to (not the fact of redirect). And probably the right way to handle this actually is to keep old 301s and add new ones (making a chain of 301s effectively).

Either we do that, or we can try no-cache as a hack.

Looks like 302s is not a good option in terms of SEO.

jorgeorpinel commented 5 years ago

p.s we don't HAVE to use 302 Found, there's also 303 See Other, and 307 Temporary Redirect.

https://en.wikipedia.org/wiki/List_of_HTTP_status_codes#3xx_Redirection

jorgeorpinel commented 5 years ago

Full list of current redirects in server.js (from https://github.com/iterative/dvc.org/issues/768#issuecomment-549494662) and their motivation:

jorgeorpinel commented 5 years ago

p.p.s. also there's a new 308 Permanent Redirect code in HTTP 1.1 to replace 301, and is now supposedly supported by most major browsers that have significant market share. https://tools.ietf.org/html/rfc7538#page-3

shcheklein commented 5 years ago

I would say we can use 308 for "structural" redirects - man.dvc.org to dvc.org/man - pages that were never indexed in the first place.

For real pages (like tutorial or something) we should probably keep using 301 to be safe, to properly transfer ranking from one page to another. And actually never change the redirect itself, introduce a new one if needed.

Again, this is from the top of my head - I haven't had time to google and confirm all these points. Point is we should think about SEO, about caching, etc carefully.

jorgeorpinel commented 4 years ago

Another option is to use DNS for this, not the app.

We never discussed this option. Wouldn't it that be easier for several of the redirects (listed in https://github.com/iterative/dvc.org/issues/757#issuecomment-549611289 above).

shcheklein commented 4 years ago

@jorgeorpinel what DNS options do you specifically mean?

jorgeorpinel commented 4 years ago

Hmm I guess DNS records won't help us much, now that I think about it. There are some DNS-based/related services e.g. http://redirect.center/ and http://redirect.name/ which could help us implement some basic redirects:

But maybe not the ones based on app routes (URL paths). So maybe this is not worth it, just for 2 cases. Plus we may not want to rely on some external service.

Basically... Never mind about DNS options.

jorgeorpinel commented 4 years ago

Another option to actual HTTP redirects is conditional rendering logic in the server side components combined with client-side shallow routing as explained in https://sergiodxa.com/articles/redirects-in-next-the-good-way/ We'd need to see how difficult the refactoring is, and make sure the code doesn't result confusing, but this has the advantage that no HTTP redirects would ever be issued. Maybe we would even stop needing a custom server script (which disables automatic static optimization).

shcheklein commented 4 years ago

HTTP redirects would ever be issued

I think this is not the goal. we should issue 301 for the proper search engines handling and when you do use tools like curl to fetch the page. Client-side rendering alone is not enough.

jorgeorpinel commented 4 years ago

OK, well, as Next.js doesn't yet feature HTTP redirects the only way to do it is with a custom server script like we already have (/server.js) (which disables automatic static optimization unfortunately).

What I think we can do to improve this is to extract all the redirect logic to a separate module that loads the regex and targets from a JSON in order, and tests them all. That way we only ever change the JSON file in the future, which is kind of configuration instead of code (it could even live outside the Git repo).

shcheklein commented 4 years ago

extract all the redirect logic to a separate module that loads the regex and targets from a JSON in order, and tests them all

yes! that was the idea of this ticket

jorgeorpinel commented 4 years ago

OK got it, great. Actually though, yet another option I'd like to leave here for the record is to write and deploy a completely separate proxy server that exclusively does redirects, and that fetches our regular node app when there's no special redirect.

I'd still vote for the separate module at this point.

jorgeorpinel commented 4 years ago

p.s we don't HAVE to use 302 Found, there's also 303 See Other, and 307 Temporary Redirect.

On this, I vote for 303 See Other as discussed in https://github.com/iterative/dvc.org/pull/875#pullrequestreview-341328648.

shcheklein commented 4 years ago

yep. 303 should be fine, not ideal as well but fine.