rafgraph / spa-github-pages

Host single page apps with GitHub Pages
https://spa-github-pages.rafgraph.dev
MIT License
3.83k stars, 565 forks

Suggestion - Avoid 404's using meta tag #67

Open fidian opened 8 months ago

fidian commented 8 months ago

Smashing Magazine wanted to solve the same problem as this repository and I think their solution is fairly elegant as well. It involves a small 404 page.

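<script>
  // Remember the URL that triggered the 404 so the SPA can restore it later.
  sessionStorage.redirect = location.href;
</script>
<!-- Redirect immediately (0 second delay) to the SPA's entry point. -->
<meta http-equiv="refresh" content="0;URL='/REPO_NAME_HERE'">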

Because of the "refresh" meta tag, the browser navigates to the new URL right away, regardless of the 404 status code from the server, and Google treats an instant meta refresh like a permanent (301) redirect. To use this, the SPA needs a bit of JavaScript at the very beginning to load the route from session storage and use replaceState to update the URL.

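<script>
  (function(){
    // Pick up the URL saved by the 404 page, then clear it so it is only used once.
    var redirect = sessionStorage.redirect;
    delete sessionStorage.redirect;
    if (redirect && redirect != location.href) {
      // Put the original URL back in the address bar without reloading the page.
      history.replaceState(null, null, redirect);
    }
  })();
</script>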
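For anyone who wants to check this logic outside a browser, here is a minimal sketch of the same restore step with the browser objects passed in as arguments. The function name `restoreRoute` and its parameters are made up for illustration and are not part of the Smashing Magazine snippet; it uses `getItem`/`removeItem`, which read and clear the same `redirect` key as the property access above.

```javascript
// Hypothetical helper (name and signature are assumptions for illustration).
// storage: an object with getItem/removeItem, e.g. sessionStorage
// loc:     an object with an href property, e.g. location
// hist:    an object with replaceState, e.g. history
function restoreRoute(storage, loc, hist) {
  var redirect = storage.getItem('redirect');
  // Clear the saved URL so a later page load does not reuse it.
  storage.removeItem('redirect');
  if (redirect && redirect !== loc.href) {
    // Put the originally requested URL back without reloading the page.
    hist.replaceState(null, '', redirect);
    return redirect;
  }
  return null;
}

// Only run automatically when real browser globals are available.
if (typeof window !== 'undefined' && typeof sessionStorage !== 'undefined') {
  restoreRoute(sessionStorage, location, history);
}
```

Returning the restored URL (or `null`) is just a convenience for testing; the inline script above does not need it.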

According to my quick searching, Google penalizes neither 301 redirects nor meta refresh redirects, so this might be a viable alternative without putting the path after a hash.

tonisives commented 8 months ago

Tried it and it seems to work the same way, but with a lot less code and possibly the Google benefit mentioned above. So it seems like a better solution.

Edit: I moved to S3 hosting with CloudFront, and what seemed to work in the end was this

It still redirects with #!, but Google Search marks these pages as valid in the end. It takes weeks for Google to catch up on the pages, though, and I think it still flags some as redirect errors. But some are indexed.

njt1982 commented 2 months ago

Also worth referencing https://developers.google.com/search/docs/crawling-indexing/301-redirects#metarefresh

If server-side redirects aren't possible to implement on your platform, meta refresh redirects may be a viable alternative. Google differentiates between two kinds of meta refresh redirects:

  • Instant meta refresh redirect: Triggers as soon as the page is loaded in a browser. Google Search interprets instant meta refresh redirects as permanent redirects.
  • Delayed meta refresh redirect: Triggers only after an arbitrary number of seconds set by the site owner. Google Search interprets delayed meta refresh redirects as temporary redirects.
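To make the distinction concrete, here is a rough sketch that parses the `content` attribute of a refresh meta tag and labels it the way the quoted docs describe. The function name `classifyMetaRefresh` is made up, and treating a delay of exactly 0 seconds as "instant" is my assumption; it is not Google's actual logic.

```javascript
// Hypothetical helper (not a Google or browser API): parse the content
// attribute of <meta http-equiv="refresh"> and label the redirect.
// Assumption: a delay of exactly 0 seconds counts as "instant".
function classifyMetaRefresh(content) {
  // Matches e.g. "0;URL='/path'", "5; url=https://example.com/", or just "0".
  var match = /^\s*(\d+)\s*(?:[;,]\s*(?:url\s*=\s*)?(['"]?)(.*?)\2\s*)?$/i.exec(content);
  if (!match) return null;
  var delaySeconds = parseInt(match[1], 10);
  return {
    delaySeconds: delaySeconds,
    url: match[3] || null,
    // Instant refresh: treated as permanent; delayed refresh: treated as temporary.
    kind: delaySeconds === 0 ? 'permanent' : 'temporary'
  };
}
```

With the tag from the first comment, `classifyMetaRefresh("0;URL='/REPO_NAME_HERE'")` gives a delay of 0 and the kind `permanent`, which matches what the docs say about instant refreshes.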

I'm trying this approach on https://njt1982.github.io/minecraft-item-browser/copper_block as Google didn't like it when I simply served the page using 404.html (basically copied index.html to 404.html during gh-pages deploy).

Hoping this redirect approach makes it happier... Although I'm concerned that it will see all sub-pages as a 301 to the root of the repo. I'd really like each sub page to actually be its own URL. Might need to combine this approach with https://github.com/rafgraph/spa-github-pages... 🤔


EDIT: Although it looks like Google might respect the History API after all... https://developers.google.com/search/docs/crawling-indexing/javascript/javascript-seo-basics#use-history-api