isaacs / github

Just a place to track issues and feature requests that I have for github

Add single page application support for Github pages #408

Open zakhenry opened 9 years ago

zakhenry commented 9 years ago

As GitHub Pages does not support server-side configuration (for example, .htaccess files), it is impossible to rewrite URLs to the index page for a single-page application.

Ideally, there should be a .gh-pages.yml or similar file in the repo root that has certain flags like

redirects:
    redirect_to: index.html
    try_files: true

The above would be the equivalent of the following for nginx

location / {
    root /path/to/site;
    index index.html;
    try_files $uri $uri/ /index.html =404;
}
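To make the lookup order of that nginx rule concrete, here is a simplified Python model of it. It is only a sketch: `try_files` below is a hypothetical helper that works on a set of file paths, not real nginx behavior (for instance, it ignores the trailing-slash redirect nginx issues for directories).

```python
from typing import Set, Tuple

def try_files(uri: str, files: Set[str]) -> Tuple[int, str]:
    """Mimic `try_files $uri $uri/ /index.html =404` over a set of file paths."""
    # $uri: an exact file match
    if uri in files:
        return 200, uri
    # $uri/: a directory, served through its index.html
    index = uri.rstrip("/") + "/index.html"
    if index in files:
        return 200, index
    # /index.html: the single-page app entry point
    if "/index.html" in files:
        return 200, "/index.html"
    # =404: nothing matched at all
    return 404, ""

# Hypothetical site layout used for illustration
site = {"/index.html", "/app.js", "/docs/index.html"}
```

With this layout, a request for an unknown path such as /todos/42 falls through to /index.html, which is the behavior a client-side router needs.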

The workaround is to use hashbang URLs (the fallback AngularJS uses instead of HTML5 mode), but this is fairly unsightly.

I think this feature is much wanted in the AngularJS community, and by users of any other frontend JavaScript framework that supports the HTML5 History API.

stuartpb commented 9 years ago

There's also the concept of 200.html files, which work like 404.html files in that they are served in response to any unmatched URL, but with an HTTP 200 OK response instead of an HTTP 404 Not Found. (This is an existing pattern supported by other CDNs geared toward SPAs, specifically surge.sh, and something I debated with @sintaxi on Twitter a while back).

To be clear, I think app authors should be using redirections along the lines of what @xiphiaz is describing (and some kind of route configuration file would be great for defining this, especially if a shared standard could be used across GitHub Pages and surge.sh), for the same reasons I described in the linked Twitter discussion (it matches the REST semantics the web has been designed for) - I just think these SPA platforms should work toward some kind of write-once-deploy-anywhere convergence (akin to the way the container backend world is converging on a single container standard right now).

stuartpb commented 9 years ago

Also, something that would be fantastic: a way to specify a rule stating requests on catching routes should be directed to a page with whatever path was requested as part of the hash-fragment like a /#/, /#!, or /#!/, akin to old-new-Twitter's routing style.

stuartpb commented 9 years ago

One thing I just realized (and mentioned on Twitter) is that what I'd really like for my static-resource-based design is routing to multiple pages. This could be handled nicely by having 200.html, rather than working only as one global file, apply to any request under its containing path (as opposed to index.html, which only applies to the containing path exactly).
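A rough Python sketch of that per-directory lookup, assuming the server walks up from the requested path and uses the closest 200.html it finds. The function name and file layout are hypothetical, and no existing host is claimed to work this way.

```python
import posixpath
from typing import Optional, Set

def nearest_200(path: str, files: Set[str]) -> Optional[str]:
    """Return the closest ancestor directory's 200.html for an unmatched path."""
    directory = posixpath.dirname(path)
    while True:
        candidate = posixpath.join(directory, "200.html")
        if candidate in files:
            return candidate
        parent = posixpath.dirname(directory)
        if parent == directory:  # reached the root without finding one
            return None
        directory = parent

# Hypothetical layout: a global app shell plus a separate one under /admin
files = {"/index.html", "/200.html", "/admin/200.html"}
```

Here /admin/users/7 would be answered by /admin/200.html, while /blog/post would fall back to the global /200.html.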

I'd still want some mechanism to disable this in the aforementioned YAML config file (it's entirely possible I have a design that just has a page called "200.html" that isn't meant to have special behavior), but this would be a nice default for laying out app architectures for sites that

Examples of sites that would reasonably have a 200.html that isn't meant to be served for all requests:

stuartpb commented 9 years ago

Building on that concept of having files for routes, I think it might also be useful to have some kind of file or sentinel value for setting up 301/302 redirects, possibly using a similar one-line value like the root CNAME file. One common pattern I could see for this is using this to specify "redirect anything below this path to this path, with the rest of the path placed behind a slash".

I think the most sensible format for this value would be one space-separated line which works like CloudFlare's Page Rules (except evaluated path-relative to the file's location), where wildcards can be specified with * and referenced with $1 &c.

As this would allow for developing single-page URL schemes that don't violate REST semantics, this would likely be the form I would use to structure my own apps, using a pattern like * #/$1 to have requests redirect and use the directory's index.html.
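As an illustration of the one-line rule format described here, the following Python sketch applies a "pattern replacement" pair such as `* #/$1` to a path relative to the rule file. `apply_rule` is a hypothetical helper, not part of any existing server, and it assumes exactly two space-separated fields.

```python
import re
from typing import Optional

def apply_rule(rule: str, path: str) -> Optional[str]:
    """Apply a one-line 'pattern replacement' rule to a relative path."""
    pattern, replacement = rule.split()
    # Escape the literal parts, then turn each * into a capture group.
    regex = "(.*)".join(re.escape(part) for part in pattern.split("*"))
    match = re.fullmatch(regex, path)
    if match is None:
        return None
    # Substitute $1, $2, ... with the text captured by each wildcard.
    return re.sub(r"\$(\d+)", lambda m: match.group(int(m.group(1))), replacement)
```

For example, `apply_rule("* #/$1", "todos/42")` yields `#/todos/42`, which the directory's index.html can then read from the fragment.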

stuartpb commented 9 years ago

Also useful: an app-wide option for setting a redirect rule to direct requests for */index.html to canonical $1/, the way requests for directory names without a following / are. Also, an option for specifying whether canonical paths for directories (acting as extension-less "files" with their index.html) should or should not have a / suffixed.

Though shorthands for these rules in the options would be nice, just keeping it simple with a redirect array in the YAML config containing a series of strings (or objects with from and to fields) matching the capture and replacement formats described above for 301/302 files would be enough for me (as I'm pretty sure this would permit translation into Apache and Nginx configs' redirect rules, and likely rules for other file servers a CDN may choose to use).
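To show that such a redirect array could plausibly be translated into server config, here is a small Python sketch that renders from/to entries as nginx `rewrite` directives. The entry shape (`from`/`to` keys) follows the description above; the helper name is made up, and the regex escaping is Python's, which is assumed to be close enough to PCRE for simple paths.

```python
import re
from typing import Dict, List

def to_nginx(redirects: List[Dict[str, str]], status: int = 301) -> List[str]:
    """Render from/to redirect entries as nginx `rewrite` directives."""
    # nginx uses the `permanent` flag for 301 and `redirect` for 302.
    flag = "permanent" if status == 301 else "redirect"
    lines = []
    for entry in redirects:
        # Escape the literal parts and turn each * into a capture group.
        parts = [re.escape(p) for p in entry["from"].split("*")]
        lines.append(f"rewrite ^{'(.*)'.join(parts)}$ {entry['to']} {flag};")
    return lines

# The */index.html to $1/ canonicalization rule mentioned above
rules = [{"from": "*/index.html", "to": "$1/"}]
```

Since nginx also refers to captures as $1, $2 and so on, the replacement string can be passed through unchanged.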

stuartpb commented 9 years ago

Here are my overall thoughts on what a Static Access Rules Spec would look like (I'm going to go with "MUST" for all the rules below, since I think that's the model used by specs like the HTML Living Standard, where browsers are allowed to be divergent from the spec at the cost of officially being non-compliant with the relevant section).

Unlike the HTML Living Spec (which addresses a many-headed beast that is far from any kind of "settled" functionality or uniform surface that can be targeted at once), I'm thinking this spec would be versioned (more like the DOM's "Levels"), with endpoints having compliance with specific versions of the spec (mod their own non-compliant shortcomings or extensions, which may be tracked by some kind of caniuse-type table).

Level 0

Note that AFAIK no currently-shipping server supports all of Level 0 out-of-the-box (specifically, I'm pretty sure no server has directory listings off by default, nor do any redirect from index.html), but all the file servers I know do have configuration rules that can be applied to support Level 0 without any additional code to parse / translate access.yaml (meaning that if you upload an app with no access.yaml to a service that supports Level 0, you can be safe in knowing none of these undesired behaviors will be applied).

Probably out-of-spec for non-orthogonality

So, CNAME has a number of issues here that not only keep me from wanting to specify it as Level 0, but also keep me from including it in Level 1, or even putting it in the spec at all:

Level 1

Level 2 (maybe)

The 200.html and 30X-redirect file behaviors are more complicated than translating access.yaml rules into the server's config format (they have O(N) access complexity, where N is every directory in the project), so they might actually be moved to Level 2 (or maybe only for the non-global case). (They wouldn't go in some kind of Level-1-alternative because it would be pathological to force developers to support two different conflicting definition formats: they should have the option to use the files-in-tree format if their target endpoint/CDN supports it.)

On why I'm calling it access.yaml

I'm calling it access.yaml because this mirrors the name of Apache's .htaccess. I'm not calling it htaccess.yaml because I consider htaccess to be a very specific keyword describing Apache's configuration model/format, which this structure does not attempt to copy.

I'm not calling it something like config.yaml or options.yaml because there are many things in the lifecycle of an app that could entail a definition of options or configuration (especially on the tooling front), which have their own YAML structures that access rules shouldn't have to dance around to avoid key/model collisions. (See also: the Bower/Component battle over component.json that landed with Bower moving to bower.json).

I'm not dotting the file, because this is a core component of a SPA's structural definition, on the level of a Makefile - it should not be hidden to developers who wish to make changes to the app. (In fact, I nearly considered calling it Access.yaml.)

I'm using the yaml extension and not the yml extension because, while I appreciate that files like .travis.yml set a precedent for only supporting the yml extension, the three-character file extension format is an obsolete notion that died with MS-DOS. If I were to support three-letter extensions, I would start with ".htm" - which most endpoints aren't currently supporting, in favor of having one file at the .html extension.

captn3m0 commented 9 years ago

Just for reference, this is similar to the approach taken by Netlify, which offers both redirects and custom header support for static sites. Their redirect configuration is kept in a _redirects file, which looks like:

/home              /
/blog/my-post.php  /blog/my-post
/news              /blog
/google            https://www.google.com
/home         /              301
/my-redirect  /              302
/pass-through /index.html    200
/ecommerce    /store-closed  404
/news/:year/:month:/:date/:slug  /blog/:year/:month/:date/:story_id
/news/*  /blog/:splat
/story id=:id  /blog/:id  301
/*    /index.html   200
/api/*  https://api.example.com/:splat  200

The last 2 are of note, because they allow:

  1. Single Page Applications by shadowing
  2. Proxying to third-party domains, allowing lots of APIs to work out of the box
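A minimal Python sketch of how lines in this format could be read, assuming whitespace-separated fields, an optional trailing numeric status, and a default of 301 when the status is omitted. This is not Netlify's parser; the `Redirect` type and its field names are hypothetical, and lines with fewer than two fields would raise an error here.

```python
from typing import List, NamedTuple

class Redirect(NamedTuple):
    source: str
    params: List[str]  # query-parameter matchers such as "id=:id"
    target: str
    status: int

def parse_redirects(text: str) -> List[Redirect]:
    """Parse whitespace-separated redirect rules, one per line."""
    rules = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # A trailing numeric field is the status code; assume 301 otherwise.
        status = 301
        if fields[-1].isdigit():
            status = int(fields.pop())
        # Anything between source and target is treated as a query matcher.
        source, *params, target = fields
        rules.append(Redirect(source, params, target, status))
    return rules
```

Under these assumptions the `/*    /index.html   200` line parses to a rule with status 200, which is the shadowing rewrite that makes single page apps work.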

The _headers file looks like:

/*
  X-Frame-Options: DENY
  X-XSS-Protection: 1; mode=block
/something/*
  Basic-Auth: someuser:somepassword anotheruser:anotherpassword

So, it's just paths followed by the custom headers. They also have support for other fancy features such as pre-rendering for SEO, and webhooks.
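The "paths followed by headers" layout can be read with a few lines of Python. This is only a sketch under the assumption that unindented lines are path patterns and indented lines are `Name: value` pairs; `parse_headers` is a hypothetical name, not Netlify's code.

```python
from typing import Dict

def parse_headers(text: str) -> Dict[str, Dict[str, str]]:
    """Group indented 'Name: value' lines under the unindented path above them."""
    result: Dict[str, Dict[str, str]] = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # An unindented line starts a new path section.
            current = line.strip()
            result[current] = {}
        elif current is not None:
            # Split on the first colon only, since values may contain colons.
            name, _, value = line.strip().partition(":")
            result[current][name.strip()] = value.strip()
    return result
```

Because each path pattern gets its own section, different pages can carry different headers by listing more than one pattern.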

Disclaimer: Have nothing to do with Netlify. And I wasted too much time looking for it in my HN comment history. HN Discussion

stuartpb commented 9 years ago

I'm going to bed now, but I'll take a look at those tomorrow- my first reaction is that, while that looks neat and I do like the way it combines multiple mechanisms (rewriting, redirecting, custom-erroring, and forbidding/hiding) into one status-oriented syntax, that syntax also looks a little over-engineered (eg. the route params - I'm not sure if :slug turning into :story_id is a typo, but it'd be both simpler to implement and harder to mess up if done as $1/$2/$3/$4) and over-loaded (the query string being a separate space-separated field complicates arity checking when it could just be part of the match field) - not to mention I don't see how _headers would work in a scenario requiring different custom headers for different pages.

If I were to follow this pattern, I'd probably call it accesstab or similar, after the tabular formats it resembles like fstab and crontab.

stuartpb commented 9 years ago

Okay, so I've had all this churning in the back of my mind as I went about my day (I've also had a great resurgence in appreciation for static-based app layouts, having this reliability as a baseline guarantee). Here are some notes from my current concept:

Open questions:

Anyway, this is rapidly getting too complex to continue to draft in this issue, so I'm going to continue this work in a repo (with probably the occasional pingback for progress to this issue) at https://github.com/stuartpb/s4-specs.

stuartpb commented 9 years ago

Just remembered: one irking thing about the three-column model is that it doesn't allow a simple definition for "return the standard 403 or 404 page", so this might be cause for a "two-column rules use the default path" extension to the spec behavior. Again, I need to go to bed.

sintaxi commented 9 years ago

@stuartpb This is a really great thread. FWIW - surge.sh already supports most things you have mentioned. Here is a list...

Implicit Redirects & Clean URLs

Custom redirects using a ROUTER file

This is fairly self explanatory. It will redirect if it matches a pattern in the ROUTER file.

301     /:yr/:mo/:dy/:slug       /articles/:slug
301     /blog?title=:slug        /articles/:slug
302     /blog                    http://medium.com/sintaxi
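For illustration, matching a URL against one of these lines could look like the Python sketch below. It assumes each line is "status pattern target" and that a `:name` parameter matches a single path segment or query value; it is a guess at the semantics, not surge.sh's implementation, and `route` is a hypothetical name.

```python
import re
from typing import Optional, Tuple

def route(rule: str, url: str) -> Optional[Tuple[int, str]]:
    """Match a 'status pattern target' line against a URL and fill in :named params."""
    status, pattern, target = rule.split()
    # re.split with a capture group alternates literal text and parameter names.
    parts = re.split(r":(\w+)", pattern)
    regex = "".join(
        re.escape(p) if i % 2 == 0 else f"(?P<{p}>[^/?&]+)"
        for i, p in enumerate(parts)
    )
    match = re.fullmatch(regex, url)
    if match is None:
        return None
    # Replace :name in the target; leave unknown names untouched.
    params = match.groupdict()
    filled = re.sub(r":(\w+)", lambda m: params.get(m.group(1), m.group(0)), target)
    return int(status), filled
```

With the first rule above, /2016/05/13/hello would redirect to /articles/hello with a 301.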

Custom domain using CNAME file

The following CNAME file will serve over both HTTP and HTTPS...

example.com

The following CNAME file will only serve over HTTPS, and it will force HTTP traffic to HTTPS

https://example.com

The following does the opposite

http://example.com
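The three CNAME variants can be summarized in a short Python sketch. It assumes the scheme prefix, when present, names the only scheme the site should be served over; both helper names are hypothetical and this is not surge.sh's code.

```python
from typing import Optional, Tuple

def parse_cname(line: str) -> Tuple[str, Optional[str]]:
    """Split a CNAME entry into (host, forced scheme or None)."""
    entry = line.strip()
    for scheme in ("https", "http"):
        prefix = scheme + "://"
        if entry.startswith(prefix):
            return entry[len(prefix):], scheme
    return entry, None

def redirect_for(line: str, request_scheme: str) -> Optional[str]:
    """Return the URL to redirect to, or None if the request can be served as-is."""
    host, forced = parse_cname(line)
    if forced is not None and forced != request_scheme:
        return f"{forced}://{host}/"
    return None
```

A bare hostname never triggers a redirect, while a prefixed entry redirects any request that arrives over the other scheme.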

Auto redirect www subdomains.

BasicAuth using AUTH file.

With an AUTH file you provide a list of user/passwords to protect the site.

thurston@sy.com:teenagedcomputer
kim@sy.com:goo
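As a sketch of how a server might use such a file, the Python below loads the user:password lines and checks an HTTP Basic `Authorization` header value against them. The function names are hypothetical, and storing plain-text passwords is taken from the file format shown, not recommended here.

```python
import base64
import hmac
from typing import Dict

def load_auth(text: str) -> Dict[str, str]:
    """Read 'user:password' lines into a dict."""
    users = {}
    for line in text.splitlines():
        if line.strip():
            user, _, password = line.strip().partition(":")
            users[user] = password
    return users

def is_authorized(header: str, users: Dict[str, str]) -> bool:
    """Check an 'Authorization: Basic ...' header value against the user list."""
    scheme, _, encoded = header.partition(" ")
    if scheme.lower() != "basic":
        return False
    try:
        decoded = base64.b64decode(encoded).decode("utf-8")
    except ValueError:  # covers malformed base64 and invalid UTF-8
        return False
    user, _, password = decoded.partition(":")
    expected = users.get(user)
    if expected is None:
        return False
    # Constant-time comparison to avoid leaking how much of the password matched.
    return hmac.compare_digest(expected.encode(), password.encode())
```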

Not necessarily perfect design but these things all work today and they are fairly pragmatic features driven out of real use cases and most API choices have been vetted by users of harp or surge.

Personally I think the days of using config files for such things should be a thing of the past. 99% of the time we all want the same things anyway. Most of these features can be triggered by conventions (e.g. 200.html files).

stuartpb commented 9 years ago

@sintaxi Interesting. Does ROUTER support creating recursive / location-local rules (as would be possible for defining patterns like path-wide "/200.html" files as described above / on Twitter)?

Depending on what conventions and patterns I find with the other platforms, I might just go for standardizing on surge.sh's conventions and names. I'm still not wild about named route parameters, but if they're the only major difference between the existing ROUTER schema and what's strictly necessary I'll just standardize them (using the rules of https://github.com/pillarjs/path-to-regexp).

Also, does your CNAME file work in the multiple-entry list case? If no, what was the rationale around not doing so?

stuartpb commented 9 years ago

Also, re: defining HTTPS behavior in CNAME, I'm personally not crazy about the notion of explicitly downgrading HTTPS, as even the most justified cases for it (low-power hardware where HTTPS causes a significant dip in performance / spike in consumption, the ability to inspect wire traffic) aren't really justified when you consider the potential downsides (there's a reason browsers are only going to support HTTP/2 over TLS). Opting in to HTTPS upgrades, maybe, but I have my misgivings with the notion of forcing HTTP-only traffic at a definition level.

While I'm generally not the type of person to enforce a prescriptive best practice by gimping the things that are possible as part of a spec (which should aim almost exclusively to support any in-scope behaviors that were possible in its precedents), carelessly pasting in / writing "http://" at the beginning of your CNAME (or doing so because you've seen it done once, where it was intentional, and then copying the convention under the impression it's required) and having HTTPS be inadvertently stripped from incoming connections seems like too likely of a mistake to make. (In other words, the ease of triggering the behavior seems disproportionate to the extremely low desirability of that behavior).

stuartpb commented 9 years ago

Vis. CNAMEs supporting multiple hostnames: I personally don't think it's good practice (the whole point is to have a canonical name, after all) and would never personally use it, but the alternative would likely be hackier for somebody who wants to do such a behavior (without multiple host support in CNAME, they'd have to publish twelve versions of their codebase that only vary the CNAME file to re-use their files on twelve hosts), and considering that traditionally-not-necessarily static platforms like Heroku have natural support for multiple domains in this fashion, I feel it should be supported.

stuartpb commented 9 years ago

Note that CNAME is for hosting files under multiple names. Implementations MUST NOT redirect names specified in the CNAME file (implementations MAY redirect to names that are in the CNAME file - nowww/yeswww Normalization, following the logic @sintaxi described, will likely be a component of the CNAME spec). Specifying redirections at the host level is a separate concern that may be handled by separate systems and will be separate from the CNAME spec.

stuartpb commented 9 years ago

Also to be in the spec: a note explaining that you have to use a route in the ROUTER file if you want your site to respond to requests for /ROUTER with something other than your list of ROUTER definitions, because ROUTER is, by default, subject to the same file-serving rules as any other file in the repo. (The hairiness of defining rules for each special file is making me kind of want to put all of these in a subdirectory, not to mention the way it'd reduce the pollution of the root namespace that has been so historically bad.)

stuartpb commented 9 years ago

Also to note: while ALL of the codebase's files, by default, MUST be served normally (including those files that act as definitions), compliant compilers/generators/config-transpilers (both in implementations and in tooling) MUST take actions in their compiled files, if these files are not then re-introduced to the codebase, to respond to requests for those files as if those files did not exist (so eg. if pushing to your platform creates a customer_compiled-nginx.conf, that conf file needs to include a rule that either hides itself or responds with a certain page as a 200 OK, depending on what behavior the developer defined in their ROUTER for that path without the existence of the file).

Tooling that adds compiled files to a codebase (such as rendered template HTML files) MAY add rules in its generated ROUTER to hide any files (such as source templates), so long as these rules remain present and visible as part of the codebase after compilation.

TODO: define "the codebase".

stuartpb commented 9 years ago

Actually, I'm not sure about having the definition files exposed by default. Assuming there's a simple route to show/hide them (which I'm pretty set that there should be), both sides have the argument that it's easy enough to specify a rule in ROUTER to switch to the other.

On the one hand, magic rules are a pain, and this makes introspection/openness a common part of the platform (the way so much of the web already is). It's irritating to have to pick off special-case rules defined by the spec just to make a server behave more predictably.

On the other hand, exposing stuff like your routing rules, while you may or may not be exposing the files you may be hiding (and maybe you're only hiding certain files in one fashion and hadn't considered another way to access them, which is revealed by your ROUTER spec!), is a potential security hole, and "this is insecure by default, do this extra thing to secure it" is really a pretty bad scene. Also, I'm pretty sure there's precedent ie. in Apache hiding .htaccess - also there's a parallel to HTML here: <head> is only display: none as part of the UA stylesheet, and you're actually allowed to make it visible if that makes any sense to your model.

(TODO: define interaction between /../ and routing)

So, actually, I think I'm going to walk back the assertion that definition files must be served as 200 by default, and instead say they must be served as 404 by default. End users may then wish to expose them with a 200 /ROUTER rule.

Also the name I'm kicking around for a directory containing these files is _htspec, since that both makes the name sort between Uppercase and lowercase in ASCIIbetical sorting, and hides the directory naturally in Jekyll-based systems (which should still expose other _-prefixed locations in standards mode, for the "picking-off-special-rules-is-irritating" reason explained above).

stuartpb commented 9 years ago

It just occurred to me how important it is to add the rules "Implementations in strict mode MUST NOT execute or interpret files outside this specification" and "Implementations MUST NOT write files to the code base in response to requests", basically as a way of officially saying "implementations MUST NOT turn into PHP".

yairEO commented 8 years ago

Any status about this? I find this a very hot feature, can't wait for it, I have demo pages which are broken because of lack of routing.

Thanks!

lapidus commented 8 years ago

+1

kasperpeulen commented 8 years ago

Is there something to get routing with # already working? For example I have an angular2 application: http://ng2-dart-samples.github.io/router/

But if I go directly to the route: http://ng2-dart-samples.github.io/router/home

I get a 404

lpil commented 8 years ago

:+1:

jasonniebauer commented 8 years ago

+1

SteveALee commented 8 years ago

+1

alanbeech commented 8 years ago

+1

colorgap commented 8 years ago

Guys, I found a workaround for this issue. I created an Angular 2 app for the documentation of our open source app framework and hit this issue: refreshing the page gives a 404. If I go to the link below, it works, as I set the base href to /projectname/: http://colorgap.github.io/brush

and then the app works perfectly fine. But if I refresh the page on the About page, it fails and gives a 404 page: http://colorgap.github.io/brush/about

I found out that GitHub Pages allows you to customize 404 pages, so I thought that if I created a copy of my index.html and renamed it to 404.html, it should fix the issue, and it did. Now I have a perfectly working structure on GitHub Pages. I know this is not a perfect or desired solution, but I guess it does the trick for now.

Really looking forward to a better solution. But for now, create a copy of index.html as 404.html.

csuwildcat commented 8 years ago

Hey folks, I came up with a hack that enables all sorts of route handling without taking the 404 SEO hit. It uses the 404.html page to hijack the inbound attempts and redirect them back at the index.html page with the correct path intact (via the History API): http://www.backalleycoder.com/2016/05/13/sghpa-the-single-page-app-hack-for-github-pages/

rafgraph commented 8 years ago

I’ve been working on a way to host single page apps with GitHub Pages for the past few weeks. Check out the repo and example site. Repo: https://github.com/rafrex/spa-github-pages Example site: http://spa-github-pages.rafrex.com

I used a custom 404.html page with a redirect script that takes the current url and converts the path and query string into just a query string, and then redirects the browser to the new url with only a query string and hash fragment. For example, example.tld/one/two?a=b&c=d#qwe, becomes example.tld/?redirect=true&pathname=%2Fone%2Ftwo&query=a=b%26c=d#qwe. The GitHub Pages server receives the new request, e.g. example.tld?redirect=true..., ignores the query string and hash fragment and returns the index.html file, which has a script that checks for a redirect in the query string before the single page app is loaded and converts it back into the correct url.
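The two URL transformations can be modeled in Python as below. This is a sketch of the string handling only; the real scripts run in the browser (per the note that follows, the final step uses history.replaceState), and both function names are made up. It assumes the original query contains no literal %26.

```python
from urllib.parse import quote, unquote, urlsplit

def to_redirect_url(url: str) -> str:
    """Model of the 404.html step: fold path and query into a query string."""
    parts = urlsplit(url)
    target = f"{parts.scheme}://{parts.netloc}/?redirect=true&pathname={quote(parts.path, safe='')}"
    if parts.query:
        # Hide each '&' so the original query stays a single parameter.
        target += "&query=" + parts.query.replace("&", "%26")
    if parts.fragment:
        target += "#" + parts.fragment
    return target

def restore_url(url: str) -> str:
    """Model of the index.html step; expects a URL built by to_redirect_url."""
    parts = urlsplit(url)
    params = dict(p.split("=", 1) for p in parts.query.split("&"))
    restored = unquote(params["pathname"])
    if "query" in params:
        restored += "?" + unquote(params["query"])
    if parts.fragment:
        restored += "#" + parts.fragment
    return restored
```

Round-tripping a URL through both functions gives back the original path, query string and hash fragment, which is what the single page app's router then sees.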

Note: I was originally working on a React only solution, here, when @csuwildcat gave me the idea to use history.replaceState(...) instead of a react-router onEnter hook for the final stage of the query redirect.

SamGerber-zz commented 8 years ago

@csuwildcat & @rafrex: Thanks for coming up with workarounds for this issue. Maybe great minds just think alike? I know @rafrex has been working on his workaround for several days now, so maybe this is just another case of simultaneous invention?

rafgraph commented 8 years ago

@csuwildcat, I didn’t see your tweet, but I did see your post, nice work. You did things a bit differently, with a meta http-equiv refresh I think (instead of a query string with a window.location.replace). The original repo I was working on was a React-only implementation, and you gave me the idea to use history.replaceState(...) instead of a react-router onEnter hook for the final stage of the query redirect. Thanks. If you look at the history of the open-sourced react-github-pages repo, I had the redirect working way back on May 3rd: https://github.com/rafrex/react-github-pages/tree/bbe0efa0212261297d9f6e0cfd76023695681f4c (the example site is pretty bad and documentation is non-existent, but the redirect code is all there, and it all works). I hope this clears things up.

csuwildcat commented 8 years ago

It's cool, nice job making a template, I wanted to but haven't had the time. Hopefully people get value out of all this.

rafgraph commented 8 years ago

Cool, thanks. I think people will get value out of this, I know I certainly will.

SteveALee commented 8 years ago

Some good ideas here, thanks. I've been using a 404 file with a client redirect for a while now, but as pointed out, it only solves some of the issues. Using the preprocessing is certainly worth trying; however, it does feel wrong to be using client behaviour to work around server failings, especially for static sites, which arguably should not require JavaScript (obviously not SPAs).

A 200 page like surge provide would be much better.

Incidentally, a custom 404 works with my organisation project page.

The fact that project pages are not at the domain root causes other problems too. Not just for routers which assume this.

If you use custom domains as a solution for the non-root problem, you then have to provide your own HTTPS certificate, though that is getting easier. Newer web features like progressive web app manifests require HTTPS.

We need GitHub to provide a solution that always uses the domain root and provides a proper server-side redirect for unknown pages (e.g. a 200 page).

rafgraph commented 8 years ago

I'm not a fan of the 200 page as a clean solution because you have to duplicate your index.html file in your 200.html file; that still feels hacky, although less so (surge's 200 page is what gave me the idea to serve gh pages out of the 404.html file a while ago).

My suggestion would be to have a .spa file in the repo which would turn on single page app compatibility (similar to how the .nojekyll file is implemented), and then GitHub's servers would ignore the path and always serve up index.html, and let the single page app handle undefined paths.

sintaxi commented 8 years ago

> you have to duplicate your index.html file in your 200.html file

FWIW - with surge you can have only a 200.html if you like. No need to duplicate content in an index.html file.

> My suggestion would be to have a .spa file in the repo which would turn on single page app compatibility.

With a .spa and an index.html file you now have two files in your project for something that could be accomplished with just one 200.html. Besides, why not match the 404.html paradigm? I tend to think there are too many dot files in projects these days.

Lastly, a 200.html (or .spa) should only respond to certain mime types. For example, /some-image.png should still 404 if it is missing.
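One way to picture that restriction is the Python sketch below, which serves 200.html only when the requested path has no recognized type or looks like an HTML page. The use of `mimetypes.guess_type` on the path is an assumption made for illustration (a real server might inspect the Accept header instead), and `respond` is a hypothetical name.

```python
import mimetypes
from typing import Set, Tuple

def respond(path: str, files: Set[str]) -> Tuple[int, str]:
    """Serve 200.html for missing pages, but keep 404 for missing assets."""
    if path in files:
        return 200, path
    guessed, _ = mimetypes.guess_type(path)
    # Extension-less routes and .html pages may fall through to the app shell.
    if guessed in (None, "text/html") and "/200.html" in files:
        return 200, "/200.html"
    # Anything that looks like an asset stays a real 404.
    return 404, "/404.html"

# Hypothetical site layout used for illustration
files = {"/200.html", "/app.js"}
```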

rafgraph commented 8 years ago

I definitely agree that some mime types should still return 404 if missing.

Personal preference, but I don't think we should give up on the index.html paradigm. I'd rather have a settings file than not have an index.html file. Although really, all I want is native single page app support from GitHub Pages, however they decide to implement it.

rafgraph commented 8 years ago

@sintaxi, just saw you created surge, nice work. One suggestion, implement git push deployment (like github pages and heroku).

sintaxi commented 8 years ago

@rafrex Thanks a lot.

This thread should probably stay about hosting single page apps so I won't git into it, but there are reasons we're not using git. You can find us in surge chat if you want to talk more on that.

spacedevin commented 8 years ago

The redirect solutions work, however they still generate a 404 header.

I decided to just make symlinks instead.

  1. create a .nojekyll file
  2. create a symlink from index.html to 404.html (ln -s index.html 404.html)
  3. If you know all of your routes, and want 200 status codes, create symlinks for each route back to index. So if you had a route like /about, you would create a symlink called about.html since it automatically finds that file. (ln -s index.html about.html)
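The three steps can be scripted. The following Python sketch assumes top-level routes only (a nested route such as /about/team would need a directory and a ../index.html target) and a platform where creating symlinks is permitted; `link_routes` is a hypothetical helper.

```python
import os
from typing import Iterable

def link_routes(site_dir: str, routes: Iterable[str]) -> None:
    """Create .nojekyll, 404.html and one <route>.html symlink per known route."""
    # Step 1: an empty .nojekyll file
    open(os.path.join(site_dir, ".nojekyll"), "a").close()
    # Steps 2 and 3: 404.html plus one link per route, all pointing at index.html
    names = ["404.html"] + [route.strip("/") + ".html" for route in routes]
    for name in names:
        link = os.path.join(site_dir, name)
        if not os.path.lexists(link):
            # Relative target, so the link survives being checked out elsewhere.
            os.symlink("index.html", link)
```

Calling `link_routes("site", ["/about"])` would be equivalent to running the two `ln -s` commands above inside a site directory.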

You can see it working here http://arzynik.github.io/gh-pages-spa/

Still would be great to have a better solution.

stuartpb commented 8 years ago

I think this discussion is just the booster shot I needed to get back to work on designing a standard for all of this - I'd been stuck on a name for it (one of the two hard things in computer science), calling it "s4-specs" and wondering if I could maybe shove in some more S-words (like "simple" or "single").

The new name I've settled on builds off of @rafrex's instinctual suggestion for a .spa sentinel file. Including spa in the name seems like a reasonable way of hinting "this contains single-page application" to a dev encountering the file for the first time. To give it a non-ambiguous, less-likely-to-collide name (one with some Google juice), I'm going with _spaspec: https://github.com/spaspec/spaspec-standard

allenhwkim commented 8 years ago

+1

paramaggarwal commented 8 years ago

So currently there are 3 options for deploying single page apps:

  1. Github Pages
  2. Surge
  3. PubStorm

Now I read about all of these and was comparing their free plans. As per my understanding, Github Pages does not have these two things:

  1. Clean URLs
  2. Client side routing

But from @arzynik's excellent suggestion above, using symlinks we can have both of these capabilities.

rafgraph commented 8 years ago

Correct me if I'm wrong, but the problem with symlinks is that they don't support dynamic routes, for example, /todos/:id.

stuartpb commented 8 years ago

That's one problem (there's also no guarantee that your deployment method will recognize symlinks) - another problem is that they don't have any way to control any other kind of HTTP status with a response, such as a redirect, or a 403 Forbidden.

_spaspec/routes, on the other hand, allows for all of this, and more.

rognoni commented 7 years ago

For information, this is the way used in firebase.json:

{
  "hosting": {
    "public": "public",
    "rewrites": [
      {
        "source": "**",
        "destination": "/index.html"
      }
    ]
  }
}

jimthedev commented 7 years ago

@rafrex @paramaggarwal

> Correct me if I'm wrong, but the problem with symlinks is that they don't support dynamic routes, for example, /todos/:id.

You are correct. I just confirmed it by modifying @arzynik's example to accept route params. It does not work properly, whereas going to the same route with the hash location strategy works perfectly. Symlinks are a decent solution if you have super simple routing, but that's about it.

NullVoxPopuli commented 7 years ago

Has anything happened with this?

GabrielDelepine commented 7 years ago

The spa-github-pages repo gives me the most appropriate answer.

GitHub serves the file 404.html, if it exists, for every missing file. With a JavaScript redirect, it's possible to have a real SPA hosted on GitHub Pages.

jimthedev commented 7 years ago

@GabrielDelepine If I recall correctly, using 404.html will work as you described, but unfortunately it still has an HTTP status code of 404 even though the content is under your control. So while you can get it to work, you aren't truly able to override the 404 HTTP status code. Not sure how this would work with SEO, since a spider might be fooled into thinking the page was not found and thus not worthy of serving / scraping. YMMV, but ideally these routes would be 200s.