Add single page application support for GitHub Pages

zakhenry commented 9 years ago

As GitHub Pages does not support server-side configuration (for example, .htaccess files), it is impossible to get URL rewriting to the index page working for a single page application.

Ideally, there should be a .gh-pages.yml or similar file in the repo root that has certain flags like

redirects:
    redirect_to: index.html
    try_files: true

The above would be the equivalent of the following for nginx

location / {
    root /path/to/site;
    index index.html;
    try_files $uri $uri/ /index.html =404;
}
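
For readers less familiar with nginx, the lookup order that try_files expresses can be sketched in Python. This is only an illustrative model, not how nginx or GitHub Pages is implemented; the function name `resolve` and the in-memory `files` set are assumptions made for the example.

```python
def resolve(request_path, files, fallback="/index.html"):
    """Model of `try_files $uri $uri/ /index.html`.

    `files` is a set of site-relative file paths such as {"/index.html"}.
    Returns the file to serve, or None for a 404.
    """
    # $uri: the request names an existing file exactly.
    if request_path in files:
        return request_path
    # $uri/: the request names a directory that has an index file.
    index = request_path.rstrip("/") + "/index.html"
    if index in files:
        return index
    # Last resort: hand the request to the single page app's entry point.
    if fallback in files:
        return fallback
    return None
```

With this model, a deep link such as /users/7 that matches no file falls through to /index.html, which is the behavior a client-side router needs.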

The workaround is to use hashbang URLs, as AngularJS does when HTML5 mode is off, but this is fairly unsightly.

I think this feature is much wanted in the AngularJS community, and by users of any other frontend JavaScript framework that supports the HTML5 History API.

stuartpb commented 9 years ago

There's also the concept of 200.html files, which work like 404.html files in that they are served in response to any unmatched URL, but with an HTTP 200 OK response instead of an HTTP 404 Not Found. (This is an existing pattern supported by other CDNs geared toward SPAs, specifically surge.sh - something I debated with @sintaxi about on Twitter a while back).

To be clear, I think app authors should be using redirections along the lines of what @xiphiaz is describing (and some kind of route configuration file would be great for defining this, especially if a shared standard could be used across GitHub Pages and surge.sh), for the same reasons I described in the linked Twitter discussion (it matches the REST semantics the web has been designed for) - I just think these SPA platforms should work toward some kind of write-once-deploy-anywhere convergence (akin to the way the container backend world is converging on a single container standard right now).

stuartpb commented 9 years ago

Also, something that would be fantastic: a way to specify a rule stating requests on catching routes should be directed to a page with whatever path was requested as part of the hash-fragment like a /#/, /#!, or /#!/, akin to old-new-Twitter's routing style.

stuartpb commented 9 years ago

One thing I just realized (and mentioned on Twitter) is that what I'd really like for my static-resource-based design is routing to multiple pages. This could be handled nicely by having 200.html, rather than only working as one global file, apply to any request under its containing path (as opposed to index.html, which only applies to the containing path exactly).

I'd still want some mechanism to disable this in the aforementioned YAML config file (it's entirely possible I have a design that just has a page called "200.html" that isn't meant to have special behavior), but this would be a nice default for laying out app architectures for sites that

Examples of sites that would reasonably have a 200.html that isn't meant to be served for all requests:

Sites that document at least 200 numbered things, such as an online Pokédex, using the numbers to refer to the things (this would be sensible for a Pokédex, as Pokémon names vary from locale to locale).
A site that provides documentation of what different status codes mean (or even just a dictionary of technical codes).

stuartpb commented 9 years ago

Building on that concept of having files for routes, I think it might also be useful to have some kind of file or sentinel value for setting up 301/302 redirects, possibly using a one-line value similar to the root CNAME file. One common pattern I could see for this is to specify "redirect anything below this path to this path, with the rest of the path placed behind a slash".

I think the most sensible format for this value would be one space-separated line which works like CloudFlare's Page Rules (except evaluated path-relative to the file's location), where wildcards can be specified with * and referenced with $1 &c.

As this would allow for developing single-page URL schemes that don't violate REST semantics, this would likely be the form I would use to structure my own apps, using a pattern like * #/$1 to have requests redirect and use the directory's index.html.
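
As a rough sketch of how a one-line rule like `* #/$1` might be evaluated relative to the directory that holds it: each `*` becomes a capture group and `$1`, `$2`, and so on are filled in from the captures. The function name `apply_rule` and the `base` argument are hypothetical; nothing here is an existing GitHub Pages or CloudFlare API.

```python
import re

def apply_rule(rule, base, request_path):
    """Apply a 'pattern replacement' rule relative to the directory `base`.

    Returns the redirect target, or None if the request does not match.
    """
    pattern, replacement = rule.split()
    if not request_path.startswith(base):
        return None
    remainder = request_path[len(base):]
    # Each * becomes a capture group; every other character matches literally.
    regex = "".join("(.*)" if ch == "*" else re.escape(ch) for ch in pattern)
    match = re.fullmatch(regex, remainder)
    if match is None:
        return None
    # Substitute $1, $2, ... with the corresponding captured text.
    target = re.sub(r"\$(\d+)", lambda m: match.group(int(m.group(1))), replacement)
    return base + target
```

For a rule file living in /app/, a request for /app/users/7 would be redirected to /app/#/users/7, which the directory's index.html can then route on the client.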

stuartpb commented 9 years ago

Also useful: an app-wide option for setting a redirect rule to direct requests for */index.html to canonical $1/, the way requests for directory names without a following / are. Also, an option for specifying whether canonical paths for directories (acting as extension-less "files" with their index.html) should or should not have a / suffixed.

Though shorthands for these rules in the options would be nice, just keeping it simple with a redirect array in the YAML config containing a series of strings (or objects with from and to fields) matching the capture and replacement formats described above for 301/302 files would be enough for me (as I'm pretty sure this would permit translation into Apache and Nginx configs' redirect rules, and likely rules for other file servers a CDN may choose to use).

stuartpb commented 9 years ago

Here are my overall thoughts on what a Static Access Rules Spec would look like (I'm going to go with "MUST" for all the rules below, since I think that's the model used by specs like the HTML Living Standard, where browsers are allowed to be divergent from the spec at the cost of officially being non-compliant with the relevant section).

Unlike the HTML Living Spec (which addresses a many-headed beast that is far from any kind of "settled" functionality or uniform surface that can be targeted at once), I'm thinking this spec would be versioned (more like the DOM's "Levels"), with endpoints having compliance with specific versions of the spec (mod their own non-compliant shortcomings or extensions, which may be tracked by some kind of caniuse-type table).

Level 0

Any of the following behaviors may be disabled with (as yet unspecified) flags in access.yaml.
Servers MUST redirect requests to a location that is a directory without a trailing slash to that path with a trailing slash.
Servers MUST serve the content of index.html for requests to a directory with a trailing slash.
In the absence of an index.html file in a directory (or a 200.html or 30X redirect-file), servers MUST treat the request as a 404. Servers MUST NOT fall back to an alternate name like index.htm or default.htm. Servers MUST NOT render a directory listing.
Servers MUST redirect requests for an existing index.html to its container's slash-ending path. (This is so the default behavior doesn't break canonical-path assumptions.)
Servers MUST support /404.html. (There will be a property in access.yaml to override the file to use for any error.)
Servers MUST NOT hide any files not explicitly specified by the definition (ie. files beginning with an underscore are NOT hidden unless the user requests it as such in access.yaml).

Note that AFAIK no currently-shipping server supports all of Level 0 out-of-the-box (specifically, I'm pretty sure no server has directory listings off by default, nor do any redirect from index.html), but all the file servers I know do have configuration rules that can be applied to support Level 0 without any additional code to parse / translate access.yaml (meaning that if you upload an app with no access.yaml to a service that supports Level 0, you can be safe in knowing none of these undesired behaviors will be applied).

Probably out-of-spec for non-orthogonality

Servers MUST support the CNAME file, as a whitespace-separated list of canonical names for the files' host.

So, CNAME has a number of issues here that not only keep me from wanting to specify it as Level 0, but also keep me from including it in Level 1, or even putting it in the spec at all:

GitHub doesn't support multiple CNAMEs - only one. (I think you're meant to fork the repo if you want the same files under multiple hosts, which is kinda dumb.)
Without checking, I don't remember what GitHub does if you specify a list. (I think it takes the first line, but I'm not positive.)
I think CNAME has to be read and injected into the server's config - it can't use a static rule to have the server itself follow its behavior, the way the Level 0 rules work.
CNAME entails a bunch of conflict-resolution mechanisms if two projects try to use the same one (and barely any service I know has an actual mechanism for resolving the case where another user has sniped your domain - usually it's "contact support").
CNAMEs are more likely than other rules to entail mechanisms outside of mere rule translation for the file server (ie. determining host nodes for load balancing).
For the previous reasons, it's usually better to use some other mechanism/model, in interfaces outside the tree, to handle your virtual hosting layout. (See: Heroku's domains:add, or something that reads a TXT record or the actual CNAME of your domain from DNS to decide ownership.)
Not including it in the spec doesn't necessarily make any use of it non-compliant. (Indeed, it may get its own spec, which would be by-definition out-of-scope and unimplementable for a Heroku Buildpack implementing this spec - it would have to be translated by a separate tool.)

Level 1

Servers MUST support all the access.yaml properties defined in the first shipping version of the spec. (Minor versions may happen for clarifications, but new behaviors will be defined as new levels.) (This is just me being lazy for this draft, this will actually be replaced with the actual set of defined access.yaml rules in any usable version of the spec.)
Servers MUST NOT support access.yaml in locations other than the site root. (access.yaml is for project-wide configuration - locations other than the site root have their own files, which may be delegated to Level 2 for the reasons described below.)
Servers MUST respect the redirects list in access.yaml.
If a request for a file results in a 404, servers MUST walk up the path to serve any 302, 301, or 200.html definition above the requested path in the tree (in that order).
access.yaml and/or some other file definition (some kind of .accessignore file?) MUST have a mechanism for blackholing files (supporting wildcards, eg. to mirror Jekyll's ignore-anything-underscore-prefixed rule), treating accesses to them as 404 (by default) or 403 (with some kind of configuration) errors.
access.yaml may also support CNAME/virtual host definitions (MUST, if this actually gets placed in the spec - I haven't settled on whether or not it should be, for many of the reasons described around the CNAME file/list, along with the fact that it'd be full-on redundant with CNAME).
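
The walk-up rule for 404s could look something like the sketch below. It assumes the nearest directory wins and that, within one directory, definitions are checked in the order 302, 301, 200.html; the draft leaves that precedence open, so treat it as one possible reading. The names `find_fallback` and `CANDIDATES` are made up for the example.

```python
import posixpath

# Checked in this order inside each directory (an assumed reading of the draft).
CANDIDATES = ("302", "301", "200.html")

def find_fallback(request_path, files):
    """Walk from the requested path's directory up to the site root.

    `files` is a set of site-relative paths. Returns the first definition
    file found, or None if the request should stay a plain 404.
    """
    directory = posixpath.dirname(request_path)
    while True:
        for name in CANDIDATES:
            candidate = posixpath.join(directory, name)
            if candidate in files:
                return candidate
        if directory in ("/", ""):
            return None  # reached the root without finding a definition
        directory = posixpath.dirname(directory)
```

This also shows where the cost comes from: every 404 may touch each ancestor directory, which is the access complexity concern raised under Level 2.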

Level 2 (maybe)

The 200.html and 30X-redirect file behaviors are more complicated than translating access.yaml rules into the server's config format (they have O(N) access complexity, where N is every directory in the project), so they might actually be moved to Level 2 (or maybe only for the non-global case). (They wouldn't go in some kind of Level-1-alternative because it would be pathological to force developers to support two different conflicting definition formats: they should have the option to use the files-in-tree format if their target endpoint/CDN supports it.)

On why I'm calling it `access.yaml`

I'm calling it access.yaml because this mirrors the name of Apache's .htaccess. I'm not calling it htaccess.yaml because I consider htaccess to be a very specific keyword describing Apache's configuration model/format, which this structure does not attempt to copy.

I'm not calling it something like config.yaml or options.yaml because there are many things in the lifecycle of an app that could entail a definition of options or configuration (especially on the tooling front), which have their own YAML structures that access rules shouldn't have to dance around to avoid key/model collisions. (See also: the Bower/Component battle over component.json that landed with Bower moving to bower.json).

I'm not dotting the file, because this is a core component of a SPA's structural definition, on the level of a Makefile - it should not be hidden to developers who wish to make changes to the app. (In fact, I nearly considered calling it Access.yaml.)

I'm using the yaml extension and not the yml extension because, while I appreciate that files like .travis.yml set a precedent for only supporting the yml extension, the three-character file extension format is an obsolete notion that died with MS-DOS. If I were to support three-letter extensions, I would start with ".htm" - which most endpoints aren't currently supporting, in favor of having one file at the .html extension.

captn3m0 commented 9 years ago

Just for reference, this is similar to the approach taken by Netlify, which offers both redirects and custom header support for static sites. Their redirect configuration is kept in _redirects file, which looks like:

/home              /
/blog/my-post.php  /blog/my-post
/news              /blog
/google            https://www.google.com
/home         /              301
/my-redirect  /              302
/pass-through /index.html    200
/ecommerce    /store-closed  404
/news/:year/:month:/:date/:slug  /blog/:year/:month/:date/:story_id
/news/*  /blog/:splat
/story id=:id  /blog/:id  301
/*    /index.html   200
/api/*  https://api.example.com/:splat  200

The last 2 are of note, because they allow:

Single Page Applications by shadowing
Proxying to third-party domains, allowing lots of APIs to work out of the box

The _headers file looks like:

/*
  X-Frame-Options: DENY
  X-XSS-Protection: 1; mode=block
/something/*
  Basic-Auth: someuser:somepassword anotheruser:anotherpassword

So, it's just paths followed by the custom headers. They also have support for other fancy features such as pre-rendering for SEO, and webhooks.

Disclaimer: Have nothing to do with Netlify. And I wasted too much time looking for it in my HN comment history. HN Discussion

stuartpb commented 9 years ago

I'm going to bed now, but I'll take a look at those tomorrow- my first reaction is that, while that looks neat and I do like the way it combines multiple mechanisms (rewriting, redirecting, custom-erroring, and forbidding/hiding) into one status-oriented syntax, that syntax also looks a little over-engineered (eg. the route params - I'm not sure if :slug turning into :story_id is a typo, but it'd be both simpler to implement and harder to mess up if done as $1/$2/$3/$4) and over-loaded (the query string being a separate space-separated field complicates arity checking when it could just be part of the match field) - not to mention I don't see how _headers would work in a scenario requiring different custom headers for different pages.

If I were to follow this pattern, I'd probably call it accesstab or similar, after the tabular formats it resembles like fstab and crontab.

stuartpb commented 9 years ago

Okay, so I've had all this churning in the back of my mind as I went about my day (I've also had a great resurgence in appreciation for static-based app layouts, having this reliability as a baseline guarantee). Here are some notes from my current concept:

"Level 0" is the baseline spec that all other specifications are based on: to be fully compliant with this standard, you have to provide some kind of shim for these behaviors.
- As I mentioned, services will likely forever be non-compliant in various ways to support their various special/legacy quirks (like Apache doing listings by default, gh-pages/Jekyll hiding files starting with _, or surge.sh reading 200.html), no matter how many MUSTs you throw around. In light of this, services are urged to implement a standards/strict mode (akin to the way browsers leave "quirks mode" in the presence of a DOCTYPE) in the presence of certain files specified in this standard (which may or may not then explicitly opt into the platform's quirks/features in a portable fashion, akin to the box-sizing: border-box CSS rule).
Every further feature of the standard has its own file, defined by its own specification.
- This file-per-feature pattern seems to be closer to the approach being taken by most static-HTTP services/architectures (CNAME, _headers, 200.html, etc). This also lends itself to simpler formats, which lends itself to simpler tooling implementations (like shell scripts).
- Since YAML is, though nice to write, tricky to parse from a dependency standpoint (and capable of far more complexity than should be needed for any static routing definition), these files will all hew to simpler core-*nix-esque formats, like plaintext lists, tables of fields in whitespace-separated lines, or, at most, INI/TOML.
- This alleviates the problems of not being able to support the "whole spec" by nature of an implementation's design (ie. the buildpack approach).
- Like .editorconfig, certain servers may choose to read these files natively, while other servers / services may choose to read them with a module/plugin or by converting to their native format. At the level deemed to be compliant, the implementation MUST NOT expect its own format(s) to be defined in parallel to the specified standard files.
Defining certain rewrite/redirect behaviors for certain match rules will be defined in an "accesstab" file, which will largely resemble the _redirects file used by Netlify, but with a few changes to address the problems that have been raised with it (note that I haven't fully run through the use cases for these so some of these suggestions might be altered):
- Rows have a fixed 3 columns, each of which must be defined.
- The first field is the incoming match. Starting the pattern with / anchors it to the start of the path.
- I'm thinking not-starting the match with / may allow for recursive, directory-oriented patterns like the 200.html pattern described above, but I'm not sure. Needs more R&D: for now, they just must start with /.
- The second field is the redirect pattern, using numbered references, defining a path either absolute or relative to the requested location.
- I don't think anybody should use more than nine levels of fixed path attribute, but if mainstream servers universally support it, specifying numbers beyond 10 will be supported
- The third field is a status code, which describes how to implement the serving of the defined file (as a path redirect, response, or error file). 30X rules will be re-entrant, while other rules will not.
- Servers should evaluate each rule in order.
- I think recursion comes in here: for a 2XX rule to apply, the input must match and the file must exist: if both of these rules aren't met and no other match in the list applies
- That doesn't quite work in the presence of catch-all rules. Either there needs to be some kind of specificity mechanism, or relative rules need to exhaust their search path before continuing list evaluation, or it varies based on whether the match is absolute, or some combination of these... I don't know, I'm tired. I can do R&D after some R&R.
- Tooling may choose to have its own definition format that compiles down to accesstab: this can be used to implement things like named parameters (as well as in-tree definitions like the aforementioned 301/302 files, which said tooling should then compile options to blackhole/hide).
More complex server behaviors with simpler rules will be configured with an INI CFG I am currently calling "accessconfig" (mirroring files like ".editorconfig" and ".gitconfig").
- This is where an option to re-enable things like directory-listing index generation would go.
- This might include a section for headers (I need to do more research on common header patterns in this kind of project, to see if this is better served by its own file/format).
- Couldn't these also apply to paths? Is that what the sections of the TOML are going to be for?
Patterns like 200.html files as described above may be implemented with a rule in the accesstab. 30X files (which require reading the named file's content to determine the path to redirect to) were a bad idea and, if desired, should be searched and baked into the "accesstab" using tooling at publish time.
Provided that there aren't any major conflicts with existing precedents, I'm considering using "ht" where I'm currently using "access" in the filenames (ie. "httab" and "htconfig" instead of "accesstab" and "accessconfig", or maybe "htrestab").
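
To make the three-column accesstab idea concrete, here is a minimal parsing sketch under the constraints listed so far: exactly three whitespace-separated fields per row, a match that starts with /, and a numeric status code, with rows kept in file order. Skipping blank lines is my own assumption, and `Rule` and `parse_accesstab` are hypothetical names, since no such format has been finalized.

```python
from typing import NamedTuple

class Rule(NamedTuple):
    match: str    # incoming match pattern, anchored with a leading /
    target: str   # redirect/rewrite pattern using numbered references
    status: int   # how the target is served (30X, 2XX, or an error code)

def parse_accesstab(text):
    """Parse accesstab text into an ordered list of Rule rows."""
    rules = []
    for number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # blank line (assumed to be allowed)
        if len(fields) != 3:
            raise ValueError(f"line {number}: expected 3 fields, got {len(fields)}")
        match, target, status = fields
        if not match.startswith("/"):
            raise ValueError(f"line {number}: match must start with /")
        if not (status.isdigit() and len(status) == 3):
            raise ValueError(f"line {number}: bad status code {status!r}")
        rules.append(Rule(match, target, int(status)))
    return rules
```

Keeping the result as an ordered list matches the "evaluate each rule in order" note; how catch-all rows interact with more specific ones is still the open question described above.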

Open questions:

What happens with percent-decoding? In what circumstances is it applied, for which characters?

Anyway, this is rapidly getting too complex to continue to draft in this issue, so I'm going to continue this work in a repo (with probably the occasional pingback for progress to this issue) at https://github.com/stuartpb/s4-specs.

stuartpb commented 9 years ago

Just remembered: one irking thing about the three-column model is that it doesn't allow a simple definition for "return the standard 403 or 404 page", so this might be cause for a "two-column rules use the default path" extension to the spec behavior. Again, I need to go to bed.

sintaxi commented 9 years ago

@stuartpb This is a really great thread. FWIW - surge.sh already supports most things you have mentioned. Here is a list...

Implicit Redirects & Clean URLs

[x] Server does not require file ext in request path. /hello-world will serve /hello-world.html.
[x] Redirect /foo/bar/ to /foo/bar if /foo/bar/index.html not present and /foo/bar.html is.
[x] Redirect /foo/bar to /foo/bar/ if /foo/bar.html not present and /foo/bar/index.html is.
[x] Catch-all 200.html is served (if present) with a status code of 200.
[x] Catch-all 200.html files are only used for text/html requests.
[x] If 200.html not found a fallback 404.html is served (if present) with a status code of 404.
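
Read together, the checklist above amounts to a small decision procedure. The Python sketch below is a model of that list only, not surge.sh's code: the precedence between rules, the redirect status (301 here), and the names `respond` and `accepts_html` are assumptions for illustration.

```python
def respond(path, files, accepts_html=True):
    """Return (status, value) for a request path.

    `value` is the file to serve, or the redirect location when status is 301.
    `files` is a set of site-relative file paths.
    """
    if path.endswith("/"):
        if path + "index.html" in files:
            return 200, path + "index.html"
        bare = path.rstrip("/")
        # /foo/bar/ -> /foo/bar when only /foo/bar.html exists.
        if bare and bare + ".html" in files:
            return 301, bare
    else:
        if path in files:
            return 200, path
        # Clean URL: /hello-world serves /hello-world.html.
        if path + ".html" in files:
            return 200, path + ".html"
        # /foo/bar -> /foo/bar/ when only /foo/bar/index.html exists.
        if path + "/index.html" in files:
            return 301, path + "/"
    # Catch-all 200.html, used only for text/html requests.
    if accepts_html and "/200.html" in files:
        return 200, "/200.html"
    if "/404.html" in files:
        return 404, "/404.html"
    return 404, None
```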

Custom redirects using a `ROUTER` file

This is fairly self explanatory. It will redirect if it matches a pattern in the ROUTER file.

301     /:yr/:mo/:dy/:slug       /articles/:slug
301     /blog?title=:slug        /articles/:slug
302     /blog                    http://medium.com/sintaxi
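
For illustration, matching rows like these could be modeled as below. This is a guess at the semantics from the three example rows, not a description of how surge.sh actually matches: each `:name` is treated as one path segment or query value, rows are tried in order, and `compile_pattern` and `route` are hypothetical helpers.

```python
import re

PARAM = re.compile(r":([A-Za-z_]\w*)")

def compile_pattern(pattern):
    """Turn '/:yr/:mo/:dy/:slug' into a regex with one named group per :name."""
    regex, pos = "", 0
    for m in PARAM.finditer(pattern):
        regex += re.escape(pattern[pos:m.start()])
        regex += f"(?P<{m.group(1)}>[^/?]+)"
        pos = m.end()
    regex += re.escape(pattern[pos:])
    return re.compile(regex)

def route(router_text, request):
    """Return (status, location) for the first matching row, else None."""
    for line in router_text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # ignore blank or malformed rows in this sketch
        status, pattern, target = fields
        m = compile_pattern(pattern).fullmatch(request)
        if m:
            # Fill :name placeholders in the target from the captured values.
            location = PARAM.sub(lambda p: m.group(p.group(1)), target)
            return int(status), location
    return None
```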

Custom domain using `CNAME` file

The following CNAME file will serve both http and https...

example.com

The following CNAME file will only serve over https, and it will force http traffic to https

https://example.com

The following does the opposite

http://example.com

Auto redirect `www` subdomains.

[x] If www.example.com does not exist it will redirect to example.com (if it exists).
[x] If example.com does not exist it will redirect to www.example.com (if it exists).

BasicAuth using `AUTH` file.

With an AUTH file you provide a list of user/passwords to protect the site.

thurston@sy.com:teenagedcomputer
kim@sy.com:goo

Not necessarily perfect design but these things all work today and they are fairly pragmatic features driven out of real use cases and most API choices have been vetted by users of harp or surge.

Personally I think the days of using config files for such things should be a thing of the past. 99% of the time we all want the same things anyway. Most of these features can be triggered by conventions (e.g. 200.html files).

stuartpb commented 9 years ago

@sintaxi Interesting. Does ROUTER support creating recursive / location-local rules (as would be possible for defining patterns like path-wide "/200.html" files as described above / on Twitter)?

Depending on what conventions and patterns I find with the other platforms, I might just go for standardizing on surge.sh's conventions and names. I'm still not wild about named route parameters, but if they're the only major difference between the existing ROUTER schema and what's strictly necessary I'll just standardize them (using the rules of https://github.com/pillarjs/path-to-regexp).

Also, does your CNAME file work in the multiple-entry list case? If no, what was the rationale around not doing so?

stuartpb commented 9 years ago

Also, re: defining HTTPS behavior in CNAME, I'm personally not crazy about the notion of explicitly downgrading HTTPS, as even the most justified cases for it (low-power hardware where HTTPS causes a significant dip in performance / spike in consumption, the ability to inspect wire traffic) aren't really justified when you consider the potential downsides (there's a reason browsers are only going to support HTTP/2 over TLS). Opting in to HTTPS upgrades, maybe, but I have my misgivings with the notion of forcing HTTP-only traffic at a definition level.

While I'm generally not the type of person to enforce a prescriptive best practice by gimping the things that are possible as part of a spec (which should aim almost exclusively to support any in-scope behaviors that were possible in its precedents), carelessly pasting in / writing "http://" at the beginning of your CNAME (or doing so because you've seen it done once, where it was intentional, and then copying the convention under the impression it's required) and having HTTPS be inadvertently stripped from incoming connections seems like too likely of a mistake to make. (In other words, the ease of triggering the behavior seems disproportionate to the extremely low desirability of that behavior.)

stuartpb commented 9 years ago

Vis. CNAMEs supporting multiple hostnames: I personally don't think it's good practice (the whole point is to have a canonical name, after all) and would never personally use it, but the alternative would likely be hackier for somebody who wants to do such a behavior (without multiple host support in CNAME, they'd have to publish twelve versions of their codebase that only vary the CNAME file to re-use their files on twelve hosts), and considering that traditionally-not-necessarily static platforms like Heroku have natural support for multiple domains in this fashion, I feel it should be supported.

stuartpb commented 9 years ago

Note that CNAME is for hosting files under multiple names. Implementations MUST NOT redirect names specified in the CNAME file (implementations MAY redirect to names that are in the CNAME file - nowww/yeswww Normalization, following the logic @sintaxi described, will likely be a component of the CNAME spec). Specifying redirections at the host level is a separate concern that may be handled by separate systems and will be separate from the CNAME spec.

stuartpb commented 9 years ago

Also to be in the spec: a note explaining that you have to use a route in the ROUTER file if you want your site to respond to requests for /ROUTER with something other than your list of ROUTER definitions, because ROUTER is, by default, subject to the same file-serving rules as any other file in the repo. (The hairiness of defining rules for each special file is making me kind of want to put all of these in a subdirectory, not to mention the way it'd reduce the pollution of the root namespace that has been so historically bad.)

stuartpb commented 9 years ago

Also to note: while ALL of the codebase's files, by default, MUST be served normally (including those files that act as definitions), compliant compilers/generators/config-transpilers (both in implementations and in tooling) MUST take actions in their compiled files, if these files are not then re-introduced to the codebase, to respond to requests for those files as if those files did not exist (so eg. if pushing to your platform creates a customer_compiled-nginx.conf, that conf file needs to include a rule that either hides itself or responds with a certain page as a 200 OK, depending on what behavior the developer defined in their ROUTER for that path without the existence of the file).

Tooling that adds compiled files to a codebase (such as rendered template HTML files) MAY add rules in its generated ROUTER to hide any files (such as source templates), so long as these rules remain present and visible as part of the codebase after compilation.

TODO: define "the codebase".

stuartpb commented 9 years ago

Actually, I'm not sure about having the definition files exposed by default. Assuming there's a simple route to show/hide them (which I'm pretty set that there should be), both sides have the argument that it's easy enough to specify a rule in ROUTER to switch to the other.

On the one hand, magic rules are a pain, and this makes introspection/openness a common part of the platform (the way so much of the web already is). It's irritating to have to pick off special-case rules defined by the spec just to make a server behave more predictably.

On the other hand, exposing stuff like your routing rules, while you may or may not be exposing the files you may be hiding (and maybe you're only hiding certain files in one fashion and hadn't considered another way to access them, which is revealed by your ROUTER spec!), is a potential security hole, and "this is insecure by default, do this extra thing to secure it" is really a pretty bad scene. Also, I'm pretty sure there's precedent ie. in Apache hiding .htaccess - also there's a parallel to HTML here: <head> is only display: none as part of the UA stylesheet, and you're actually allowed to make it visible if that makes any sense to your model.

(TODO: define interaction between /../ and routing)

So, actually, I think I'm going to walk back the assertion that definition files must be served as 200 by default, and instead say they must be served as 404 by default. End users may then wish to expose them with a 200 /ROUTER rule.

Also the name I'm kicking around for a directory containing these files is _htspec, since that both makes the name sort between Uppercase and lowercase in ASCIIbetical sorting, and hides the directory naturally in Jekyll-based systems (which should still expose other _-prefixed locations in standards mode, for the "picking-off-special-rules-is-irritating" reason explained above).

stuartpb commented 9 years ago

It just occurred to me how important it is to add the rules "Implementations in strict mode MUST NOT execute or interpret files outside this specification" and "Implementations MUST NOT write files to the code base in response to requests", basically as a way of officially saying "implementations MUST NOT turn into PHP".

yairEO commented 8 years ago

Any status about this? I find this a very hot feature, can't wait for it, I have demo pages which are broken because of lack of routing.

Thanks!

lapidus commented 8 years ago

+1

kasperpeulen commented 8 years ago

Is there something to get routing with # already working? For example I have an angular2 application: http://ng2-dart-samples.github.io/router/

But if I go directly to the route: http://ng2-dart-samples.github.io/router/home

I get a 404

lpil commented 8 years ago

:+1:

jasonniebauer commented 8 years ago

+1

SteveALee commented 8 years ago

+1

alanbeech commented 8 years ago

+1

colorgap commented 8 years ago

Guys, I found a workaround for this issue. I created an Angular 2 app for the documentation of our open source app framework and ran into this problem: refreshing the page gives a 404. If I go to the link below it works, as I set the base href to /projectname/: http://colorgap.github.io/brush

From there the app works perfectly fine. But if I refresh the page while on the About page, it fails and gives a 404 page: http://colorgap.github.io/brush/about

I found out that GitHub Pages allows you to customize 404 pages, so I thought that if I created a copy of my index.html and renamed it to 404.html, it should fix the issue, and it did. Now I have a perfectly working structure on GitHub Pages. I know this is not a perfect or desired solution, but I guess it does the trick for now.

Really looking forward to a better solution. But for now, create a copy of index.html as 404.html.

csuwildcat commented 8 years ago

Hey folks, I came up with a hack that enables all sorts of route handling without taking the 404 SEO hit. It uses the 404.html page to hijack the inbound attempts and redirect them back at the index.html page with the correct path intact (via the History API): http://www.backalleycoder.com/2016/05/13/sghpa-the-single-page-app-hack-for-github-pages/

rafgraph commented 8 years ago

I’ve been working on a way to host single page apps with GitHub Pages for the past few weeks. Check out the repo and example site. Repo: https://github.com/rafrex/spa-github-pages Example site: http://spa-github-pages.rafrex.com

I used a custom 404.html page with a redirect script that takes the current url and converts the path and query string into just a query string, and then redirects the browser to the new url with only a query string and hash fragment. For example, example.tld/one/two?a=b&c=d#qwe, becomes example.tld/?redirect=true&pathname=%2Fone%2Ftwo&query=a=b%26c=d#qwe. The GitHub Pages server receives the new request, e.g. example.tld?redirect=true..., ignores the query string and hash fragment and returns the index.html file, which has a script that checks for a redirect in the query string before the single page app is loaded and converts it back into the correct url.
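For readers who want to see the round trip, here is a minimal sketch of the two conversions described above. It is not the code from the spa-github-pages repo; the names LocationParts, toRedirectUrl and fromRedirectUrl are hypothetical, and only the query format is taken from the example in this comment. It also assumes the original query does not already contain a literal %26.

```typescript
// Hypothetical shape; window.location has these three fields in a browser.
interface LocationParts {
  pathname: string; // e.g. "/one/two"
  search: string;   // e.g. "?a=b&c=d", or "" when there is no query
  hash: string;     // e.g. "#qwe", or "" when there is no fragment
}

// Step 1 (would run in 404.html): fold the path and query into one query string.
function toRedirectUrl(loc: LocationParts): string {
  const parts = ["redirect=true", "pathname=" + encodeURIComponent(loc.pathname)];
  if (loc.search.length > 1) {
    // Escape "&" so the original query travels as a single parameter.
    parts.push("query=" + loc.search.slice(1).replace(/&/g, "%26"));
  }
  return "/?" + parts.join("&") + loc.hash;
}

// Step 2 (would run in index.html before the app loads): rebuild the original URL.
// Returns null when the request was not a redirect.
function fromRedirectUrl(loc: LocationParts): string | null {
  const params: Record<string, string> = {};
  for (const pair of loc.search.slice(1).split("&")) {
    const eq = pair.indexOf("=");
    if (eq > 0) params[pair.slice(0, eq)] = pair.slice(eq + 1);
  }
  const encodedPath = params["pathname"];
  if (params["redirect"] !== "true" || encodedPath === undefined) return null;
  const rawQuery = params["query"];
  const query = rawQuery ? "?" + rawQuery.replace(/%26/g, "&") : "";
  return decodeURIComponent(encodedPath) + query + loc.hash;
}
```

In a browser, 404.html could call window.location.replace(toRedirectUrl(window.location)), and index.html could pass a non-null result of fromRedirectUrl(window.location) to window.history.replaceState(null, "", url).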

Note: I was originally working on a React only solution, here, when @csuwildcat gave me the idea to use history.replaceState(...) instead of a react-router onEnter hook for the final stage of the query redirect.

SamGerber-zz commented 8 years ago

@csuwildcat & @rafrex: Thanks for coming up with workarounds for this issue. Maybe great minds just think alike? I know @rafrex has been working on his workaround for several days now, so maybe this is just another case of simultaneous invention?

rafgraph commented 8 years ago

@csuwildcat, I didn’t see your tweet, but I did see your post, nice work. You did things a bit differently, with a meta http-equiv refresh I think (instead of a query string with a window.location.replace). The original repo I was working on was a React only implementation, and you gave me the idea to use history.replaceState(...) instead of a react-router onEnter hook for the final stage of the query redirect. Thanks. If you look at the history of the open-sourced react-github-pages repo, I had the redirect working way back on May 3rd: https://github.com/rafrex/react-github-pages/tree/bbe0efa0212261297d9f6e0cfd76023695681f4c (the example site is pretty bad and documentation is non-existent, but the redirect code is all there, and it all works). I hope this clears things up.

csuwildcat commented 8 years ago

It's cool, nice job making a template, I wanted to but haven't had the time. Hopefully people get value out of all this.

rafgraph commented 8 years ago

Cool, thanks. I think people will get value out of this, I know I certainly will.

SteveALee commented 8 years ago

Some good ideas here, thanks. I've been using a 404 file with a client redirect for a while now, but as pointed out it only solves some of the issues. Using the preprocessing is certainly worth trying; however, it does feel wrong to be using client behaviour to work around server failings, especially for static sites, which arguably should not require javascript (obviously not SPAs).

A 200 page like surge provide would be much better.

Incidentally, a custom 404 works with my organisation project page.

The fact that project pages are not at the domain root causes other problems too. Not just for routers which assume this.

If you use custom domains as a solution for the non-root problem you then have to provide your own HTTPS certificate, though that is getting easier. Newer web features like progressive web app manifests require HTTPS.

We need GitHub to provide a solution that always uses the domain root and provides a proper server-side redirect for unknown pages (e.g. a 200 page).

rafgraph commented 8 years ago

I'm not a fan of the 200 page as a clean solution because you have to duplicate your index.html file in your 200.html file. That still feels hacky, although less so (surge's 200 page is what gave me the idea to serve gh pages out of the 404.html file a while ago).

My suggestion would be to have a .spa file in the repo which would turn on single page app compatibility (similar to how the .nojekyll file is implemented), and then GitHub's servers would ignore the path and always serve up index.html, and let the single page app handle undefined paths.

sintaxi commented 8 years ago

you have to duplicate your index.html file in your 200.html file

FWIW - with surge you can have only a 200.html if you like. No need to duplicate content in an index.html file.

My suggestion would be to have a .spa file in the repo which would turn on single page app compatibility.

With a .spa and an index.html file you now have two files in your project for something that could be accomplished with just one 200.html. Besides, why not match the 404.html paradigm? I tend to think there are too many dot files in projects these days.

Lastly, a 200.html (or .spa) should only respond to certain mime types. For example, /some-image.png should still 404 if it is missing.
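To make the rule concrete, here is a rough sketch of how a static host might pick a response when a 200.html fallback exists. It does not describe surge's or GitHub's actual implementation; resolveRequest and Resolution are hypothetical names, and the mime-type check is approximated by looking at the file extension, which is an assumption of this sketch.

```typescript
interface Resolution {
  status: number;
  file: string | null; // null means there is no body to serve
}

// Hypothetical resolver; `files` lists the paths that exist in the published site.
function resolveRequest(path: string, files: string[]): Resolution {
  const has = (f: string): boolean => files.indexOf(f) !== -1;

  // An exact file match always wins.
  if (has(path)) return { status: 200, file: path };

  // Directory-style requests map to their index.html.
  const indexPath = path.replace(/\/?$/, "/index.html");
  if (has(indexPath)) return { status: 200, file: indexPath };

  // Assumption: a last segment with a non-HTML extension is an asset request,
  // so a missing image or script still gets a real 404.
  const lastSegment = path.slice(path.lastIndexOf("/") + 1);
  const isAsset = /\.[^.]+$/.test(lastSegment) && !/\.html?$/.test(lastSegment);

  // Everything else is treated as an app route and handed to the SPA.
  if (!isAsset && has("/200.html")) return { status: 200, file: "/200.html" };

  return { status: 404, file: has("/404.html") ? "/404.html" : null };
}
```

With files ["/200.html", "/404.html", "/logo.png"], a request for /todos/42 resolves to /200.html with status 200, while /some-image.png resolves to /404.html with status 404.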

rafgraph commented 8 years ago

I definitely agree that some mime types should still return 404 if missing.

Personal preference, but I don't think we should give up on the index.html paradigm. I'd rather have a settings file than not have an index.html file. Although really, all I want is native single page app support from GitHub Pages, however they decide to implement it.

rafgraph commented 8 years ago

@sintaxi, just saw you created surge, nice work. One suggestion, implement git push deployment (like github pages and heroku).

sintaxi commented 8 years ago

@rafrex Thanks a lot.

This thread should probably stay about hosting single page apps so I won't git into it, but there are reasons we're not using git. You can find us in surge chat if you want to talk more on that.

spacedevin commented 8 years ago

The redirect solutions work, however they still generate a 404 header.

I decided to just make symlinks instead.

create a .nojekyll file
create a symlink from index.html to 404.html (ln -s index.html 404.html)
If you know all of your routes, and want 200 status codes, create symlinks for each route back to index. So if you had a route like /about, you would create a symlink called about.html since it automatically finds that file. (ln -s index.html about.html)

You can see it working here http://arzynik.github.io/gh-pages-spa/

Still would be great to have a better solution.
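If you have more than a handful of routes, the ln -s commands from the steps above could be generated rather than typed. This is only a sketch: symlinkCommands is a hypothetical helper, it assumes any parent directories (such as docs/) already exist, and it skips parameterised routes because they have no fixed file name to link.

```typescript
// Hypothetical helper: turn a list of static routes into `ln -s` commands.
function symlinkCommands(routes: string[]): string[] {
  const commands: string[] = [];
  for (const route of routes) {
    // A route such as "/todos/:id" cannot be expressed as a single file.
    if (route.indexOf(":") !== -1) continue;

    const segments = route.split("/").filter((s) => s.length > 0);
    if (segments.length === 0) continue; // "/" is already served by index.html

    // Nested routes need "../" hops so the link still points at the root index.html.
    const up = segments.slice(1).map(() => "../").join("");
    commands.push("ln -s " + up + "index.html " + segments.join("/") + ".html");
  }
  return commands;
}
```

For the routes /, /about, /docs/intro and /todos/:id this yields two commands: ln -s index.html about.html and ln -s ../index.html docs/intro.html.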

stuartpb commented 8 years ago

I think this discussion is just the booster shot I needed to get back to work on designing a standard for all of this - I'd been stuck on a name for it (one of the two hard things in computer science), calling it "s4-specs" and wondering if I could maybe shove in some more S-words (like "simple" or "single").

The new name I've settled on builds off of @rafrex's instinctual suggestion for a .spa sentinel file. Including spa in the name seems like a reasonable way of hinting "this contains single-page application" to a dev encountering the file for the first time. To give it a non-ambiguous, less-likely-to-collide name (one with some Google juice), I'm going with _spaspec: https://github.com/spaspec/spaspec-standard

allenhwkim commented 8 years ago

+1

paramaggarwal commented 8 years ago

So currently there are 3 options for deploying single page apps:

Now I read about all of these and was comparing their free plans. As per my understanding, Github Pages does not have these two things:

But from @arzynik's excellent suggestion above, using symlinks we can have both of these capabilities.

rafgraph commented 8 years ago

Correct me if I'm wrong, but the problem with symlinks is that they don't support dynamic routes, for example, /todos/:id.

stuartpb commented 8 years ago

That's one problem (there's also no guarantee that your deployment method will recognize symlinks) - another problem is that they don't have any way to control any other kind of HTTP status with a response, such as a redirect, or a 403 Forbidden.

_spaspec/routes, on the other hand, allows for all of this, and more.

rognoni commented 7 years ago

For information, this is the way used in firebase.json:

{
  "hosting": {
    "public": "public",
    "rewrites": [
      {
        "source": "**",
        "destination": "/index.html"
      }
    ]
  }
}

jimthedev commented 7 years ago

@rafrex @paramaggarwal

Correct me if I'm wrong, but the problem with symlinks is that they don't support dynamic routes, for example, /todos/:id.

You are correct. I just confirmed it by modifying @arzynik's example to accept route params. It does not work properly, whereas going to the same route with the hash location strategy works perfectly. Symlinks are a decent solution if you have super simple routing, but that's about it.

NullVoxPopuli commented 7 years ago

Has anything happened with this?

GabrielDelepine commented 7 years ago

The repo spa-github-pages gives me the most appropriate answer.

GitHub serves the file 404.html for every missing file, if it exists. With a JavaScript redirection, it's possible to have a real SPA hosted on GitHub Pages.

jimthedev commented 7 years ago

@GabrielDelepine If I recall, using 404.html will work as you described, but unfortunately the response still has an HTTP status code of 404 even though the content is under your control. So while you can get it to work, you aren't truly able to override the 404 status code. Not sure how this would work with SEO, since a spider might conclude the page was not found and thus not worth indexing. YMMV, but ideally these routes would be 200s.
