Proposal: managing 'site' and 'documentation' separately

pngwn commented 3 years ago

We have spoken in the past about having a central 'sites' repository to handle things like deployment, styles, app shells centrally but that it would be nice for actual documentation to be located in the repos themselves. This is a more pressing concern now that we have multiple official projects and would like a way to surface the relevant documentation on the site somewhere.

Overview

tldr: app-shells fetch content from APIs. Data returned by those APIs is independently deployed via the CI/CD pipeline of the individual repos, the CDN caches for the relevant endpoints would be invalidated by that independent deploy step.

In short, this proposal will suggest a distributed architecture. The app shell would be a single deployment (or one for each 'site', more on this later) and each project's documentation would be an additional independent deploy, managed by the CI/CD pipeline of the relevant repo. The app shell would fetch the latest docs for each project when they were requested (at runtime). This means the app shell would not need to be redeployed when there were documentation changes to any of the individual projects; likewise the projects themselves would not need to be redeployed when the app shell changed (unless there were API contract changes that needed to be honoured).

Essentially this boils down to sticking the docs, examples and tutorials somewhere as JSON and requesting them at runtime. We are basically doing this right now; this proposal is about moving them as much as anything else. There is a more complex alternative but I don't think we'd get anything out of it unless we started creating interactive docs (with something like mdsvex). At that point we could use a similar approach but we would also need to make use of remote ESM and mount them to the page (basically a microfrontend approach). It would be the same concept though: independent deploys, requesting data and assets from endpoints which expose a known contract, and handling them in a predefined and consistent manner.

The app-shell(s) would still be in complete control of what gets rendered, where, and how but it wouldn't know anything about the content, nor would it need to.

types of thing

The specifics of the implementation are not too important here and are up for debate but I'll use more concrete examples to communicate intent. First of all we would have independently deployed 'components' like this:

[Diagram: sveltedocs_basic-arch]

Then we could use them as needed, requesting the data from the various APIs. Having them distributed rather than within the app-shell itself means we can independently update them without needing to talk to the shell directly.

I would propose that docs, examples, and tutorials all live with their respective project as that seems to make the most sense. All of the logic and functionality for the application itself would live within this repo and be the responsibility of the app-shell(s).

The below is mainly an exercise in showing that we can drive this whole thing with data from APIs. We are actually in a really good place to make this happen because of the way the various types of documentation are designed.

`app-shell`

The app-shell is 'special' in this instance, it is a full site (html, styles, js, etc) but it does not actually contain the docs we want to embed, they are remote. This should be either API driven or actual javascript modules (I'll expand on this). The app shell would contain remote references that would be loaded at runtime. The app-shell is the thing that gets loaded by users when they visit svelte.dev.

There would be one app-shell for each 'site'. We may, for example, have 3: svelte.dev, kit.svelte.dev, native.svelte.dev. There would be 2 or 3 API endpoints that provide data. I forgot about the blog posts but they could also be another endpoint, as could cookbook or recipes if we decided to have more in-depth articles and split up the API docs a bit, as we have previously discussed. This is a scalable solution that can adapt to our needs as we go.

`*-docs`

Our docs are static and we do not need to do anything fancy here. Docs sites could have their own endpoint api.svelte.dev/docs/:projectname or something and could return some JSON containing the 'built' HTML (from the markdown) as well as a list of navigation links (used to render the side navigation). The schema could be something like this:

interface Link { 
  name: string;
  href: string;
  subnav: Link[];
}

interface Docs {
  links: Link[];
  content: string;
}

An example payload might look like this:

const docs = {
  content: "<h2>Hi i'm docs</h2><p>boo</p>",
  links: [
    {
      name: 'intro',
      href: '#intro',
      // nested links need the 'subnav' key to match the Link interface
      subnav: [
        {
          name: 'bit after the intro',
          href: '#bit-after-the-intro',
          subnav: []
        }
      ]
    }
  ]
};

This would give the information we needed to render a like for like of the current docs but it could obviously be expanded to enable other functionality.

In this instance the app-shell page that loaded this data could render the docs easily. The load method for the route could load this data and pass it to the template as props. The content could be injected into the page with {@html content}. The links array could be iterated and used to populate the sidebar navigation.
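As a rough sketch of that load step, assuming the api.svelte.dev/docs/:projectname endpoint floated above: `docsUrl`, `loadDocs`, `flattenLinks`, `NavRow` and the injected `fetchJson` parameter are all hypothetical names, not anything that exists in the sites repo.

```typescript
interface Link {
  name: string;
  href: string;
  subnav: Link[];
}

interface Docs {
  links: Link[];
  content: string;
}

// Hypothetical: any function that fetches a URL and resolves to parsed JSON.
type FetchJson = (url: string) => Promise<unknown>;

// Hypothetical endpoint shape, following the suggestion in this proposal.
function docsUrl(project: string): string {
  return `https://api.svelte.dev/docs/${encodeURIComponent(project)}`;
}

// Fetch the docs for one project and hand them to the template as props.
// No validation is done here; a real loader would check the payload shape.
async function loadDocs(fetchJson: FetchJson, project: string): Promise<{ props: Docs }> {
  const docs = (await fetchJson(docsUrl(project))) as Docs;
  return { props: docs };
}

interface NavRow {
  name: string;
  href: string;
  depth: number;
}

// Flatten the nested links into rows the sidebar can iterate,
// keeping the nesting depth so the template can indent sub-items.
function flattenLinks(links: Link[], depth: number = 0): NavRow[] {
  const rows: NavRow[] = [];
  for (const link of links) {
    rows.push({ name: link.name, href: link.href, depth });
    for (const row of flattenLinks(link.subnav, depth + 1)) {
      rows.push(row);
    }
  }
  return rows;
}
```

The template would then only need `{@html content}` plus a loop over the rows (or over `links` directly if it renders the nesting itself).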

`*-examples`

The examples, while more complex, are also just a bunch of data that is passed into the svelte-repl component. We could serve this data via an API. There are few options here. If we don't have too much data then we could just provide the whole thing as an array of examples, or we could serve them individually via two endpoints: api.svelte.dev/examples/:site, api.svelte.dev/examples/:site/:id (which probably makes more sense):

interface File {
  name: string;
  type: string; // file type
  content: string;
}

interface Example {
  name: string;
  slug: string;
  files: File[];
  thumbnail: string;
}

interface ExamplesMeta { 
  name: string; 
  slug: string; 
  thumbnail: string 
}

interface ExampleCategory {
  name: string;
  examples: ExamplesMeta[];
}

type examples = ExampleCategory[];
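To illustrate the two-endpoint split, here is a minimal sketch of how the list response and the single-example response could both be derived from the same full data. `ExampleSource`, `toIndex` and `findExample` are hypothetical names, and `ExampleFile` mirrors the `File` interface above (renamed only so it does not clash with the global DOM `File` type).

```typescript
interface ExampleFile {
  name: string;
  type: string; // file type
  content: string;
}

interface Example {
  name: string;
  slug: string;
  files: ExampleFile[];
  thumbnail: string;
}

interface ExamplesMeta {
  name: string;
  slug: string;
  thumbnail: string;
}

interface ExampleCategory {
  name: string;
  examples: ExamplesMeta[];
}

// Hypothetical: the full data as it might exist at build time, grouped by category.
interface ExampleSource {
  name: string;
  examples: Example[];
}

// Body for the list endpoint (examples/:site): everything except the file contents.
function toIndex(source: ExampleSource[]): ExampleCategory[] {
  return source.map((category) => ({
    name: category.name,
    examples: category.examples.map(({ name, slug, thumbnail }) => ({ name, slug, thumbnail })),
  }));
}

// Body for the single endpoint (examples/:site/:id): look one example up by slug.
function findExample(source: ExampleSource[], slug: string): Example | undefined {
  for (const category of source) {
    for (const example of category.examples) {
      if (example.slug === slug) return example;
    }
  }
  return undefined;
}
```

Keeping the file contents out of the list response is what makes the second endpoint worthwhile: the index stays small however many examples there are.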

`*-tutorials`

This would be very similar to the above but we'd need a little more information. Although the way these are rendered on the svelte site is different to the examples, they aren't structurally that different. We still have top level categories with nested tutorials, each tutorial having an accompanying REPL.

The additional information is the tutorial content (markdown) and we also need both the start and end state of the tutorial (for the 'show me' button).

interface File {
  name: string;
  type: string; // file type
  content: string;
}

interface Tutorial {
  name: string;
  content: string;
  slug: string;
  initial: File[];
  // not a feature for every tutorial
  complete?: File[]; 
  // could add a 'can_be_completed' prop here but can easily be inferred
}

interface TutorialMeta { 
  name: string; 
  slug: string; 
  thumbnail: string 
}

interface TutorialCategory {
  name: string;
  tutorials: TutorialMeta[];
}

type tutorials = TutorialCategory[];
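A small sketch of the "can easily be inferred" point: whether the 'show me' button applies can be read straight off the optional `complete` field. `canBeCompleted` and `filesFor` are hypothetical helpers, and `TutorialFile` mirrors the `File` interface above (renamed to avoid the global DOM `File` type).

```typescript
interface TutorialFile {
  name: string;
  type: string; // file type
  content: string;
}

interface Tutorial {
  name: string;
  content: string;
  slug: string;
  initial: TutorialFile[];
  // not a feature for every tutorial
  complete?: TutorialFile[];
}

// True when the tutorial ships an end state, i.e. 'show me' has something to show.
function canBeCompleted(tutorial: Tutorial): boolean {
  return tutorial.complete !== undefined && tutorial.complete.length > 0;
}

// Files to hand to the REPL: the end state if it was requested and exists,
// otherwise the start state.
function filesFor(tutorial: Tutorial, showCompleted: boolean): TutorialFile[] {
  const done = tutorial.complete;
  if (showCompleted && done !== undefined && done.length > 0) {
    return done;
  }
  return tutorial.initial;
}
```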

How many shells?

This is really a question of how many sites. I'd propose three: svelte.dev, kit.svelte.dev and native.svelte.dev, mainly because a single site can only really have one set of tutorials and examples, and svelte, kit and native may all have their own. There is a question around whether or not we want to blur the lines between svelte and kit a little more, but some degree of separation is required and we aren't that close to having a kit repl regardless, although it is doable in principle.

As for svelte native, I am keen to make it more official and reduce the burden on @halfnelson, we need to make the svelte-repl more agnostic so it can be used for that purpose but even if we can't (or don't do it soon), bringing the reusable bits into this repo, sharing some of the basics, and sharing pipelines would mean @halfnelson can focus on maintaining the docs and examples, rather than the site. In fact this approach could reduce the maintenance burden of all disparate sites and focus efforts a little better.

Would welcome thoughts, I don't think the effort here is significant.

Rich-Harris commented 3 years ago

I like where your head's at, though my main concern is around previewing changes to the docs locally. Right now it's extremely valuable to have the Kit docs available on the filesystem while working on kit.svelte.dev, for example.

I think it's also worth thinking through mdsvex stuff in more detail, as I can easily imagine us wanting to go down that route

pngwn commented 3 years ago

I thought a little about local docs, we could figure out a way for the repos themselves to spin a local server that the apps in this repo could connect to in dev mode. That would be very easy but you would need to start them up separately.

> I think it's also worth thinking through mdsvex stuff in more detail, as I can easily imagine us wanting to go down that route

This is basically my life right now, micro-frontends. There are one or two challenges we'd need to address (shared dependencies) but it wouldn't be too challenging. Instead of fetching + passing data into props, you would dynamically import + instantiate a component instead. This is more complex but I'll see what the implications are and put together an example. We wouldn't be able to fetch a module and pass it down as props on the server, so I'm curious about the implications here for doing this inside a 'load' hook. I'll investigate this a little more.

benmccann commented 3 years ago

From discussion on Discord, @pngwn's proposal is that the CI would parse the .md files, create JSON data, and send to API. Server would receive API request and persist in kv store. When the page is viewed data would be grabbed out of kv store and rendered
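For illustration only, the "parse the .md files, create JSON data" step might derive the navigation links along these lines. This sketch assumes `##` marks a section and `###` a subsection, and uses a made-up slug rule; `slugify` and `extractLinks` are hypothetical, and it ignores headings inside code blocks, which a real build would have to handle.

```typescript
interface Link {
  name: string;
  href: string;
  subnav: Link[];
}

// Hypothetical slug rule: lowercase, runs of non-alphanumerics become one hyphen.
function slugify(text: string): string {
  return text.toLowerCase().replace(/[^a-z0-9]+/g, '-').replace(/^-+|-+$/g, '');
}

// Build the navigation tree from markdown source.
// '## ' starts a top-level link, '### ' nests under the most recent top-level link.
function extractLinks(markdown: string): Link[] {
  const links: Link[] = [];
  for (const line of markdown.split('\n')) {
    const match = /^(#{2,3})\s+(.+?)\s*$/.exec(line);
    if (match === null) continue;
    const link: Link = { name: match[2], href: '#' + slugify(match[2]), subnav: [] };
    const parent = links[links.length - 1];
    if (match[1] === '###' && parent !== undefined) {
      parent.subnav.push(link);
    } else {
      links.push(link);
    }
  }
  return links;
}
```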

Some things we'd need to figure out: would we send all docs to API on each commit, creation of kv stores, creation of endpoints to accept API writes, handling auth tokens, database migrations whenever data format changes, running the site locally (what to do about kv store in this case), etc.

pngwn commented 3 years ago

We don't have to send to an API specifically, but that is an option; we can update our DB in whatever way we want to. For example cloudflare's kv store has an API that allows you to update it programmatically, AWS sdks allow the same thing with dynamo, google probably has a thing.

pngwn commented 3 years ago

We could do this in CI, only running the relevant workflow when the docs path changes (we can standardise on this if we need to). We would only send the docs for that project which would update some store.

We could have a single kv store for all docs and use structured namespaces to make everything easy to get/update/create.

If we did this programmatically rather than via an API we can use access tokens rather than authenticating with the API. We won't need to handle db migrations particularly because the source code is our source of truth, everything that ends up in the store is already in the repo. We change the schema generation tools and rerun them for all projects and we would have our new schema.

pngwn commented 3 years ago

@benmccann

> running the site locally (what to do about kv store in this case)

The kv store is an implementation detail. As I mentioned in a previous comment, we just need an API with the correct data which is something that the repos themselves can provide quite easily.

benmccann commented 3 years ago

The motivation for the proposal: put all plugin docs on the main svelte.dev site (e.g. eslint, prettier, rollup, etc.)

Conduitry commented 3 years ago

Is there a sensible way to have some sort of deployment preview for docs changes? One thing that regularly happens with the current sites is that someone will submit a change to the Markdown files without checking how that gets displayed on the site locally or even checking that it doesn't crash the site. Having to run two different apps locally to vet any proposed change makes each PR a little harder to review. But I'm not sure how to have preview deploys of PRs. If we had a Netlify PR deploy preview of the docs API app, we could at least check whether it crashed the app, but without a sites app pointing to that instance, we couldn't preview the whole thing. Is there some solution I'm not thinking of?

Rich-Harris commented 3 years ago

What about https://kit.svelte.dev/docs?branch=master and https://kit.svelte.dev/docs?branch=some-feature? If the API could serve docs for different branches, the whole thing could be fairly straightforward — no preview deploys necessary
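A minimal sketch of the pass-through, assuming the site simply forwards a `branch` query parameter to the docs API. The API base URL and the `apiUrlForPage` helper are hypothetical, and what the API does when no branch is given is left to the API.

```typescript
// Hypothetical API location for the kit docs.
const API_BASE = 'https://api.svelte.dev/docs/kit';

// Given the URL of the page being viewed, build the docs API URL,
// forwarding '?branch=...' when the page URL carries one.
function apiUrlForPage(pageUrl: string): string {
  const page = new URL(pageUrl);
  const api = new URL(API_BASE);
  const branch = page.searchParams.get('branch');
  if (branch !== null && branch !== '') {
    api.searchParams.set('branch', branch);
  }
  return api.toString();
}
```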

pngwn commented 3 years ago

I'm about to add some details about how this could work in general but before that I want to address the branch preview thing.

The short version is that it is incredibly problematic for PRs that are created via forks, for security reasons. From the point of view of github actions, it isn't even really possible. Forked sources for PRs do not have access to secrets, which means any deployments relying on secrets wouldn't work at all; the reason they don't work is that it would be a huge security hole. You can get this to work (and it works with netlify, cloudflare pages, vercel, etc.) if you install a github app (which has different rules around access) or allow it access to the secrets via the pull_request_target event rather than the pull_request event, but this obviously introduces the potential for harm as well as requiring some additional setup.

You obviously can access secrets in pull requests whose source is a branch of the repo to which the secret belongs.

I don't know of a good solution to this. I think the value of pull request previews is very, very high. One possible approach to this is to segregate production and preview builds by using a different set of credentials for the pull request previews. This would mean a different netlify deploy just for branch previews so that a malicious actor couldn't interfere with the main site, a different zone/worker/kv store for the cloudflare stuff with a token limited only to this set of resources (specific to this zone), and a different one for the production environment. While this is a bit of effort it would mean that any exploit would only impact branch deploy environments and never impact the production environment.

pngwn commented 3 years ago

In terms of implementing the docs stuff this is where i'm currently at:

The API

We need a worker to handle the API itself; Luke has most of this set up over at api.svelte.dev with the relevant keys and whatnot. Adding the logic will be straightforward. We can easily push branch deploys, versioned docs, or whatever based on some query string by appending some kind of version, branch or PR id suffix to the key names: svelte:api:latest, svelte:api:next, svelte:api:3.99.01. As soon as we decide on an approach this can be implemented easily. It doesn't address how we actually get that stuff there safely and whether this should be in a different account or namespace, but that is a separate issue.

Getting stuff into the store

This is where it gets more interesting. I tried about 476 approaches (actually 3) and nothing really worked the way I wanted it to.

I tried running a central workflow via the source repo. So language-tools could trigger a workflow defined in this repo. That didn't work because the event payload has size limits that documentation exceeds quite easily.

I also tried running a workflow and, instead of sending the docs as a payload, letting the remote workflow request the docs via the GitHub API. That didn't work either because I kept hitting the 5000 request limit. You have to recursively make a billion calls to get the required data. This worked beautifully and I was incredibly pleased, until I discovered the API limits. 5000 an hour sounds like a lot but you actually make hundreds trying to get hold of the svelte docs, for example. We could be a little smarter about some of this but I feel with several sites using the same approach we could easily hit this limit in busy periods.

The simplest approach to this is to create a .github repository and define a reusable javascript action inside it (which I think can be used by any repo in the org without needing to publish it, need to check this isn't enterprise only). This action would do the following:

- Take in a cloudflare token as an input. Each repo would pass this token via a secret (storing it as an org level secret makes this available to all repos in the org).
- Build the docs. This would differ slightly depending on the type: plain docs, tuts, examples, etc. It would use a consistent formatter to convert from MD into HTML, with any necessary custom transforms.
- Upload those docs to a cloudflare KV store via the REST API using the token that was passed in. They would now be accessible via the worker API. The key name would be based on the package or repository, the doc type and the version/tag/branch (whatever we decide). This might look like svelte:api:latest or language-tools:faq:some-branch or kit:api:beta. The specifics of this are up to us. This will roughly map to endpoints like api.svelte.dev/docs/:repo/:type?v=version -> api.svelte.dev/docs/svelte/api?v=latest or whatever. Without a query parameter, docs would default to latest.
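The key naming and endpoint mapping described above could be sketched like this. The `repo:type:version` layout and the `v` query parameter follow the examples in this comment; `kvKey` and `keyForRequest` are hypothetical helpers, not existing worker code.

```typescript
// Hypothetical key layout 'repo:type:version', e.g. 'svelte:api:latest'.
function kvKey(repo: string, type: string, version: string = 'latest'): string {
  return [repo, type, version].join(':');
}

// Map a request such as '/docs/svelte/api?v=latest' to the key the worker would read.
// Returns null for paths that do not look like /docs/:repo/:type.
function keyForRequest(pathAndQuery: string): string | null {
  // The base is only needed so relative paths can be parsed by URL.
  const url = new URL(pathAndQuery, 'https://api.svelte.dev');
  const parts = url.pathname.split('/').filter((part) => part !== '');
  if (parts.length !== 3 || parts[0] !== 'docs') {
    return null;
  }
  // Without a 'v' query parameter, default to 'latest'.
  const version = url.searchParams.get('v') || 'latest';
  return kvKey(parts[1], parts[2], version);
}
```

The CI action and the worker would both go through the same key function, so the write side and the read side cannot drift apart on naming.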

I do not really know how we safely do branch previews with most contributions coming from forks but I'll look into it.

Thoughts?

pngwn commented 3 years ago

Just to follow on from Rich's comments: yeah, previews for docs would be additions to the API, accessible via a query string or whatever that the app url can pass along to the API request. Deploy previews for the app shells themselves would also be relatively straightforward, but would require some setup. The security issues I mentioned above exist regardless of what form the previews take.

Rich-Harris commented 3 years ago

I guess we can close this, now that it's basically done (and is powering kit.svelte.dev)
