Replace the https://drafts.csswg.org/ backend

Summary

This is in part a specific concrete proposal to consider replacing the current https://drafts.csswg.org/ backend with the backend from https://andreubotella.com/csswg-auto-build/ — but more broadly, it’s a detailed problem description, with a suggestion that we use this issue to discuss what our actual user requirements are for the site, and how we could best address those user needs.

Problem description

The record of “https://drafts.csswg.org/ down again” reports from users in https://github.com/w3c/csswg-drafts/issues/6528 shows that for nearly a year, users have run into chronic problems with not being able to read specs from the https://drafts.csswg.org/ site because its backend keeps getting wedged.

As for how often that backend gets wedged and prevents users from reading the specs there, the comment at https://github.com/w3c/csswg-drafts/issues/6528#issuecomment-1157497732 includes an image that seems to indicate it happens at least 3 or 4 days out of every week.

The responses so far in https://github.com/w3c/csswg-drafts/issues/6528#issuecomment-1184750613 and https://github.com/w3c/csswg-drafts/issues/6528#issuecomment-1158540588 have stated that unless someone can provide funding for a person-month of engineering effort, the site is never going to get fixed.

It seems worth noting that:

If a person-month of engineering effort is required to fix the current backend, then we have a responsibility to consider whether a backend that requires that much work to fix is the right choice to begin with.
If only one person understands the current backend and has access to fix the current problems with it, we have a responsibility to consider whether a backend which has a single point of failure like that is the right choice to begin with.
If the backend setup isn’t documented anywhere, and it’s not built on a software stack that many other contributors in our space would likely be able to help maintain, we have a responsibility to consider whether such a relatively idiosyncratic backend is the right choice to begin with.

So it seems necessary to consider alternative solutions that:

don’t require a person-month of engineering effort to make happen
won’t leave us all continuing to rely on a single person to keep the site up and running reliably
could be maintained by a much larger number of people
are based on a software stack that’s already widely used and widely understood by contributors in our space

One proposed solution

Replace the current https://drafts.csswg.org/ backend with the backend from https://andreubotella.com/csswg-auto-build/

Pros

Very high reliability/availability.
Uses the same GitHub Pages backend that all other W3C working groups are already using at https://w3c.github.io/ URLs.
Based on a software stack that’s already widely used and widely understood by contributors in our space.
Won’t require a person-month of engineering effort to make happen (instead, is already working and available).
Won’t leave us all continuing to rely on a single person to keep it up and running — a giant pool of other people could help.

Cons

The only downside I am aware of is that GitHub Pages has some limitations the current https://drafts.csswg.org/ doesn’t have — most notably an inability to do server-side redirects and an inability to set arbitrary HTTP response headers.

However, those limitations in GitHub Pages are well-known and well-understood, and there are workarounds or mitigations.

Lack of server-side redirects is probably the most-serious limitation — but it can be addressed by using client-side redirects instead. Admittedly not optimal, but it does work — as evidenced by the fact that https://andreubotella.com/csswg-auto-build/ is already using it.
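For readers unfamiliar with the technique, here is a minimal Python sketch of how a static build could emit client-side redirect pages, assuming each unversioned shortname directory (for example, a hypothetical `css-color/` forwarding to `css-color-5/`) only needs to send the browser to its latest level. The names `redirect_stub` and `write_stubs` are illustrative and not taken from csswg-auto-build. The inline script appends `location.hash` so that links to a specific anchor keep working; the meta refresh is only a fallback for when scripting is disabled and likely drops the fragment.

```python
from __future__ import annotations

import html
import json
from pathlib import Path


def redirect_stub(target: str) -> str:
    """Return a static HTML page that forwards the browser to `target`."""
    attr = html.escape(target, quote=True)  # safe inside attribute values and text
    # json.dumps gives a valid JS string literal; escaping "</" keeps a target
    # containing "</script>" from closing the script element early.
    js = json.dumps(target).replace("</", "<\\/")
    return (
        "<!DOCTYPE html>\n"
        '<meta charset="utf-8">\n'
        f"<title>Redirecting to {attr}</title>\n"
        f'<link rel="canonical" href="{attr}">\n'
        # Preferred path: carry the current fragment (e.g. #some-anchor) along.
        f"<script>location.replace({js} + location.hash);</script>\n"
        # Fallback when scripting is disabled; this form drops the fragment.
        f'<meta http-equiv="refresh" content="0; url={attr}">\n'
        f'<p>This draft has moved to <a href="{attr}">{attr}</a>.</p>\n'
    )


def write_stubs(out_dir: Path, latest: dict[str, str]) -> None:
    """Write <out_dir>/<shortname>/index.html for each (hypothetical) mapping entry."""
    for shortname, level_dir in latest.items():
        page = out_dir / shortname / "index.html"
        page.parent.mkdir(parents=True, exist_ok=True)
        page.write_text(redirect_stub(f"../{level_dir}/"), encoding="utf-8")
```

A build step would call something like `write_stubs(Path("out"), {"css-color": "css-color-5"})` after generating the specs. Relative targets (`../css-color-5/`) are used so the same output works whether the site is served from a domain root or from a project path such as `/csswg-drafts/`.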

But from a user point of view, the effect’s the same — users still end up in the right place. So, given the problems we have with the current https://drafts.csswg.org/ setup — and the cost required to fix it — the benefits make the limited replacement worth the tradeoff.

As far as the inability to set arbitrary HTTP response headers, it’s actually already possible to address that too — by using a Service Worker. See, for example, https://github.com/gzuidhof/coi-serviceworker.

If anybody’s tempted to dismiss this proposal (of switching to the https://andreubotella.com/csswg-auto-build/ setup) out-of-hand, please resist that temptation and please let’s instead try to use this issue as a place to have a constructive discussion about what our user requirements are and what the right solution would look like.

https://andreubotella.com/csswg-auto-build/ may turn out not to meet our requirements — but at least seriously considering it and having a specific discussion about it can help lead us to figuring out a solution that meets the user needs.

A few relevant points:

In addition to serving drafts.csswg.org, the same application also serves drafts.fxtf.org and drafts.css-houdini.org.
It also serves the Shepherd spec anchor/link database and spec related parser (used by Bikeshed and Respec), which is auto-updated on each draft commit.
It also serves api.csswg.org/bikeshed and keeps its copy of the spec anchor/link database up to date on each draft commit.
In addition to generating individual drafts when they are committed, it regenerates all other drafts when the anchor/link database gets updated to keep all the cross references up to date.
In addition to serving all the current versions of the drafts, it keeps all the historical versions and can serve the generated specs at any point in their history via a dated URL (and exposes the history in the UI from the home page).
In addition, the server handles the CSS Test Harness, which is still used to generate CR exit criteria, and keeps the test suites built and the test harness synced with the spec anchor database.
The funding to update the server has already been approved by a sponsor and we're just waiting for final paperwork before the work begins (hopefully within the next few weeks).
All the source for the server is open, and always has been. Anyone could have stepped in to help at any point; this hasn't happened since 2014, when the draft server first went online. If you think starting something new will get more people involved, I'd like to see some evidence.
There have already been discussions within the CSSWG to replace this infrastructure. No one has stepped up to do it and no concrete plans have emerged.

So if you want to start yet another discussion to rebuild all this all over from scratch, a few weeks before this all gets sorted, from plans that have been in motion for a long time, have at it. Personally, I don't think having this conversation yet again is helpful.

Any replacement will also require a commitment from whoever builds it to keep hosting it and to maintain it going forward (I also have been maintaining the server infrastructure).

I'm also curious how much effort you think replacing all this will be vs fixing what's already there and who you expect to do the work.
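As a rough illustration of the dated-history feature described in the list above: assuming each build is recorded together with its date, resolving a dated request comes down to picking the newest build made on or before that date. This is a hypothetical Python sketch (`snapshot_for` and the `(date, build_id)` records are invented for illustration); it does not reflect how the current server actually stores or serves its history.

```python
from __future__ import annotations

import bisect
from datetime import date


def snapshot_for(requested: date, builds: list[tuple[date, str]]) -> str | None:
    """Return the id of the newest build made on or before `requested`.

    `builds` must be sorted by date (oldest first). Returns None when the
    requested date predates every recorded build.
    """
    dates = [d for d, _ in builds]
    # bisect_right counts the builds whose date is <= requested.
    i = bisect.bisect_right(dates, requested)
    return builds[i - 1][1] if i else None
```

With builds recorded on 2022-01-10, 2022-03-05, and 2022-06-01, a request for 2022-04-01 resolves to the 2022-03-05 build, and a request for any date before 2022-01-10 finds nothing.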

In addition to serving drafts.csswg.org, the same application also serves drafts.fxtf.org and drafts.css-houdini.org.

A replacement served using the https://andreubotella.com/csswg-auto-build/ setup can do that too.

It also serves the Shepherd spec anchor/link database and spec related parser (used by Bikeshed and Respec), which is auto-updated on each draft commit.

It also serves api.csswg.org/bikeshed and keeps its copy of the spec anchor/link database up to date on each draft commit.

In addition to generating individual drafts when they are committed, it regenerates all other drafts when the anchor/link database gets updated to keep all the cross references up to date.

In addition to serving all the current versions of the drafts, it keeps all the historical versions and can serve the generated specs at any point in their history via a dated URL (and exposes the history in the UI from the home page).

In addition, the server handles the CSS Test Harness, which is still used to generate CR exit criteria, and keeps the test suites built and the test harness synced with the spec anchor database.

All of those requirements are separate from the requirement for users to be able to reliably read drafts.csswg.org specs.

From the fact that the current backend uses a single system to do those 5 other things in addition to serving the CSS specs themselves, it doesn’t necessarily follow that any system for serving the CSS specs for users in the wider community to read must also do those 5 things. It’s not axiomatic that all those things must be coupled together the way they are now.

The record of user reports we have comes overwhelmingly from users just trying to read CSS specs — as evidenced by almost a year’s worth of user reports at https://github.com/w3c/csswg-drafts/issues/6528. I can recall seeing one or two reports elsewhere (I think just in the WHATWG Matrix room) about api.csswg.org/bikeshed and maybe the spec anchor/link database not working — but I can’t recall ever seeing reports from actual end users about any of those other 5 things not working.

That would seem to suggest we should try, if we can, to optimize for the known needs of end users in the wider community. And if having a single system that does 5 other things too has the effect of making things less reliable for end users who just want to read the CSS specs themselves, then that would seem to suggest we consider whether we can decouple the just-serve-the-CSS-specs-themselves-for-end-users need from everything else.

It seems worth noting here that there do in fact seem to be a lot of end users who just want to read the CSS specs. Part of that is because every CSS article in MDN has a link to the relevant parts of the CSS specs that define each CSS feature documented in MDN. Stack Overflow questions and answers about CSS features also sometimes (maybe often) include links to the relevant parts of the CSS specs.

Users coming in from those places have no need for the other 5 things listed out above. Instead they just want to read some part of a CSS spec in order to solve some immediate problem they’ve run into, or to find an answer to some question.

The funding to update the server has already been approved by a sponsor and we're just waiting for final paperwork before the work begins (hopefully within the next few weeks).

I raised this issue without knowing that — because until it was stated in the comment above, I hadn’t heard that.

But I don’t believe the fact that there’s a sponsor on the horizon relieves us from responsibility for looking in detail at the actual end-user needs — and based on those needs, trying to make an objective assessment of what could be the best way to address those needs.

All the source for the server is open, and always has been. Anyone could have stepped in to help at any point; this hasn't happened since 2014, when the draft server first went online.

One extreme way to interpret that is that the other stakeholders in the wider community are lazy and ungrateful and have for all this time just been sitting around waiting for someone else to fix the problem for them for free.

But other possible interpretations are that although the source for the server is open, and always has been, maybe very few people are even aware of that fact, and maybe very few people have any idea where the source might even be — until it was stated in the comment above, I personally certainly wasn’t aware of it. And even at this point, I don’t even know where the source for it all might be.

But along with that, even if others were to know where that source is, in order to try to help through code contributions, they’d need to also know how to test their changes, and how the changes get deployed, what exactly it’s served from, etc.

But on top of all that, in order for anybody else to try to make fixes to the code or deployment, they’d first need to know what’s broken: What specific part of the code causes the system to get wedged 3 or 4 days out of each week? Or if it’s not part of the code, what part of the deployment system or the server ecosystem needs to be changed?

I don’t think others in the community have any answers to those questions at this point (I certainly don’t know the answers myself). And without knowing some of that information, how could contributors even start trying to help fix it?

If you think starting something new will get more people involved, I'd like to see some evidence.

https://github.com/andreubotella/csswg-auto-build is evidence at least of a starting point that seems likely to allow more people to step in when the need arises. One way to interpret the fact that it so far has only one contributor would be to assume that nobody else cares about it. Another possible interpretation is that it has so far been working reliably enough every day that nobody else has yet needed to step in to help work on it.

One fact I can add here is that because the unavailability of drafts.csswg.org was regularly breaking the https://github.com/w3c/mdn-spec-links/ build (the tool used for adding MDN annotations to specs), I finally had to switch that build over to using the https://andreubotella.com/csswg-auto-build/ site instead. And ever since I made that switch, I’ve not run into a single instance of build breakage happening due to the https://andreubotella.com/csswg-auto-build/ site being unavailable.

There have already been discussions within the CSSWG to replace this infrastructure. No one has stepped up to do it and no concrete plans have emerged.

Someone did step up to create https://andreubotella.com/csswg-auto-build/, after having developed a concrete plan to make a working alternative for serving up the CSS specs for end users to read.

So if you want to start yet another discussion to rebuild all this all over from scratch,

I am not suggesting we start a discussion about how to rebuild all the things listed above. I am instead suggesting we have a discussion about the known highest-priority end-user need we have, and how to optimize for that need.

And meeting that need would not require rebuilding from scratch. Instead, the work has already been done, in https://github.com/andreubotella/csswg-auto-build, and the alternative system has already been up and running — with high reliability — for at least 4 or 5 months now.

Any replacement will also require a commitment from whoever builds it to keep hosting

We wouldn’t need special hosting if we were to use the https://github.com/andreubotella/csswg-auto-build setup. It just uses the same free GitHub Pages hosting mechanism used by https://w3c.github.io/ — that is, the same hosting mechanism that every other W3C working group has already been using for years to make their specs available.

it and to maintain it going forward (I also have been maintaining the server infrastructure).

If it’s hosted through https://w3c.github.io/ using GitHub Pages, then we wouldn’t need anybody to give any special maintenance commitment — and it wouldn’t require anybody to maintain separate server infrastructure for it.

I'm also curious how much effort you think replacing all this will be vs fixing what's already there and who you expect to do the work.

The work on an alternative system has already been done. https://andreubotella.com/csswg-auto-build/ is the existence proof. And it’s already running. I don’t know how much effort was required to create it and make it work — but that doesn’t seem relevant at this point, because the work needed (or the bulk of it) has already been done. So I am not expecting anybody to need to do a bunch more work to create it.

As far as estimating how much work is required to fix the existing drafts.csswg.org backend, I’ve got to admit I have absolutely no idea — and I think nobody else in the wider community has any idea at all either. But I’ve seen the assertion that fixing it will require a person-month of engineering effort, and that seems to me like a very surprisingly huge amount of work — and I cannot imagine that switching to deploying a mechanism that uses the https://github.com/andreubotella/csswg-auto-build setup would need anything close to a person-month of engineering effort to make happen.

Thanks @sideshowbarker for splitting this issue out from the down again notification thread and for suggesting an alternative which would at least allow the EDs to be served reliably.

Thanks to @plinss for providing the server infrastructure and for clearly documenting here the various services that are provided. We do depend on all of them; a stopgap solution that only serves drafts but means that Bikeshed and Respec no longer autolinks to the current state would be bad.

However, a draft server which is frequently down is hampering work:

I often have to link to the older /TR version of a spec because the drafts are down
publication is frequently paused because the linkchecker reports broken links (because the server is down)

In addition, the server handles the CSS Test Harness, which is still used to generate CR exit criteria, and keeps the test suites built and the test harness synced with the spec anchor database.

the tests widget is now actively misleading (compare the reported zero tests for CSS Color 5 on Shepherd to the 3529 tests on wpt) because new tests, and corrected tests, are not reflected in those results. It seems that updates stopped happening over a year ago.

So we do need to have this conversation, because the comments about a single point of failure are correct and there needs to be a way for multiple people to understand the system and to be able to help fix and maintain it.

The funding to update the server has already been approved by a sponsor and we're just waiting for final paperwork before the work begins (hopefully within the next few weeks).

Great news. I hope some of that funding can go to better documentation of what services are provided and how they work. It's unfair to expect @plinss to carry out all the maintenance single-handed on a volunteer basis.

But other possible interpretations are that although the source for the server is open, and always has been, maybe very few people are even aware of that fact, and maybe very few people have any idea where the source might even be — until it was stated in the comment above, I personally certainly wasn’t aware of it. And even at this point, I don’t even know where the source for it all might be.

But along with that, even if others were to know where that source is, in order to try to help through code contributions, they’d need to also know how to test their changes, and how the changes get deployed, what exactly it’s served from, etc.

But on top of all that, in order for anybody else to try to make fixes to the code or deployment, they’d first need to know what’s broken: What specific part of the code causes the system to get wedged 3 or 4 days out of each week? Or if it’s not part of the code, what part of the deployment system or the server ecosystem needs to be changed?

I don’t think others in the community have any answers to those questions at this point (I certainly don’t know the answers myself). And without knowing some of that information, how could contributors even start trying to help fix it?

I have previously skimmed the code that backs drafts.csswg.org and Shepherd (https://hg.csswg.org/dev), and it seems quite complex and hard to approach. I guess this makes sense, since that code doesn't only power the current CSSWG editor's drafts. But, although I'm definitely biased here, the setup in https://github.com/andreubotella/csswg-auto-build seems far simpler and more approachable.

I do worry about the fact that build-index.py (which builds the https://andreubotella.com/csswg-auto-build index and the client-side redirects) uses Bikeshed as a library to parse the spec metadata, as opposed to the GitHub Actions workflow, which uses the CLI to build the specs. I wonder how future-proof and maintainable that is. That said, the rest of build-index.py could certainly be refactored to be more maintainable.
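To make the future-proofing concern concrete: one alternative would be to read the handful of fields an index needs directly from each spec's `<pre class=metadata>` block instead of calling into Bikeshed internals. The sketch below is a simplified, hypothetical reader in Python (`read_metadata` is not a Bikeshed API). It assumes `Key: value` lines with indented continuation lines, and it keeps only the last value of a repeated key such as `Editor`. That is enough for fields like `Title`, `Shortname`, `Level`, and `Status`, but it does not implement Bikeshed's full metadata handling (group defaults, computed values, and so on).

```python
from __future__ import annotations

import re

# Opening tag of a Bikeshed metadata block, in its common quoting styles.
_META_OPEN = re.compile(r"<pre\s+class\s*=\s*['\"]?metadata['\"]?\s*>", re.I)


def read_metadata(source: str) -> dict[str, str]:
    """Return key/value pairs from the first metadata block of a .bs source."""
    m = _META_OPEN.search(source)
    if m is None:
        return {}
    end = source.find("</pre>", m.end())
    block = source[m.end(): end if end != -1 else len(source)]
    meta: dict[str, str] = {}
    key: str | None = None
    for line in block.splitlines():
        if not line.strip():
            continue
        if line[:1].isspace() and key is not None:
            # An indented line continues the previous key's value.
            meta[key] += " " + line.strip()
            continue
        name, sep, value = line.partition(":")
        if sep:  # split only at the first colon, so URLs in values survive
            key = name.strip()
            meta[key] = value.strip()
    return meta
```

Because this depends only on the source format rather than on Bikeshed's Python internals, it would keep working across Bikeshed releases, at the cost of diverging from Bikeshed wherever its metadata rules are richer than this reader.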

Someone did step up to create https://andreubotella.com/csswg-auto-build/, after having developed a concrete plan to make a working alternative for serving up the CSS specs for end users to read.

The work on an alternative system has already been done. https://andreubotella.com/csswg-auto-build/ is the existence proof. And it’s already running. I don’t know how much effort was required to create it and make it work — but that doesn’t seem relevant at this point, because the work needed (or the bulk of it) has already been done. So I am not expecting anybody to need to do a bunch more work to create it.

I didn't really "develop a concrete plan to make a working alternative" – I started working on a proof of concept in my spare time, after being frustrated that the server was consistently down whenever I started working in the European morning. That proof of concept then turned out to be generally useful, and I worked some more on it as part of my work at Igalia. Since I started it in my spare time, I don't know how many hours it took, but the initial version was done in two days (February 11th and 12th), after which it was already building the specs and auto-updating.

In addition to generating individual drafts when they are committed, it regenerates all other drafts when the anchor/link database gets updated to keep all the cross references up to date.

I don't think the https://andreubotella.com/csswg-auto-build setup can run Shepherd, and so it can't know when the anchor database would get updated. This is one concern that I had as soon as it was suggested that the same setup could be used for drafts.csswg.org, and I don't think there's a way around it while Shepherd is used as the anchor database for Bikeshed. But my understanding is that Bikeshed is, in the long run, planning to switch to Webref, which is backed by Reffy – and it seems like Reffy could indeed be used in the CI workflow.
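If the anchor data were produced inside the CI workflow (for example by Reffy, as suggested above), the "regenerate every draft when the anchor database changes" behavior could be approximated by comparing a digest of that data between runs. A minimal Python sketch, assuming the workflow can persist the previous digest (for example in a cache or artifact) and knows which specs a push touched; `specs_to_rebuild` is a hypothetical helper, not part of any existing workflow.

```python
from __future__ import annotations

import hashlib


def digest(data: bytes) -> str:
    """Content fingerprint of the anchor/link data."""
    return hashlib.sha256(data).hexdigest()


def specs_to_rebuild(
    all_specs: list[str],
    changed_specs: set[str],
    anchor_data: bytes,
    previous_digest: str | None,
) -> tuple[list[str], str]:
    """Decide which specs a CI run should rebuild.

    If the anchor data differs from the previous run (or no previous run is
    recorded), every spec is rebuilt so cross-references pick up the new
    definitions. Otherwise only the specs touched by the push are rebuilt.
    Returns the specs to build plus the digest to store for the next run.
    """
    current = digest(anchor_data)
    if current != previous_digest:
        return list(all_specs), current
    return [s for s in all_specs if s in changed_specs], current
```

The trade-off is that a full rebuild runs whenever any definition anywhere changes; a finer-grained version could rebuild only the specs that reference changed terms, but that needs dependency data this sketch does not assume.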

I don't think there's a way around it while Shepherd is used as the anchor database for Bikeshed. But my understanding is that Bikeshed is, in the long run, planning to switch to Webref, which is backed by Reffy – and it seems like Reffy could indeed be used in the CI workflow.

Thanks for mentioning that — I’d forgotten that the known plan was for Bikeshed to move away from Shepherd.

I notice there’s an open issue at https://github.com/tabatkins/bikeshed/issues/1761 and there’s no response there, but I do seem to remember from discussion in the WHATWG Matrix room that there’s agreement on moving to Webref. Looking back, I found one mention at https://matrixlogs.bakkot.com/WHATWG/2022-05-02#L11 but I seem to recall there having been more discussion about it there than just that.

As far as Respec, it’s not clear to me that Respec is actually currently relying on the Shepherd spec anchor/link database. But even if it were, I think the plan there too would be to switch to using Webref instead.

And anyway that concern would not be relevant for CSS specs, since (as far as I know) no CSS specs are using Respec.

a stopgap solution that only serves drafts but means that Bikeshed and Respec no longer autolinks to the current state would be bad

@svgeesus To be clear for the record here, if Bikeshed and Respec were using Webref, then when considering alternative solutions for serving CSS specs, we’d at least not have need for concern about that potential “Bikeshed and Respec no longer autolinks to the current state” problem, right?

if Bikeshed and Respec were using Webref, then when considering alternative solutions for serving CSS specs, we’d at least not have need for concern about that potential “Bikeshed and Respec no longer autolinks to the current state” problem, right?

Right.

All of those requirements are separate from the requirement for users to be able to reliably read drafts.csswg.org specs.

But all of those requirements do exist, and they all need maintenance too.

I raised this issue without knowing that — because until it was stated in the comment above, I hadn’t heard that.

Because you didn't ask.

But I don’t believe the fact that there’s a sponsor on the horizon relieves us from responsibility for looking in detail at the actual end-user needs — and based on those needs, trying to make an objective assessment of what could be the best way to address those needs.

Fixing the current infrastructure will result in a stable draft server far faster than this conversation is even likely to be concluded.

But on top of all that, in order for anybody else to try to make fixes to the code or deployment, they’d first need to know what’s broken: What specific part of the code causes the system to get wedged 3 or 4 days out of each week? Or if it’s not part of the code, what part of the deployment system or the server ecosystem needs to be changed?

Right, so where's the conversation about that? Rather than jumping to 'let's scrap the whole thing and start over'? Why not start by asking that?

As far as estimating how much work is required to fix the existing drafts.csswg.org backend, I’ve got to admit I have absolutely no idea

Again, you didn't ask.

FWIW, the one-person-month estimate isn't just to fix the draft server, it's to refresh all of the above infrastructure, which hasn't been properly maintained in years. The actual fix to make the server more reliable is about a week of effort. It's a matter of reducing the load on the DB server. And all this has been discussed before.

The thing that's really infuriating me here is that the fundamental problem is not, and has never been, that the current infrastructure is not suited to the task, or somehow unmaintainable, and needs to be replaced. The problem is that it needs to be maintained. While I was happy to do that while my W3C work was sponsored, I'm unable to dedicate that much of my time without support.

Rather than research the root problems, the knee-jerk response is to throw out what we have and build something new and shiny. Without any plan or even discussion of how that new thing will be maintained going forward.

We have a much larger problem than the draft server getting wedged from time to time. We have no sustainable path for maintaining our existing infrastructure, let alone resources for building something new. If people would take half the time that they spend complaining about the existing infrastructure and use it toward constructively working on building a sustainable method for maintaining what we already have, we would never have gotten to the point of having an unreliable server in the first place.

(And yes, I know discussions on that front were started years ago by Tobie at TPAC, but since then, crickets. Especially when it came time for people to put up money and other resources.)

the tests widget is now actively misleading (compare the reported zero tests for CSS Color 5 on Shepherd to the 3529 tests on wpt) because new tests, and corrected tests, are not reflected in those results. It seems that updates stopped happening over a year ago.

Because the test suite infrastructure has also been unmaintained. See above.

Because the test suite infrastructure has also been unmaintained.

I know (because I asked, and you told me). My point being that this also has consequences, it impacts our work substantially, and thus needs to be fixed. Unlike @sideshowbarker, I'm not arguing to just fix the one problem that they were aware of when starting this thread.

I would also point out that I asked

is there some secret documentation that I missed?

after I had tried, and failed, to figure out how to update, un-wedge, or generally fix this.

I’m not going to weigh in on the actual debate, since I'm finding myself agreeing with both sides as I read this thread.

However, I'd like to address this:

Cons

The only downside I am aware of is that GitHub Pages has some limitations the current drafts.csswg.org doesn’t have — most notably an inability to do server-side redirects and an inability to set arbitrary HTTP response headers.

However, those limitations in GitHub Pages are well-known and well-understood, and there are workarounds or mitigations.

Lack of server-side redirects is probably the most-serious limitation — but it can be addressed by using client-side redirects instead. Admittedly not optimal, but it does work — as evidenced by the fact that andreubotella.com/csswg-auto-build is already using it.

Netlify is very similar to GitHub Pages, but also supports headers and redirects with a very easy declarative syntax, and I'm sure they'd be delighted to host this for free, since they host open source projects for free as well.
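As an illustration of that declarative syntax, the sketch below renders Netlify's `_redirects` file (one `from to status` rule per line, with `*` and `:splat` for path wildcards, where the first matching rule wins) and its `_headers` file (a path line followed by indented `Name: value` lines) from Python data. The rules and the header are hypothetical: `https://api.example.org` stands in for whatever subdomain the API would actually move to, and the header is an example rather than something the current server is known to send.

```python
from __future__ import annotations


def netlify_redirects(rules: list[tuple[str, str, int]]) -> str:
    """Render (from, to, status) rules as a Netlify `_redirects` file.

    Netlify applies the first rule that matches, so order matters.
    """
    return "".join(f"{src}  {dst}  {status}\n" for src, dst, status in rules)


def netlify_headers(headers: dict[str, dict[str, str]]) -> str:
    """Render {path: {header: value}} as a Netlify `_headers` file."""
    lines: list[str] = []
    for path, fields in headers.items():
        lines.append(path)
        lines.extend(f"  {name}: {value}" for name, value in fields.items())
    return "\n".join(lines) + "\n"


# Hypothetical configuration, for illustration only.
REDIRECTS = [
    ("/api/*", "https://api.example.org/:splat", 301),  # moved API endpoints
    ("/css-color/*", "/css-color-5/:splat", 302),  # unversioned -> latest level
]
HEADERS = {"/*": {"Access-Control-Allow-Origin": "*"}}
```

A build step would write `netlify_redirects(REDIRECTS)` and `netlify_headers(HEADERS)` into the published directory as `_redirects` and `_headers`. Unlike the client-side approach, these are real HTTP redirects, so fragments and non-browser clients are handled by the server.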

All of those requirements are separate from the requirement for users to be able to reliably read drafts.csswg.org specs.

But all of those requirements do exist, and they all need maintenance too.

I don't think anybody is suggesting those requirements don't exist. What people are arguing is they are lower priority, and have softer uptime needs than hosting of the drafts themselves.

That others have taken upon themselves to create parallel infrastructure for the hosting of drafts suggests that it is indeed a higher priority, whereas funding to deal with the broader set of needs of the CSS WG has been hard to come by.

Thus this further suggests that we can and should view the uptime requirements for these two things as separate concerns, and consider disentangling them.

Fixing the current infrastructure will result in a stable draft server far faster than this conversation is even likely to be concluded.

Frankly, this seems silly. AIUI, the things stopping us from moving the draft hosting this month are:

Needing to redo the YAML configuration files of @andreubotella's work to use Netlify to redirect drafts.csswg.org/api/ etc. to a new subdomain,
Getting WG consensus that we believe the new setup will be more reliable for those viewing specs.

While you might argue the former is just another case of "needing someone to do work", it is much easier for many of us to justify spending a few hours on a task than a week of work.

Rather than research the root problems, the knee-jerk response is to throw out what we have and build something new and shiny. Without any plan or even discussion of how that new thing will be maintained going forward.

I have much more faith in our collective ability to maintain a relatively small Python script and a few YAML files, and GitHub/Netlify to provide reliable hosting, than the current setup for hosting the drafts—even if the server is more reliable than it is today.

It’s occurred to me that this doesn’t need to be an either-or — we can, if we choose to, start auto-publishing all the CSS specs to https://w3c.github.io/csswg-drafts URLs in parallel with continuing to publish them to drafts.csswg.org URLs.

However, since I titled and framed this issue as “Replace the https://drafts.csswg.org/ backend”, it seems better to discuss the specific topic of parallel publishing to https://w3c.github.io/csswg-drafts URLs separately.

So I’ve opened https://github.com/w3c/csswg-drafts/issues/7712 for that.
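For tools that consume the specs (such as the mdn-spec-links build mentioned earlier in this thread), parallel publishing makes a simple fallback possible. A minimal Python sketch, assuming the mirror keeps the same per-spec directory names under `https://w3c.github.io/csswg-drafts/`. `mirror_url` and `candidate_urls` are hypothetical helpers, and a real tool would try the candidates in order with its own HTTP client.

```python
from __future__ import annotations

from urllib.parse import urlsplit, urlunsplit

PRIMARY_HOST = "drafts.csswg.org"
# Assumed mirror location: same directory layout under /csswg-drafts/.
MIRROR_SCHEME, MIRROR_HOST, MIRROR_PREFIX = "https", "w3c.github.io", "/csswg-drafts"


def mirror_url(url: str) -> str:
    """Map a drafts.csswg.org URL to its assumed GitHub Pages counterpart.

    Path, query, and fragment are preserved; URLs on other hosts are
    returned unchanged.
    """
    parts = urlsplit(url)
    if parts.hostname != PRIMARY_HOST:
        return url
    path = MIRROR_PREFIX + (parts.path or "/")
    return urlunsplit((MIRROR_SCHEME, MIRROR_HOST, path, parts.query, parts.fragment))


def candidate_urls(url: str) -> list[str]:
    """URLs to try in order: the original first, then the mirror if one applies."""
    mirrored = mirror_url(url)
    return [url] if mirrored == url else [url, mirrored]
```

For example, `mirror_url("https://drafts.csswg.org/css-color-5/#color-mix")` gives `https://w3c.github.io/csswg-drafts/css-color-5/#color-mix`, so an anchor link still lands in the same place on the mirror.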

And as noted in #7712, I've just adapted @andreubotella's work to do precisely such a thing - we now auto-build to https://w3c.github.io/csswg-drafts/, about five minutes after each push (it takes that long to build all the specs on the GH Actions runner machines).

As stated by several, this is not a replacement for the work @plinss is doing (in part, because we need their work to obtain the definitions that the auto-building relies on), but it does mean that we don't need to rely on arbitrary third parties to get this quickly-done mirroring.

Given that #7712 is resolved and we now have https://w3c.github.io/csswg-drafts/, I’m going ahead and closing this issue — because the existence of https://w3c.github.io/csswg-drafts/ solves the underlying problem this issue was raised to address.
