Closed foolip closed 4 years ago
I've found that although http://lists.whatwg.org/ says "The list overview page has been disabled temporarily", the archive pages are accessible and I'd like to make some effort to save them: http://lists.whatwg.org/pipermail/commit-watchers-whatwg.org/ http://lists.whatwg.org/pipermail/help-whatwg.org/ http://lists.whatwg.org/pipermail/implementors-whatwg.org/ http://lists.whatwg.org/pipermail/whatwg-whatwg.org/ (partial because of some mishap in 2017)
Those are not accessible on HSTS-enabled browsers though, right?
They're not, but they appear to be the full archives of help and implementors lists, so scraping them like we did with the forums is probably low enough effort that it's worth saving.
I announced the proposed change on the list itself and pointed to this issue here: http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2019-December/000148.html https://lists.w3.org/Archives/Public/public-whatwg-archive/2019Dec/0000.html
Can anyone confirm they got that email, just to be sure?
I got the email.
I've been digging into what archives exist and what could be reconstructed. Current understanding:
With significant effort the original state of lists.whatwg.org can probably be reconstructed apart from September 2014 through July 2017. In that period https://lists.w3.org/Archives/Public/public-whatwg-archive/ is the only copy, so at best one could recreate a listing with other numbers that are plausible, but it wouldn't match the original URLs, whatever they were.
Doing some wget scraping now to figure out how complete the web.archive.org copy is.
Does anyone have the full archives of the whatwg list locally? If I remember correctly, there was an Archived-At header with the URL of that email in the archive.
Edit: I don't see such a header in the latest email. Maybe it's only for lists.w3.org emails?
I have all emails from that period locally in my Thunderbird, so I can export them and make them available to someone who is willing to recreate the archive. But there is no Archived-At header.
@zcorpan the version of mailman and configuration probably changed a fair bit over time, so it's possible the headers changed. Or possibly you're thinking of the Message-ID header from which one could construct permalinks?
No it contained a URL. But it was probably only for W3C lists...
Yeah, I see Archived-At: <CAOOOkFcWW97r8yg=SsWg7GgCmp4suVX9o85y8BvNRqMjuc5PXg@mail.gmail.com> as a header on the email I sent about closing the lists.
Looks like this might actually be useful in an unexpected way! Pulling a Message-ID from https://web.archive.org/web/20140706121731/http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2009-December.txt.gz and creating the URL https://www.w3.org/mid/op.u4kq0w0oidj3kv@zcorpandell.linkoping.osa works, I end up at https://lists.w3.org/Archives/Public/public-whatwg-archive/2009Dec/0103.html. With that I think a mapping can be created.
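A minimal sketch of that lookup, assuming the pipermail text archive carries one `Message-ID: <...>` header per message. The function names are made up for illustration, and the percent-encoding of unusual characters is an assumption rather than something confirmed for www.w3.org/mid/:

```python
import re
from urllib.parse import quote

# Matches "Message-ID: <...>" header lines in a pipermail text archive.
MID_RE = re.compile(r"^Message-ID:\s*<([^>]+)>", re.IGNORECASE | re.MULTILINE)


def extract_message_ids(archive_text):
    """Return every Message-ID found in the archive text, without angle brackets."""
    return MID_RE.findall(archive_text)


def w3c_mid_url(message_id):
    """Build a https://www.w3.org/mid/ lookup URL from a Message-ID.

    Assumption: characters other than "@" are percent-encoded. This matches
    the one example in the comment above but is otherwise untested.
    """
    mid = message_id.strip().strip("<>")
    return "https://www.w3.org/mid/" + quote(mid, safe="@")


if __name__ == "__main__":
    print(w3c_mid_url("<op.u4kq0w0oidj3kv@zcorpandell.linkoping.osa>"))
```

Following each resulting URL and recording where it lands on lists.w3.org would give one row of the mapping per message.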
So clearly I've gone down the path of trying to keep whatwg@whatwg.org archive URLs from before 2017 working where possible, or rather to revive them as they're currently 404.
Some notes:
Based on all this it should be possible to create redirect rules or a 404 page that checks all possible URLs in web.archive.org for a match.
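One way the "check all possible URLs" idea could look, as a rough sketch only. It assumes the Wayback Machine availability endpoint at archive.org/wayback/available and its `archived_snapshots.closest` response shape. The function names, the default timestamp, and the injectable `fetch_json` parameter are all illustrative and not part of any existing tooling here:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_API = "https://archive.org/wayback/available"


def availability_query(original_url, timestamp="20140706"):
    """Build the availability query for one candidate URL (timestamp is a guess)."""
    return AVAILABILITY_API + "?" + urlencode(
        {"url": original_url, "timestamp": timestamp}
    )


def closest_snapshot(response):
    """Return the closest snapshot URL from a parsed response, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def find_archived_copy(candidates, fetch_json=None):
    """Try each candidate URL in order and return the first archived copy found."""
    if fetch_json is None:
        def fetch_json(url):
            # Default fetcher: performs a real HTTP request.
            with urlopen(url) as resp:
                return json.load(resp)
    for candidate in candidates:
        snapshot = closest_snapshot(fetch_json(availability_query(candidate)))
        if snapshot:
            return snapshot
    return None
```

A 404 handler or a one-off script generating redirect rules could call `find_archived_copy` with whatever list of candidate URLs applies to the requested path.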
However, since the pre-2017 archives aren't available on lists.whatwg.org, restoring them shouldn't block turning off the mailing lists, but would be nice to get done.
@foolip I'm curious about this:
https://lists.w3.org/Archives/Public/public-whatwg-archive/ has all of whatwg@whatwg.org but with different message IDs assigned, and a mapping is non-trivial
The message-ids in W3C's copy seem to be the originals, as far as I can tell.
I'm not sure if this would be useful (and maybe you already know) but W3C's archives are available in mbox format, e.g. https://lists.w3.org/Archives/Public/public-whatwg-archive/mboxes/ (restricted to W3C Members to limit bulk harvesting by spammers)
@gosko in that context by "message IDs" I mean the numbers in the URLs like in https://github.com/whatwg/meta/issues/153#issuecomment-566980200, not the long identifiers in Archived-At headers as in https://github.com/whatwg/meta/issues/153#issuecomment-566787010.
I had noticed that the mbox archives are available, and think those could come in handy for reconstructing. It's just a lot of work to do this well and with confidence in the results given that you could at best sample the results to look for problems.
The email lists have been shut down now. lists.whatwg.org now has no DNS records, and I'm looking at bringing up a static copy of what I've scraped and what can still be scraped from web.archive.org.
lists.whatwg.org has been restored as well as it can be from web.archive.org now. If people can try using it and report issues that'd be great. Known issues:
I see there's one more issue. Looking for broken links from the monthly listings, I see that May and June of 2014 don't have the individual messages: https://lists.whatwg.org/pipermail/whatwg-whatwg.org/2014-May/thread.html https://lists.whatwg.org/pipermail/whatwg-whatwg.org/2014-June/thread.html
https://lists.whatwg.org/pipermail/whatwg-whatwg.org/2014-May/254200.html is the only message.
These may be possible to restore from "Gzip'd Text", but I'll just link to lists.w3.org for these two months, as I've already done for July 2014 through June 2017.
Edit: fixed
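For anyone who wants to repeat the broken-link check on a monthly listing, here is a hedged sketch. It assumes individual messages are linked with purely numeric file names such as 254200.html, as in the URL above. The class and function names are hypothetical:

```python
import re
from html.parser import HTMLParser


class MessageLinkParser(HTMLParser):
    """Collects hrefs that look like individual message pages, e.g. 254200.html."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A HREF=...> works too.
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and re.fullmatch(r"\d+\.html", href):
            self.links.append(href)


def message_links(listing_html):
    """Return the message links found in a thread.html style listing."""
    parser = MessageLinkParser()
    parser.feed(listing_html)
    return parser.links
```

Each returned link can then be resolved against the listing URL and requested; any that come back 404 are the missing messages.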
With https://github.com/whatwg/whatwg.org/pull/285 this has been resolved.
This is a tracking issue for the plan proposed in https://github.com/whatwg/misc-server/issues/75#issuecomment-561625392 to close the WHATWG mailing lists:
This issue is filed in case someone is following this repo but not the mailing list, and yet has feedback on the plan.
The pull requests that implement the suggestion are https://github.com/whatwg/whatwg.org/pull/269 and https://github.com/whatwg/misc-server/pull/120.