rfcseries-wg / new-topics


Archive external URLs mentioned in RFCs #29

Closed alexisannerossi closed 1 year ago

alexisannerossi commented 1 year ago

I've noticed that RFCs can have links to outside material (e.g. in references). If these materials are available on the web, we should consider archiving them when the RFC is published. This would help us preserve the integrity of the archival document, and hopefully assist people in the future with understanding the content when the live URLs inevitably change or disappear.

I think there are three basic options here:

Here is a quick overview of some services available for this kind of work. There are other, more commercial options as well; these are the ones I'm very familiar with and that seem to have the longest life span. If we choose the hard option, I can also provide a rundown of some of the options available.

Wayback Machine - Internet Archive

Perma.cc - Harvard Law School Library

Archive-It - Internet Archive
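
As a rough illustration of what "archive the outlinks at publication time" could involve, here is a minimal sketch that pulls URLs out of an RFC's plain text and builds the corresponding Wayback Machine "Save Page Now" request URLs. The function names and the sample text are hypothetical, the regular expression is deliberately simple, and nothing is actually submitted; it assumes the public `https://web.archive.org/save/<url>` form and is not a description of any existing RFC Editor tooling.

```python
import re

# Simple http(s) matcher; stops at whitespace, angle brackets, and quotes.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_urls(rfc_text: str) -> list[str]:
    """Return the unique URLs in rfc_text, in order of first appearance."""
    seen: list[str] = []
    for match in URL_PATTERN.findall(rfc_text):
        # Trim sentence punctuation that tends to follow a URL in prose.
        url = match.rstrip(".,;:)]>")
        if url not in seen:
            seen.append(url)
    return seen


def wayback_save_request(url: str) -> str:
    """Build (but do not send) a Save Page Now request URL (assumed form)."""
    return "https://web.archive.org/save/" + url


# Hypothetical reference text, for illustration only.
sample = (
    '[REG] IANA, "Example Registry", <https://www.iana.org/assignments/example>.\n'
    "See also http://example.com/spec.html, and "
    "https://www.iana.org/assignments/example."
)

for link in extract_urls(sample):
    print(wayback_save_request(link))
```

A real workflow would also need rate limiting, error handling, and a record of which capture belongs to which RFC; those are left out here.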

mnot commented 1 year ago

I suspect we'll find that some RFCs use URLs to intentionally refer to something that might change, with the most recent being authoritative -- for example, a registry. As such they're one mechanism that allows authors to work around the issues that come up when we have a strict policy of not allowing changes to published documents.

In an ideal world, authors would be explicit about the nature of the resources they're pointing to. Luckily, often it can be inferred, but that's not something that would be easy to automate.

All of this is a long-winded way to say: we should explicitly state that the archived materials from URLs are informational about what was at that link at the time of archiving -- updates to the linked resource may or may not be authoritative, depending on the intent of the authors.

alexisannerossi commented 1 year ago

For the type of URLs that are intended to change (per your example), it is possible to have a separate collection of things that get crawled regularly (weekly, monthly, yearly) so that you have some record of how that reference changed over time, and some hope of having a relatively recent version of it archived when the page inevitably dies. I know Archive-it.org has this kind of feature built in; the only issue is knowing which kind of URL is which.
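
To make the "separate collection" idea concrete, here is a small sketch of how outlinks might be split into one-time captures and recurring crawls. It assumes a person has already marked each link as expected to change or not (as noted above, that is not easy to automate). The `Outlink` type, the `next_crawl` helper, and the monthly interval are all hypothetical, and the sketch does not use or represent the Archive-It API.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class Outlink:
    url: str
    # Set by a human reviewer; this sketch does not try to infer it.
    expected_to_change: bool


# Assumed interval; weekly or yearly would work the same way.
RECRAWL_INTERVAL = timedelta(days=30)


def next_crawl(link: Outlink, last_crawled: date) -> Optional[date]:
    """Static references are captured once; living ones are recrawled."""
    if not link.expected_to_change:
        return None
    return last_crawled + RECRAWL_INTERVAL


registry = Outlink("https://example.org/registry", expected_to_change=True)
paper = Outlink("https://example.org/paper.pdf", expected_to_change=False)

print(next_crawl(registry, date(2024, 1, 1)))
print(next_crawl(paper, date(2024, 1, 1)))
```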

alexisannerossi commented 1 year ago

We are currently archiving these outlinks via Archive-It. I think this issue is closed (though there will still be discussion about how to actually fix broken links in published RFCs, I believe that is a separate issue).