What is a URI Persistence Policy?

karlcow commented 13 years ago

Stolen from the BBC's Nature site team, whose development manifesto contains:

Persistence: only mint a new URI if one doesn’t already exist; once minted, never delete it

A URI persistence policy is a statement of trust towards the users of your Web site as much as a commitment that shapes the design of your infrastructure. It creates a design constraint that helps you think about the value of each URI you create and how you will manage its future. Think about the ruin your Web site will eventually become.

An example of a URI persistence policy can also be found on the W3C Web site.

olivierthereaux commented 13 years ago

One important part of a URI persistence (persistency) policy should be about the use of the 410 Gone HTTP status code.

Assuming that whoever is in charge of the URI space (IA, developer, team) is able to send a 410 Gone response at all, there does not seem to be any agreement, at least in the web discussions I have witnessed, on whether the 410 status should be used, and if so, for how long:

- For as long as the resource may have been cached (related to the caching policy advertised by the web server)
- For as long as any link to the resource remains (a.k.a. pretty much indefinitely)
- Indefinitely, regardless of context
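
To make the three options easier to compare, here is a minimal Python sketch of them as a single decision function. Everything in it is an assumption made for illustration: the names `GonePolicy` and `status_for_retired_uri`, the one-year default for `max_cache_age`, and above all the choice to fall back to 404 once the 410 window has closed, which nobody in this thread has proposed as the right behaviour.

```python
from datetime import datetime, timedelta
from enum import Enum


class GonePolicy(Enum):
    """The three retention options listed above (names are hypothetical)."""
    CACHE_LIFETIME = 1   # as long as the resource may still be cached
    WHILE_LINKED = 2     # as long as any known link to it remains
    INDEFINITELY = 3     # forever, regardless of context


def status_for_retired_uri(
    policy: GonePolicy,
    retired_at: datetime,
    now: datetime,
    max_cache_age: timedelta = timedelta(days=365),  # assumed value
    inbound_links: int = 0,
) -> int:
    """Return the HTTP status to send for a URI that has been retired."""
    if policy is GonePolicy.INDEFINITELY:
        return 410
    if policy is GonePolicy.CACHE_LIFETIME:
        # 410 only while a cache could plausibly still hold the resource.
        return 410 if now - retired_at <= max_cache_age else 404
    # WHILE_LINKED: 410 while we still know of at least one inbound link.
    return 410 if inbound_links > 0 else 404
```

The sketch mostly shows what each option needs in order to be enforced: the first requires a record of when the URI was retired, the second requires a count of inbound links (which is hard to obtain), and the third requires nothing beyond the list of retired URIs.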

This raises the question of how 410 responses (and redirects, etc.) are managed.

As an anecdote, my personal site has a set of redirection directives in its configuration file which have been active for almost 10 years, arguably longer than any cache may have stored, or any old page may have linked to, the resources at their "old" URIs. I can keep the redirects available indefinitely because the configuration file persists easily (it is a flat file, as opposed to other, more transient, storage).
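
As an illustration of why a flat file is easy to keep around, here is a small Python sketch that reads redirect and 410 rules from plain text. The file format (status, old path, optional new path), the example paths, and the function names `parse_rules` and `resolve` are all invented for this example; this is not the syntax of any particular web server.

```python
from typing import Dict, Optional, Tuple

Rule = Tuple[int, Optional[str]]  # (HTTP status, target path or None)

# Hypothetical flat-file contents: one rule per line, "#" starts a comment.
RULES_TEXT = """\
# status  old-path         new-path (omitted for 410)
301       /blog/2003/foo   /articles/foo
410       /tmp/draft
"""


def parse_rules(text: str) -> Dict[str, Rule]:
    """Parse the rule file into a mapping from old path to (status, target)."""
    rules: Dict[str, Rule] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        parts = line.split()
        status, old_path = int(parts[0]), parts[1]
        target = parts[2] if len(parts) > 2 else None
        rules[old_path] = (status, target)
    return rules


def resolve(rules: Dict[str, Rule], path: str) -> Optional[Rule]:
    """Return the rule for a requested path, or None if no rule applies."""
    return rules.get(path)


rules = parse_rules(RULES_TEXT)
print(resolve(rules, "/blog/2003/foo"))
print(resolve(rules, "/tmp/draft"))
```

Because the rules are only lines of text, they can live in version control and survive a change of server software or CMS, which is the property the anecdote relies on.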

olivierthereaux commented 13 years ago

Another major part of the URI persistence policy concerns redirects. Questions similar to the ones in the comment above apply:

- Who has the right to redefine/redirect a URI?
- What are the acceptable technical means to advertise the change (JavaScript, HTML, HTTP, prose in the resource)?
- How long should any technical means of advertising the change stay live?
  - As a corollary, is there a reliable way to know whether a URI is linked to from anywhere, other than:
    - keeping years of logs and seeing if any agent tried to dereference it
    - crawling the open web for links to said resource
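
The log-based approach could be sketched roughly as follows in Python. The sketch assumes access logs in the common "Common Log Format" layout; the sample lines, the path `/old/page`, and the function name `last_requests` are made up for the example. It only reports when each retired URI was last requested, which is evidence that something still points at it, not proof of where the link lives.

```python
import re
from datetime import datetime
from typing import Dict, Iterable, Set

# Matches the timestamp and request path of a Common Log Format line.
LOG_LINE = re.compile(r'\[(?P<ts>[^\]]+)\] "(?:GET|HEAD) (?P<path>\S+)')

# Invented sample data standing in for years of real logs.
SAMPLE = [
    '203.0.113.7 - - [10/Oct/2011:13:55:36 +0000] "GET /old/page HTTP/1.1" 410 0',
    '203.0.113.8 - - [12/Oct/2011:08:00:00 +0000] "GET /old/page?x=1 HTTP/1.1" 410 0',
    '203.0.113.9 - - [12/Oct/2011:09:00:00 +0000] "GET /live HTTP/1.1" 200 512',
]


def last_requests(lines: Iterable[str], retired: Set[str]) -> Dict[str, datetime]:
    """Map each retired path seen in the logs to its most recent request time."""
    seen: Dict[str, datetime] = {}
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines that are not in the expected format
        path = match.group("path").split("?", 1)[0]  # ignore the query string
        if path not in retired:
            continue
        ts = datetime.strptime(match.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        if path not in seen or ts > seen[path]:
            seen[path] = ts
    return seen


result = last_requests(SAMPLE, {"/old/page"})
```

A retired URI that never shows up in `result` over a long enough period is a candidate for dropping its redirect or 410, with the caveat raised below: links inside intranets and other walled gardens may simply not have been followed yet.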

A side question: once a resource has been given a URI and that URI advertised, other resources or applications on the web may be referencing that URI. Should there be a differentiation between links to a given URI from the open web (thus, in theory crawlable) and links from intranets and other walled gardens? Should a link from the non-open web be given the same importance as links from the open web?
