Support Robust Links - GitHub Issues

csarven commented 7 years ago

http://robustlinks.mementoweb.org/

URIs in dokieli would benefit from Robust Links, which enable web archive lookup of linked resources: http://robustlinks.mementoweb.org

Raised by @hvdsomp (https://twitter.com/hvdsomp/status/818541180212445184)

Okay with the motivation, but not comfortable with the data- attribute approach. data- tends to be hidden from human view and relies at least on CSS to be made visible. The approach also relies at minimum on JavaScript to be interactable. Instead of:

<a href="http://www.w3.org/"
   data-versionurl="https://archive.today/r7cov"
   data-versiondate="2015-01-21">this decorated link to the W3C home page</a>

Use HTML+RDFa to be both human-visible and machine-friendly, along the lines of:

<a href="http://www.w3.org/">this decorated link to the W3C home page</a>
(
<a about="https://archive.today/r7cov" href="https://archive.today/r7cov"
   rel="prov:wasDerivedFrom" resource="http://www.w3.org/"
   property="rdfs:label">archive.today [
     <span about="https://archive.today/r7cov" property="prov:generatedAtTime"
           datatype="xsd:date">2015-01-21</span>]</a>
, ..)

The example code is certainly more verbose than the data- approach, so there is a trade-off. However, it fits the design principles of the dokieli project. The above HTML pattern can be progressively enhanced for the user through CSS (different stylesheets) and/or JS as desired. A particular stylesheet or medium may well need to hide the archive link from view, so the pattern should be flexible enough to handle different scenarios. The same goes for how it is interacted with.
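To make the trade-off concrete, here is a minimal sketch in Python (standard library only) of how a tool could read a link decorated with data-versionurl / data-versiondate and re-emit it in the visible HTML+RDFa pattern shown above. The names RobustLinkParser and to_visible_rdfa, and the archive_label parameter, are made up for illustration; this is not how dokieli or the Robust Links scripts do it.

```python
from html import escape
from html.parser import HTMLParser


class RobustLinkParser(HTMLParser):
    """Collect <a> elements that carry a data-versionurl attribute."""

    def __init__(self):
        super().__init__()
        self.links = []       # finished links, one dict per decorated <a>
        self._current = None  # decorated link currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("data-versionurl"):
            self._current = {
                "href": attrs.get("href") or "",
                "versionurl": attrs["data-versionurl"],
                "versiondate": attrs.get("data-versiondate") or "",
                "text": "",
            }

    def handle_data(self, data):
        # Accumulate the link text while inside a decorated <a>.
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.links.append(self._current)
            self._current = None


def to_visible_rdfa(link, archive_label="archive.today"):
    """Render one parsed link in the visible HTML+RDFa pattern (hypothetical helper)."""
    href = escape(link["href"], quote=True)
    version = escape(link["versionurl"], quote=True)
    return (
        f'<a href="{href}">{escape(link["text"])}</a>\n'
        f'(<a about="{version}" href="{version}"\n'
        f'   rel="prov:wasDerivedFrom" resource="{href}"\n'
        f'   property="rdfs:label">{escape(archive_label)} [\n'
        f'     <span about="{version}" property="prov:generatedAtTime"\n'
        f'           datatype="xsd:date">{escape(link["versiondate"])}</span>]</a>)'
    )


decorated = (
    '<a href="http://www.w3.org/"\n'
    '   data-versionurl="https://archive.today/r7cov"\n'
    '   data-versiondate="2015-01-21">this decorated link to the W3C home page</a>'
)
parser = RobustLinkParser()
parser.feed(decorated)
parser.close()
visible = [to_visible_rdfa(link) for link in parser.links]
print(visible[0])
```

The point of the sketch is only that the two forms carry the same three pieces of information (original URI, archived URI, date), so converting between them can be left to tooling rather than hand-coding.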

The exact HTML+RDFa pattern and vocabulary use need actual experimentation. For example, the current approach to citations in dokieli only has a relation from the article/fragment of text to what's being cited, i.e., information like "Accessed" and "Reason" is human-readable but has no extra structured markup.

This issue should coordinate with other issues on citations/references/footnotes UIs.

csarven commented 7 years ago

See also http://mementoweb.org/missing-link/ for more notes.

Brief notes re: http://mementoweb.org/missing-link/#option5

> The approach is significantly less attractive when HTML is authored manually and when links in general are concerned.

Compared with the data- approach, I agree that it is definitely more verbose. Not sure about how to quantify "significantly less attractive" :) Ideally, we should focus on interfaces handling this work instead of being concerned about which type of web-developers are not going to find this attractive for "hand-coding".

> Also, the approach reduces the degree of freedom regarding citation style: semantics can only be provided for information that is directly shown to the user.

That's a non-issue because a) if authored manually, desired citation style can be decided, b) if authored through a client UI, the editor for instance should take care of this work (via current or selected citation style).

hvdsomp commented 7 years ago

How does one know that the subject of "prov:wasDerivedFrom" is a Memento of the object and hence can be visited to see what the linked resource was like when it was referenced? Agreed, this problem can be addressed with RDFa. But it's not clear to me that the proposed approach is specific enough. This is first and foremost about link/content persistence. It's not about expressing the provenance of a process applied to the original resource. Such a process, as per wasDerivedFrom, could be anything, not just archival.

csarven commented 7 years ago

I just happened to use that property to illustrate. If there is something more suitable (e.g. :mementoOf?), we can use that instead, whatever the semantics are. Is it like: x persistentCopy y or x robustLink y .. ?

As I mentioned in the first note, additional semantics may or may not need to be added. That's partly a balance between whether the mentioned resource describes itself in a machine-readable way and whether consumers of the document (which includes robust links) are interested in, or can make do with, mining the information from the document itself (without having to follow the link). This is the same scenario as the references/citations right now in dokieli. I can be convinced either way about including some information about the cited entity vs. just linking out (with an appropriate [maybe qualified?] relation). dokieli defaults to just linking out right now. Similarly, the machine-readable bits (like the date) for the robust link don't need to be restated per se.

ldibanyez commented 7 years ago

As I see it, in the decentralized scientific publication, your "research output" is live (or read/write if you prefer) until you decide to take a snapshot to be peer-reviewed. You continue with your live development until you decide to snapshot again and submit that for review (linked to the previous version), and so on. IMHO, it is for these snapshots that robust links are needed. If we archive in something that supports Memento, then the semantics should be "x robustlink y"; if not, perhaps something like "x snapshot_of y"?
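A rough sketch of that snapshot chain in Python, to show what such statements might look like. Everything here is an assumption for illustration: the Snapshot class, the describe function, the example.org URIs, and the ex: relation names (including ex:snapshot_of, which just mirrors the placeholder wording above and is not a term from any real vocabulary).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    uri: str        # version-specific URI of the frozen copy
    live_uri: str   # generic URI of the live (read/write) document
    taken_at: str   # datetime of the snapshot, ISO 8601
    previous: Optional["Snapshot"] = None  # earlier snapshot, if any


def describe(snapshot, relation="ex:snapshot_of"):
    """Return (subject, predicate, object) statements for one snapshot."""
    statements = [
        (snapshot.uri, relation, snapshot.live_uri),
        (snapshot.uri, "ex:taken_at", snapshot.taken_at),
    ]
    if snapshot.previous is not None:
        # Link each submitted version back to the one reviewed before it.
        statements.append(
            (snapshot.uri, "ex:previous_snapshot", snapshot.previous.uri)
        )
    return statements


v1 = Snapshot("https://example.org/paper/v1", "https://example.org/paper",
              "2017-06-26T17:30:00Z")
v2 = Snapshot("https://example.org/paper/v2", "https://example.org/paper",
              "2017-06-27T19:54:00Z", previous=v1)
for statement in describe(v2):
    print(statement)
```

Swapping the relation argument (say, to a robustlink-style term when the archive supports Memento) changes only the predicate, which is the choice being debated here.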

hvdsomp commented 7 years ago

On Mon, Jun 26, 2017 at 5:30 PM, Luis Daniel Ibáñez wrote:

> As I see it, in the decentralized scientific publication, your "research output" is live (or read/write if you prefer) until you decide to take a snapshot to be peer-reviewed. [...] IMHO, it is for these snapshots that robust links are needed. [...]

Agreed. You could make Robust Links HTML snippets available to link to specific versions of a "paper", which would convey version-specific URI, version datetime, generic URI.

But Robust Links can also be used in "papers", e.g. when referencing external resources that are being linked to, cited, etc. In this case, when the external link is put in place, a snapshot of the linked resource would be created in a web archive. And Robust Links HTML would go in the "paper" for the linked resource. It would contain the URI of the linked resource, the URI of the snapshot in a web archive, the datetime of linking.
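As a small illustration of those three pieces of information, the sketch below assembles a Robust Links snippet in the data- attribute form quoted at the top of this issue. The function name robust_link is hypothetical, and the step that actually creates the snapshot in a web archive is not shown; the snapshot URI is simply passed in.

```python
from datetime import date
from html import escape


def robust_link(original_uri, snapshot_uri, linked_on, text):
    """Return an <a> element decorated with data-versionurl / data-versiondate."""
    return (
        f'<a href="{escape(original_uri, quote=True)}"\n'
        f'   data-versionurl="{escape(snapshot_uri, quote=True)}"\n'
        f'   data-versiondate="{linked_on.isoformat()}">{escape(text)}</a>'
    )


snippet = robust_link(
    "http://www.w3.org/",            # URI of the linked resource
    "https://archive.today/r7cov",   # URI of the snapshot in a web archive
    date(2015, 1, 21),               # date of linking
    "this decorated link to the W3C home page",
)
print(snippet)
```

With these inputs the output matches the decorated link from the first comment.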

Cheers

Herbert

-- Herbert Van de Sompel Digital Library Research & Prototyping Los Alamos National Laboratory, Research Library http://public.lanl.gov/herbertv/ http://orcid.org/0000-0002-0715-6126


ldibanyez commented 7 years ago

> But Robust Links can also be used in "papers", e.g. when referencing external resources that are being linked to, cited, etc. In this case, when the external link is put in place, a snapshot of the linked resource would be created in a web archive.

Very interesting. Does this entail that every reader can create a "memento" of the live article at any moment? Do archives scale to support that? I mean, in one approach the number of snapshots to archive depends on the author (and on how dynamic the document is), while in the other it depends on the number of links to it (and on how dynamic the document is, as I suspect that a smart approach would check whether the document has changed since the last snapshot was taken).

hvdsomp commented 7 years ago

On Jun 27, 2017, at 19:54, Luis Daniel Ibáñez wrote:

> Very interesting. Does this entail that every reader can create a "memento" of the live article at any moment? Do archives scale to support that? [...]

Most web archives indeed have that kind of deduplication function: they check whether an identical representation of a resource (URI) has already been archived. If not, the representation and metadata about its observation (e.g. datetime) get archived. If yes, the representation does not get archived a second time, but metadata about the observation does get archived (and linked to the previously observed identical representation).
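A toy sketch of that behaviour in Python, under stated assumptions: the DedupArchive class and its capture method are invented for illustration, identity is approximated with a SHA-256 digest of the body, and everything lives in memory. Real web archives will differ in how they compare and store representations.

```python
import hashlib


class DedupArchive:
    """In-memory toy archive: store each distinct representation once per URI."""

    def __init__(self):
        self.representations = {}  # (uri, digest) -> archived bytes
        self.observations = []     # (uri, observed_at, digest), one per capture

    def capture(self, uri, body, observed_at):
        """Record an observation; return True if the body was newly stored."""
        digest = hashlib.sha256(body).hexdigest()
        key = (uri, digest)
        is_new = key not in self.representations
        if is_new:
            self.representations[key] = body
        # The observation metadata is always kept and points at the stored
        # representation through its digest.
        self.observations.append((uri, observed_at, digest))
        return is_new


archive = DedupArchive()
uri = "http://example.org/paper"
first = archive.capture(uri, b"<p>version 1</p>", "2017-06-27T19:54:00Z")
second = archive.capture(uri, b"<p>version 1</p>", "2017-06-28T08:00:00Z")
third = archive.capture(uri, b"<p>version 2</p>", "2017-06-29T08:00:00Z")
print(first, second, third)
print(len(archive.representations), len(archive.observations))
```

In this sketch, many readers requesting a snapshot of an unchanged article add only small observation records, not extra copies of the content.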

Cheers

Herbert

linkeddata / dokieli

Support Robust Links #168