w3c / web-annotation

Web Annotation Working Group repository, see README for links to specs
https://w3c.github.io/web-annotation/

Web resources SHOULD be dereferencable via their IRI #372

Closed: gsergiu closed this issue 7 years ago.

gsergiu commented 7 years ago

Current definition of Web Resource in the WD (http://w3c.github.io/web-annotation/model/wd2/#terminology):

Web Resource: A Resource that MUST be identified by an IRI, as described in the Web Architecture [webarch]. Web Resources MAY be dereferencable via their IRI.

In relation to External Web Resources, the general definition of Web Resources seems to apply to the Internal Web Resources (i.e. Annotation IRI, AnnotationPage IRI, etc.). I suggest changing "MAY be dereferencable" to "SHOULD be dereferencable".

iherman commented 7 years ago

I actually do not agree. Not trying to start a religious discussion, but I think it is perfectly fine if an annotation system uses, for example, URN-s for all internal resources that are never ever really referenced through Web protocols. This is of course possible with a SHOULD as well, but it makes implementations feel guilty for no good reason...

I propose to leave it as is.

gsergiu commented 7 years ago

First: well, I'm not sure I understood the intended difference between Web Resource and External Web Resource. But as it looks to me, Web Resources are Internal Web Resources plus External Web Resources. For the latter we have the constraint that they be dereferencable. It looks strange, to say the least, to impose a stronger constraint on external resources than on internal resources (which are: Annotation IDs, AnnotationPage IDs, AnnotationCollection IDs).

Second: I'm not sure the URN argument holds in this context. Dereferencable doesn't mean that the resource is accessible using the exact given URL/URN. Implementations can have an online resolver service that is able to locate and provide access to the resource identified by the URN. By the way, which kinds of Internal Web Resources are we talking about that would not be dereferencable? And probably the more important question is: why shouldn't they be (even directly) dereferencable?
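
The resolver idea in the comment above can be sketched as follows. This is a minimal illustration under stated assumptions, not anything defined by the Web Annotation specs: `resolve` is a hypothetical helper, and the `RESOLVERS` table and its example.org URLs are placeholders for whatever resolver services an implementation might run. HTTP(S) IRIs are returned unchanged; a URN is mapped to an HTTP URL via its namespace ID.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit

# Hypothetical registry mapping a URN namespace ID (NID) to the base URL of an
# HTTP resolver. The example.org URLs are placeholders, not real services.
RESOLVERS: Dict[str, str] = {
    "isbn": "https://resolver.example.org/isbn/",
    "uuid": "https://annotations.example.org/by-uuid/",
}


def resolve(iri: str, resolvers: Dict[str, str] = RESOLVERS) -> Optional[str]:
    """Return an HTTP(S) URL from which `iri` could be fetched, or None."""
    scheme = urlsplit(iri).scheme  # urlsplit lower-cases the scheme
    if scheme in ("http", "https"):
        # Already directly dereferencable over HTTP.
        return iri
    if scheme == "urn":
        # A URN has the form urn:<NID>:<NSS>; the NID is case-insensitive.
        parts = iri.split(":", 2)
        if len(parts) == 3:
            base = resolvers.get(parts[1].lower())
            if base is not None:
                # Indirect dereferencing through the (hypothetical) resolver.
                return base + parts[2]
    # No known way to dereference this IRI.
    return None
```

Whether such a resolver exists for a given namespace is exactly the point in dispute; the sketch only shows that "dereferencable" need not mean "fetchable at the literal IRI".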

iherman commented 7 years ago

Yes, an implementation MAY have a resolving service to dereference a URN, but why would we have to say it SHOULD (because that is what your proposal translates to)? I can imagine an implementation that stores annotations locally (not necessarily on a server, i.e., not necessarily accessible via the protocol) and uses the RDF graph model to store them, hence uses URN-s for unique identification, but all of it strictly internally. Such an implementation has no interest in using URL-s or resolvable URN-s, and it should still be valid. MAY allows for that. Strictly speaking, so does SHOULD, but the latter is stronger. I do not see why we would have to modify that.

(FWIW, what you describe as "First" in your comment coincides with my understanding.)

azaroth42 commented 7 years ago

Non information resources fit under "web resource" but not "external web resource" as they do not have representations.

We also have no way of enforcing SHOULDs or MUSTs for terminology. If anything, we should remove the RFC2119 styling from the MAYs in the terminology section.

gsergiu commented 7 years ago

@iherman well, what you make accessible only on an intranet I would not call a Web Resource. A database, or information that is accessible only on an intranet, is not a Web Resource; it is only an internal resource. Such resources might be similar to Web Resources and use the same technologies... that's why I don't recommend MUST, but I consider that mainstream implementations MUST use dereferencable resources. Some particular implementations MAY not do so, but these are likely to be the exceptions and not the mainstream. Therefore I find SHOULD to be the more accurate message to send to implementors.

akuckartz commented 7 years ago

An explanation can be added to a SHOULD to reduce the guilt !

gsergiu commented 7 years ago

@azaroth42 RFC 2119 makes clear that SHOULD represents a recommendation and MAY an optional requirement. This is exactly my point: do we want to say that it is optional for web resources to be dereferencable, or that it is recommended that they be dereferencable?

I don't understand what you mean by "Non information resources", can you provide a reference in the specifications where such resources are used?

azaroth42 commented 7 years ago

LMGTFY:

gsergiu commented 7 years ago

@azaroth42 Thanks for the resources. I include here the recommended good practices from the W3C webpage:

Good Practice

Authorities MAY create HTTP URIs for non-information resources in addition to those for information resources.

If a URI identifies an information resource, the URI owner SHOULD provide representations of that resource. This is based on the available representation practice 3.5 in [AWWW]

If a URI identifies a non-information resource, the URI owner SHOULD provide an associated information resource which, when dereferenced, provides additional information about the original resource. In addition, the URI owner SHOULD make the URI of an associated information resource available using the mechanism based on returning an HTTP response code of 303 to the original request.

  1. Reading this text, I understand that the recommendation is that "non-information resources" SHOULD also be dereferencable, even if they are not "directly" dereferencable using their URIs. I don't see an inconsistency with my change request.
  2. I think that in Web Annotation we mainly talk about information resources, which actually MUST be dereferencable. Do we use non-information resources in any of the provided examples? What is the current ratio of "non-information" to "information" resources in the WA examples?
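
The Good Practice quoted above turns on what a server answers when a URI is dereferenced. A minimal sketch, assuming the caller has already performed the GET with whatever HTTP client it uses and passes in the status code and response headers; `interpret_dereference` and its result strings are hypothetical and not part of any W3C document. The 2xx reading follows the TAG's httpRange-14 resolution (a 2xx response to GET means the URI identifies an information resource); a 303 only says the server is pointing to a description elsewhere.

```python
from typing import Mapping


def interpret_dereference(status: int, headers: Mapping[str, str]) -> str:
    """Classify an HTTP GET outcome along the lines of the quoted Good Practice."""
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    if 200 <= status < 300:
        # A representation came back: the URI names an information resource.
        return "information resource: representation returned"
    if status == 303:
        location = lowered.get("location")
        if location:
            # 303 See Other: the server points to an associated information
            # resource that describes the (possibly non-information) resource.
            return f"possibly a non-information resource: described at {location}"
        return "303 without Location: cannot follow"
    if status == 404:
        return "not dereferencable: 404 Not Found"
    return f"unclassified status {status}"
```

Keeping the network call outside the function makes the classification easy to test and independent of any particular HTTP library.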
iherman commented 7 years ago

By reading this text, I understand that the recommendation is that also "non-information resources" SHOULD be dereferencable, even if they are not "directly" dereferencable using their URIs. I don't find an inconsistency with my change request.

That is not the way I read it. Authorities MAY create an HTTP URI; if they do, then there should be an additional mechanism. But the operative term is 'MAY'.

The practice of all this may also be very complicated. Just as an example (it is only an analogy and does not apply directly to this case, but it shows the complexities involved), you may have fun reading this document:

https://www.w3.org/TR/swbp-vocab-pub/

(The httpRange-14 discussion that Rob referred to is also similar.)

This whole httpRange-14 issue has been on the Semantic Web agenda for a long time, and the fact of the matter is that application developers have often ignored all of it. Using a MAY in our text is a pragmatic acknowledgement of the difficulty of imposing an ideal practice.

My vote is to leave things as they are.

gsergiu commented 7 years ago

Hi @iherman,

I think the text we have at hand, being a W3C recommendation, is well suited to assessing this issue.

  1. I would like to point out that your interpretation of the text refers to the existence of HTTP URIs and not to their dereferencability.

That is not the way I read it. Authorities MAY create a HTTP URI; if they do, then there should be an additional mechanism. But the operative term is 'MAY'.

However, the definition of Web Resource in WA already requires that Resources MUST have IRIs, meaning that in WA all Resources must have IRIs (@id).

  2. The issue I raise is that when we have HTTP URIs for the resources (in WA we do have IRIs for all resources), then it SHOULD be possible to dereference the resources using these URIs (directly or indirectly).

And I think this is perfectly in line with the W3C recommendation:

In addition, the URI owner SHOULD make the URI of an associated information resource available using the mechanism based on returning an HTTP response code of 303 to the original request.

  3. From a logical point of view, I find it quite strange to claim that we are developing a standard for web annotations that use Web Resources, while being able to access these resources is optional. In other words, do we find it perfectly OK that a user who accesses the target URL gets a 404 response? (I don't find it OK.)
iherman commented 7 years ago

So... you seem to modify your original proposal. In that one you just said:

Web Resource A Resource that MUST be identified by an IRI, as described in the Web Architecture [webarch]. Web Resources MAY be dereferencable via their IRI.

In relationship to External Web Resources, the general Specification of Web Resources looks like being related to the Internal Web Resources (i.e. Annotation IRI, AnnotationPage IRI, etc. ) I suggest changing the MAY be dereferencable to SHOULD be deferencable

But you seem now to add an additional constraint, namely that if the IRI is an HTTP URI then it SHOULD be dereferencable. Which of course it is fine, because this is what webarch and the other documents say but, then, it seems to be superfluous to repeat in the WA Recommendation a statement that is already stated by other, authoritative groups. And, b.t.w., there is already a reference to webarch in the text (note, not a reference to the definition of an IRI, but to webarch, which addresses these types of issues!).

Bottom line: I still do not see why the current text should be modified. Also, as @azaroth42 said, adding a strong requirement that is currently not tested, and could be tested only by complex means, is not a good idea.

gsergiu commented 7 years ago

@iherman @azaroth42 Yes, my assumption was that all IRIs are HTTP URIs. Probably that is why we have a different understanding of the texts.

However, there is no instance of "id" in the 44 examples in the WA specifications in which the IRI is not an HTTP URI.

I asked the previous question to @azaroth42

I don't understand what you mean by "Non information resources", can you provide a reference in the specifications where such resources are used?

But it seems that my question was not interpreted in the way I expected, so ... I ask again in a more explicit manner:

  1. Are there any IRIs in the Web Annotation specifications that are not HTTP URIs? (I cannot find such cases in the 44 provided examples.)
  2. Are there any "non-information" resources used in the Web Annotation specifications? (I'm not able to find the keyword "informational" in any of the documents.)
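
The first question above can be checked mechanically. A minimal sketch, assuming an annotation already parsed into Python dicts and lists (for example with `json.loads`); `non_http_ids` is a hypothetical helper that walks the structure recursively and reports every `"id"` value whose scheme is not `http` or `https`. It only looks at `"id"` keys, so a target given as a bare IRI string is not examined, and the sample identifiers are invented for illustration.

```python
from typing import Any, List
from urllib.parse import urlsplit


def non_http_ids(node: Any) -> List[str]:
    """Collect every "id" value in a JSON-like structure that is not http(s)."""
    found: List[str] = []
    if isinstance(node, dict):
        value = node.get("id")
        if isinstance(value, str) and urlsplit(value).scheme not in ("http", "https"):
            found.append(value)
        # Recurse into nested objects (body, target, selectors, ...).
        for child in node.values():
            found.extend(non_http_ids(child))
    elif isinstance(node, list):
        for item in node:
            found.extend(non_http_ids(item))
    return found


# Hypothetical annotation mixing HTTP and non-HTTP identifiers.
doc = {
    "id": "http://example.org/anno1",
    "type": "Annotation",
    "body": {"id": "urn:x-local:body1", "type": "TextualBody", "value": "note"},
    "target": [{"id": "https://example.org/page1"}, {"id": "tag:example.org,2016:t1"}],
}
print(non_http_ids(doc))  # ['urn:x-local:body1', 'tag:example.org,2016:t1']
```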
gsergiu commented 7 years ago

PS: it seems to me that you are keen on using MAY instead of SHOULD because of some hypothetical situations that you discussed previously, but there is no clue in the specifications about what was discussed offline.

I think the main purpose of the specifications is to be understood by those who read and implement them. Ideally, those who read the specifications and those who wrote them should have the same understanding of the text. If this is not the case, I consider that we have a valid "editorial issue".

In conclusion, I would explicitly state my claim that the definition of Web Resource in the WA standard ("Web Resource MUST have IRI, IRIs MAY be dereferencable") is different from the W3C recommendation on dereferencing:

Authorities MAY create HTTP URIs for non-information resources in addition to those for information resources. If a URI identifies an information resource, the URI owner SHOULD provide representations of that resource. If a URI identifies a non-information resource, the URI owner SHOULD provide an associated information resource which, when dereferenced, provides additional information about the original resource.

With the caveat that it is not clear from the text and examples of Web Annotation whether all of the IRIs are HTTP URIs. Still, this looks like the mainstream, and according to my understanding the mainstream should be what decides whether dereferencability is RECOMMENDED (=SHOULD) or OPTIONAL (=MAY).

gsergiu commented 7 years ago

@iherman

Which of course it is fine, because this is what webarch and the other documents say but, then, it seems to be superfluous to repeat in the WA Recommendation a statement that is already stated by other, authoritative groups.

I do agree that it would be redundant if the WA Recommendation and the dereferencability recommendations were used consistently. However, as far as I understand, that is not the case in the current situation.

iherman commented 7 years ago

Are there any IRIs in the Web Annotation specifications that are not HTTP URIs? (I cannot find such cases in the 44 provided examples.)

None of the IRI-s defined in the WA spec are necessarily HTTP URIs. The fact that there are no such cases in the examples is not really relevant.

Are there any "non-information" resources used in the Web Annotation specifications? (I'm not able to find the keyword "informational" in any of the documents.)

And I do not see any reason why we should define that in the spec. It is actually a nicety that I would not want to see in the Model document; we made a lot of effort to make the document palatable to average Web developers, and that would work against it.

I already gave this example: it is perfectly conceivable that an implementation of the model (akin to Hypothes.is, for example) uses the RDF model internally, hence would use IRI-s in the relevant triples, but would not expose the annotation structures directly and hence would use URN-s, as a matter of convenience, as IRI-s for the annotation.

iherman commented 7 years ago

On 3 Nov 2016, at 09:57, gsergiu wrote:

@iherman Which of course it is fine, because this is what webarch and the other documents say but, then, it seems to be superfluous to repeat in the WA Recommendation a statement that is already stated by other, authoritative groups.

I do agree that it would be redundant, in the case that the WA Recommendation and the other Recommendations were consistent. However, according to my level of understanding, in the current situation that is not the case.

Why? There is a reference to webarch and we would just repeat what is there.

gsergiu commented 7 years ago

@iherman

but would not expose the annotation structures directly and hence would use URN-s, as a matter of convenience, as IRI-s for the annotation.

According to the dereferencability recommendations, URNs should be dereferencable as well, even if they are not "directly" dereferencable. There should be some indication of how to dereference URNs. Or at least this is how I understand the dereferencability recommendations.

I'm not familiar with the particular implementation from Hypothesis, but I think their URNs are dereferenced by the client in order to present a complete representation to users, and consequently I believe that their URNs are dereferencable, even if not directly, through a "resolve" service.

gsergiu commented 7 years ago

PS: I am disappointed to hear that the examples are not relevant... I think the examples are extremely useful for implementation.

None of the IRI-s defined in the WA spec are necessarily HTTP URIs. The fact that there are no such cases in the examples is not really relevant.

From my point of view this is an inconsistency in the specification that deserves to be addressed in one of the possible ways:

Any of these solutions is OK for me, as long as the issue is addressed.

iherman commented 7 years ago

On 3 Nov 2016, at 10:29, gsergiu wrote:

@iherman but would not expose the annotation structures directly and hence would use URN-s, as a matter of convenience, as IRI-s for the annotation.

According to the dereferencability recommendations, URNs should be dereferencable as well, even if they are not "directly" dereferencable. There should be some indication of how to dereference URNs. Or at least this is how I understand the dereferencability recommendations.

That is not my reading; see my earlier comment. And regardless, nobody would use it for the example I gave. It is used to handle, say, ISBN URN-s, DOI-s, and the like, which are non-HTTP URI-s used for the public identification of things. The example I gave is very different. I see absolutely no reason for us to use a SHOULD when we cannot enforce it and, actually, I would not do it myself either if I implemented following the model I gave.

We get to the point when, I believe, we have to agree that we disagree...

gsergiu commented 7 years ago

Technically there is not much difference between SHOULD and MAY, but SHOULD is understood to mean that it is RECOMMENDED to do so, while MAY means it is OPTIONAL.

I think there is enough information in this thread to take an informed decision on this matter. I would appreciate it if the decision were taken by majority vote in a WA meeting and not by one of the editors vetoing it.

BigBlueHat commented 7 years ago

@gsergiu you mention that "adding an example with non dereferencable IRIs (i.e. URNs)" is sufficient to close this issue. See: https://www.w3.org/TR/annotation-model/#example-15

The section that contains that example goes into more detail on the specific use case, and explains one of the many scenarios in which non-dereferencable IRIs are valuable: https://www.w3.org/TR/annotation-model/#h-other-identities

The group MUST not change the text to a SHOULD because it MAY NOT be appropriate to the wider, off-Web world of annotation. 📚

gsergiu commented 7 years ago

@BigBlueHat ok ... I see it now. Thank you for the reference. I assume you mean:

"canonical": "urn:uuid:dbfb1861-0ecf-41ad-be94-a584e5c4f1df"

I would just want to emphasize that I was talking about the @id of the Web Resources, which is a MUST for web resources.

According to my understanding, Web Resources must have an @id, which I would like to be dereferencable (in all cases), and might have an alternative canonical URN, which is optional.

If the definition of Web Resource said that resources must have an IRI (@id) or a canonical URN, I would agree that your argument is enough to close the ticket. But according to my understanding this is not the case. My judgement might be right or wrong; I'll let you decide on which side of the truth my claim lies.

BigBlueHat commented 7 years ago

@gsergiu how would you create an annotation for a physical book?

gsergiu commented 7 years ago

@BigBlueHat depends on how you would represent the physical book as a Web Resource

BigBlueHat commented 7 years ago

@gsergiu except that's not a requirement that should be added to annotation as a human experience.

Web Annotation is simply a spec for creating annotations on the Web. It does not (nor should it) dictate what you can talk about, nor what prerequisite work you might be required to do before you can say something.

Consequently, this is something we want to allow for:

```json
{
  "target": "urn:isbn:1234...",
  "body": {
    "value": "Best book ever!"
  }
}
```

That is possible now--by design. It's not something the group would ever consider changing (afaik).
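
The snippet above is deliberately abbreviated. A fuller sketch of the same annotation, assuming the properties the Web Annotation Data Model requires of every annotation (the `http://www.w3.org/ns/anno.jsonld` context, an `id`, and the `Annotation` type) plus a `TextualBody` body; `make_annotation` is a hypothetical helper. The annotation's own `id` is minted as a `urn:uuid:` IRI to mirror the locally stored, never-exposed implementation described earlier in the thread, and the target keeps the thread's elided ISBN URN unchanged.

```python
import json
import uuid


def make_annotation(target: str, text: str) -> dict:
    """Build a minimal Web Annotation whose own id is a locally minted URN."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        # A urn:uuid IRI satisfies "MUST be identified by an IRI" without being
        # dereferencable over HTTP (the local-store case discussed above).
        "id": uuid.uuid4().urn,
        "type": "Annotation",
        "body": {"type": "TextualBody", "value": text},
        "target": target,
    }


anno = make_annotation("urn:isbn:1234...", "Best book ever!")
print(json.dumps(anno, indent=2))
```

Nothing in this object is fetchable over HTTP, yet every identifier is an IRI, which is the situation the MAY wording is meant to keep valid.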

gsergiu commented 7 years ago

@BigBlueHat this ticket relates to the definition of Web Resources. Is the target "urn:isbn:1234..." in the example above a Web Resource or not? For me it is not, but if you consider it a Web Resource, that means you have a way to access that Web Resource using the URN, which would mean (according to my understanding) that the given URN is dereferencable.

gsergiu commented 7 years ago

If you consider that I'm the only person with this understanding, it is fine for me to close the ticket.

azaroth42 commented 7 years ago

Thanks! Closing.

gsergiu commented 7 years ago

Just for reference, there are definitions for both Resources and Web Resources. The (non-Web) Resources are not dereferencable, while Web Resources typically are, or might be, dereferencable:

Resource: An item of interest that MAY be identified by an IRI.

Web Resource: A Resource that MUST be identified by an IRI, as described in the Web Architecture [webarch]. Web Resources MAY be dereferencable via their IRI.