ivoa-std / DataLink

DataLink standard (DAL)

Clarify semantics aspects #71

Closed msdemlei closed 2 years ago

msdemlei commented 2 years ago

Clarification of the meaning and use of semantics and content_qualifier.

This introduces a potentially document-breaking change, namely the requirement that datalink/core concept URIs must be relative (i.e., not include the vocabulary URI). I think everyone has always done it like this, and guaranteeing it makes it a bit simpler to deal correctly with the semantics column (actually, both implementations I know of that use values from the semantics column already make that assumption).
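A minimal sketch of what that assumption means for a client reading the semantics column. The function name `classify_semantics` is hypothetical, and the datalink/core vocabulary URI used here is an assumption, not something stated in this PR.

```python
# Assumed vocabulary URI for datalink/core (illustrative only).
DATALINK_CORE = "http://www.ivoa.net/rdf/datalink/core"


def classify_semantics(value):
    """Return (vocabulary_uri, term) for a semantics value.

    Relative values such as "#preview" are taken to be datalink/core
    terms. Anything else is treated as an opaque custom concept URI,
    for which no vocabulary is known (None).
    """
    value = value.strip()
    if value.startswith("#"):
        return DATALINK_CORE, value[1:]
    return None, value


print(classify_semantics("#preview"))
print(classify_semantics("http://www.opencadc.org/rdf/foo#bag"))
```

With relative terms guaranteed, the first branch is a plain prefix check and no URI parsing is needed for the standard terms.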

This depends on the ivoatex update that comes with PR #70 for citations to resolve.

This, I claim, would solve Issue #67.

Bonnarel commented 2 years ago

On 25/10/2021 at 11:28, msdemlei wrote:

Clarification of the meaning and use of semantics and content_qualifier.

This introduces a potentially document-breaking change, namely the requirement that datalink/core concept URIs must be relative (i.e., not include the vocabulary URI). I think everyone has always done it like this, and guaranteeing it makes it a bit simpler to deal correctly with the semantics column (actually, both implementations I know of that use values from the semantics column already make that assumption).

Well. During the last DAL running meeting we apparently reached a consensus that content_qualifier would be required to contain full URIs. But you were not attending, Markus.

That's why the initial text of the first PR with content_qualifier, #51, was rewritten into what is now in the recently merged master.

Your new text goes in the other direction.

See: https://wiki.ivoa.net/internal/IVOA/IvoaDAL_RunningMeetings/IVOA_DAL_RM12.txt

This depends on the ivoatex update that comes with PR #70 https://github.com/ivoa-std/DataLink/pull/70 for citations to resolve.

This, I claim, would solve Issue #67 https://github.com/ivoa-std/DataLink/issues/67.



msdemlei commented 2 years ago

On Mon, Oct 25, 2021 at 02:59:00AM -0700, Bonnarel wrote:

On 25/10/2021 at 11:28, msdemlei wrote:

it a bit simpler to correctly deal with the semantics column (actually, both implementations I know that use values from the semantics columns already make that assumption).

Well. During the last DAL running meeting we apparently reached a consensus that content_qualifier would be required to contain full URIs. But you were not attending, Markus.

That's why the initial text of the first PR with content_qualifier, #51, was rewritten into what is now in the recently merged master.

Your new text goes in the other direction.

See: https://wiki.ivoa.net/internal/IVOA/IvoaDAL_RunningMeetings/IVOA_DAL_RM12.txt

Hm. I wonder what Pat's concerns about "undesirable usage" were.

I have no strong opinion either way, but if I had been at that telecon, I'd have said:

(a) Well, it would be nicer if content_qualifier worked the same way as semantics; it's certainly a bit odd that vocabularies are used in two different ways in the same standard.

(b) Having a standard vocabulary increases the chances that people will actually do the right thing and take terms from it rather than just dump in random URIs that no client at all will understand (in which case it's not machine readable, which kind of defeats the purpose).

(c) Nobody wants to have long URIs when short words would do most of the time, which, I think, for content_qualifier is a reasonable expectation (though I admit I'm not sure what use cases for the long URIs are there).

(d) Comparing URIs (whose schemes and perhaps authority parts are supposed to be case-insensitive, with path parts and fragment identifiers quite certainly case-sensitive) is a huge pain. Let's spare normal clients that pain.
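To make point (d) concrete, here is a rough sketch of what comparing full concept URIs involves, using only the Python standard library. The helper names are hypothetical, and the normalization is deliberately simplified: it lowercases the scheme and the whole authority part (which would also lowercase any userinfo) and leaves path, query, and fragment untouched.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_concept_uri(uri):
    """Lowercase scheme and authority; keep path, query, fragment as-is."""
    parts = urlsplit(uri)
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path,
        parts.query,
        parts.fragment,
    ))


def same_concept(a, b):
    """Compare two full concept URIs after the simplified normalization."""
    return normalize_concept_uri(a) == normalize_concept_uri(b)


# Host case differs: still the same concept.
print(same_concept("http://WWW.IVOA.NET/rdf/product-type#spectrum",
                   "http://www.ivoa.net/rdf/product-type#spectrum"))
# Fragment case differs: not the same concept.
print(same_concept("http://www.ivoa.net/rdf/product-type#Spectrum",
                   "http://www.ivoa.net/rdf/product-type#spectrum"))
```

With relative terms such as #spectrum, a client gets by with a plain string comparison instead.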

If the other authors say they've weighed those points and found them outweighed by whatever concern brought up the full URL thing, I'd still like the changes to semantics in; and the content_qualifier text could then probably be something like

Where applicable, concepts from the vocabulary http://www.ivoa.net/rdf/product-type should be chosen. In contrast to the semantics column, content_qualifier must always contain full concept URIs, regardless of whether URIs point into product-type or somewhere else.

As in the semantics case, non-IVOA concept URIs may be used. Again, they should resolve to human-readable definitions of the meaning and intended usage of the concept.

As an example, a light curve service might link to a spectrum of the object by using #counterpart in the semantics column and http://www.ivoa.net/rdf/product-type#spectrum in content_qualifier.
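As a sketch only, the example row and the "always a full URI" rule from the proposed wording could be checked like this. The dictionary layout, `is_full_uri`, and `check_row` are hypothetical helpers for illustration, not part of any DataLink library, and an empty content_qualifier is assumed to be acceptable.

```python
from urllib.parse import urlsplit


def is_full_uri(value):
    """True if the value has both a scheme and an authority part."""
    parts = urlsplit(value)
    return bool(parts.scheme) and bool(parts.netloc)


def check_row(row):
    """Return a list of problems with one datalink row (a plain dict)."""
    problems = []
    qualifier = row.get("content_qualifier")
    # Under the proposed wording, relative terms are not allowed here.
    if qualifier and not is_full_uri(qualifier):
        problems.append("content_qualifier must be a full concept URI")
    return problems


# The light curve service example from the proposed text.
link = {
    "semantics": "#counterpart",
    "content_qualifier": "http://www.ivoa.net/rdf/product-type#spectrum",
}
print(check_row(link))
print(check_row({"semantics": "#counterpart", "content_qualifier": "#spectrum"}))
```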

Is that preferable to the proponents of full URIs here? Given it's a bit odd to have two different recipes, I think it would be great if someone could donate a rationale for the difference (I can't write that because I don't see a good reason).

pdowler commented 2 years ago

IIRC, the "undesirable usage" was that if you can use bare product-type terms like "spectrum" and we allow terms from other vocabs, people might use bare terms from them as well, in which case it's just a column where you can put any one word.

I think content_qualifier is a little different than semantics: my understanding of using fully qualified URIs in semantics was that it was for a custom term (extension) but still in the same vocabulary (best example is our #thumbnail child of #preview -- extension of datalink/core rather than different vocab entirely). I don't recall off-hand how an rdf doc says it is an extension of another, but if that's possible I would expect any custom FQ term in semantics to be in a vocab that extends datalink/core. That's not true of content_qualifier.

Anyway, looking at the current text now, it isn't as clear/explicit as I thought and the above from Markus looks fine to me, but I wonder if I am still reading something different into it. I think that content_qualifier could contain URIs from UAT or SimDM or whatever, not just standard product-type and custom extensions, because not all links are "to data products".

msdemlei commented 2 years ago

On Mon, Oct 25, 2021 at 09:08:04AM -0700, Patrick Dowler wrote:

IIRC, the "undesirable usage" was that if you can use bare product-type terms like "spectrum" and we allow terms from other vocabs, people might use bare terms from them as well, in which case it's just a column where you can put any one word.

While I'm sure people will put all kinds of junk into the field as long as clients don't do anything sensible with it, I think the hash "marker" has worked quite well as an indicator that you're not supposed to put any old junk into semantics.

I think content_qualifier is a little different than semantics: my understanding of using fully qualified URIs in semantics was that it was for a custom term (extension) but still in the same vocabulary (best example is our #thumbnail child of #preview -- extension of datalink/core rather than different vocab entirely). I

No. RDF as such doesn't have much of a notion of a "vocabulary"; it just gives rules for interpreting triples of URIs, and is rather relaxed about how to group these URIs.

By giving rules for what RDF resource URIs in the VO should look like, we in the VO have our specific idea of what "a" vocabulary is; it's basically all the concepts in one of our RDF/desise files. If you write some URI not starting with "the" vocabulary URI, the corresponding concept is not "in" that vocabulary.

But really, that distinction has practical relevance only insofar as clients can be expected to do smart things with terms in the vocabulary (because they can easily retrieve label, description, and relationships for them), while for now they can't do that for anything else, whether or not these concepts are supposed to be related to concepts in the "core" vocabulary.

We could in Vocabularies 2.1 give rules for how people could host their own IVOA-compliant vocabularies and how clients should deal with them. But I didn't do that in 2.0 on purpose: It'll be hard enough to make clients pick up our Semantics tech without the vagaries of having to pull stuff from all over the web and having to deal with... loosely... curated semantic resources.

don't recall off-hand how an rdf doc says it is an extension of another, but if that's possible I would expect any custom FQ term in semantics to be in a vocab that extends datalink/core. That's not true of content_qualifier.

Again, no, there is no formal or informal requirement that some custom concept you put into semantics has any relationship to something in datalink/core, and indeed there is no defined way to declare such relationships.

Anyway, looking at the current text now, it isn't as clear/explicit as I thought and the above from Markus looks fine to me, but I wonder if I am still reading something different into it. I think that content_qualifier could contain URIs from UAT or SimDM or whatever, not just standard product-type and custom extensions, because not all links are "to data products".

It certainly would help if we had a clear scenario for that, ideally of the form: "A datalink service operator wants to declare X on Y so that a client does Z. They therefore put the URI of concept X' from Vocabulary V into content_qualifier." Does such a thing exist somewhere? Does anyone perhaps even do that already?

I'd gladly amend the PR with text in that direction (also for my own sake, because so far I find all of that so cloudy that I wonder if I can consider the implementation requirement as satisfied for content_qualifier in its current shape) -- and it might even provide enough of a rationale for handling content_qualifier differently from semantics in case we really want to go back to the no-default-vocabulary text.

Bonnarel commented 2 years ago

If the other authors say they've weighed those points and found them outweighed by whatever concern brought up the full URL thing, I'd still like the changes to semantics in; and the content_qualifier text could then probably be something like

Where applicable, concepts from the vocabulary http://www.ivoa.net/rdf/product-type should be chosen. In contrast to the semantics column, content_qualifier must always contain full concept URIs, regardless of whether URIs point into product-type or somewhere else.

As in the semantics case, non-IVOA concept URIs may be used. Again, they should resolve to human-readable definitions of the meaning and intended usage of the concept.

As an example, a light curve service might link to a spectrum of the object by using #counterpart in the semantics column and http://www.ivoa.net/rdf/product-type#spectrum in content_qualifier.

+1. I definitely prefer this version to the one in PR #71 and to the initial one I wrote.

Is that preferable to the proponents of full URIs here? Given it's a bit odd to have two different recipes, I think it would be great if someone could donate a rationale for the difference (I can't write that because I don't see a good reason).

We don't want to "close the future" by giving a special rule in favor of the data-product vocabulary. Imagine that in the case of "semantics=documentation" we want to specify whether it's a simple free description, a refereed paper, or a conference proceedings paper. I think content_qualifier would be the right place to specify that. We may imagine having a standard vocabulary for "documents and papers" in the future.

msdemlei commented 2 years ago

On Tue, Oct 26, 2021 at 05:29:48AM -0700, Bonnarel wrote:

Again, they should resolve to human-readable definitions of the meaning and intended usage of the concept. As an example, a light curve service might link to a spectrum of the object by using #counterpart in the semantics column and http://www.ivoa.net/rdf/product-type#spectrum in content_qualifier.

+1. I definitely prefer this version to the one in PR #71 and to the initial one I wrote.

Is that preferable to the proponents of full URIs here? Given it's a bit odd to have two different recipes, I think it would be great if someone could donate a rationale for the difference (I can't write that because I don't see a good reason).

We don't want to "close the future" by giving a special rule in favor of data-product vocabulary.

Well -- we don't in either case, so that doesn't help the decision. In both cases, people can use arbitrary concept URIs.

The question at hand is: "Do we want to have two different ways of dealing with vocabularies in one standard because there is an overriding reason?" And my request was to try and figure out what the overriding reason back in the DAL running meeting was, because I'd prefer to explain these reasons if we do have them.

Imagine that in the case of "semantics=documentation" we want to specify whether it's a simple free description, a refereed paper, or a conference proceedings paper. I think content_qualifier would be the right place to specify that. We may imagine having a standard vocabulary for "documents and papers" in the future.

Sure. But whether or not we define a standard vocabulary for the one clear use case now, people doing this later would be writing http://www.ivoa.net/rdf/documentation-type#refereed-paper (say). There's simply no difference to them.

The difference is for people who have "data products" -- for them, it's writing #spectrum vs. http://www.ivoa.net/rdf/product-type#spectrum. And perhaps for implementors who try to do something with content_qualifier and who have a slightly simpler time with just #spectrum (e.g., no headache as to whether or not a part of the string needs to be compared case-insensitively).

Which doesn't make a big difference, but I wouldn't want to make people write the noticeably more unwieldy full URIs and deal with the difference from semantics just because of some misunderstanding.

pdowler commented 2 years ago

hmmm. Since RDF has no notion of a vocabulary and therefore an extension, if I use http://www.opencadc.org/rdf/foo#bag in content_qualifier there is no implied sense that this is a custom product-type or a custom astronomical object type or anything else. It's just a word with a definition... by putting it into content_qualifier I'm saying "the thing at the end of this link is a bag".

Substitute http://ivoa.net/rdf/vospace#container for bag and it would be a real use case; also content_type text/xml would not convey enough information. Also, we could drop the RFE for VOTable to allow content param in the mimetype and just put #datalink into content_qualifier for recursive datalink.

The other aspect where short #term and full http://ivoa.net/rdf/{vocab}#term comes into play for me is the VEP process. I had been (in semantics) using FQ uris for prototype terms, but VEP requires that the term be demonstrated in use. That's manageable for me because the terms are in s/w, not (eg) in the database directly. But I wonder: if using a new term is as simple as create VEP && start using term (and be prepared to change use, of course) then that removes one use of FQ uris. How bad would it be if we said that any term in any ivoa vocab could be used in short form? That seems like it would cover > 98% of use cases. And I could see making a service to resolve #term to http://ivoa.net/rdf/{vocab}#term (which in principle would have to allow for multiple returns in some cases).

If this doesn't sound crazy, why not allow it? s/w will still only do things automatically if it recognises the #term.

msdemlei commented 2 years ago

On Wed, Oct 27, 2021 at 11:17:05AM -0700, Patrick Dowler wrote:

hmmm. Since RDF has no notion of a vocabulary and therefore an extension, if I use http://www.opencadc.org/rdf/foo#bag in content_qualifier there is no implied sense that this is a custom product-type or a custom astronomical object type or anything else.

Not by RDF itself, and not by current VocInVO. But that is, really, the reason why I suspect we're doing our client writers a favour if we say "get vocabulary X and try to interpret the terms that way, while being graceful when there's a full URI and hence the thing is not in X".

Only with that vocabulary can clients do all the magic of inserting labels and exploiting hierarchy at least for the well-known terms.
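A rough sketch of that "magic", assuming the client already holds the vocabulary as a dictionary of terms with "label" and "wider" entries. That layout and the sample content are assumptions made for illustration (the #thumbnail/#preview relation is taken from the example earlier in this thread, not from the actual vocabulary file), and `describe` is a hypothetical helper.

```python
# Made-up excerpt; not the real datalink/core content.
SAMPLE_TERMS = {
    "preview": {"label": "Preview", "wider": []},
    "thumbnail": {"label": "Thumbnail", "wider": ["preview"]},
}


def describe(value, terms):
    """Return label and ancestor terms for a relative term like "#thumbnail".

    Full URIs and unknown terms are handled gracefully: they are passed
    through as their own label, with no hierarchy information.
    """
    unknown = {"label": value, "known": False, "ancestors": []}
    if not value.startswith("#"):
        return unknown
    term = value[1:]
    entry = terms.get(term)
    if entry is None:
        return unknown
    # Walk the "wider" links breadth-first to collect all ancestors.
    ancestors, seen, queue = [], {term}, list(entry.get("wider", []))
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue
        seen.add(current)
        ancestors.append(current)
        queue.extend(terms.get(current, {}).get("wider", []))
    return {"label": entry["label"], "known": True, "ancestors": ancestors}


print(describe("#thumbnail", SAMPLE_TERMS))
print(describe("http://www.opencadc.org/rdf/foo#bag", SAMPLE_TERMS))
```

For the full URI, the client can still show the string, but it has neither a label nor a hierarchy to work with.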

We can, if we really need it, expand this to "vocabulary X and Y" (for very few vocabularies, because in consequence these must be checked for identifier clashes). Or we can say "also get vocabulary Y, but be aware that concepts from that will always come as full URIs" (which I'd recommend).

And of course there's some value in doing "custom contracts" between services and specialised clients using "singleton" concept URIs as in your vospace#container example -- but as long as we don't require clients to pull semantic resources from all over the net (and I'm sure we don't want that), once you put in arbitrary URIs, 90% of the magic is gone.

Substitute http://ivoa.net/rdf/vospace#container for bag and it would be a real use case; also content_type text/xml would not convey enough information. Also, we could drop the RFE for VOTable to allow content param in the mimetype and just put #datalink into content_qualifier for recursive datalink.

Hm... Do we do clients a favour if we do that? Suppose I have an object, and there's a spectrum and a time series attached to it, both of which are described through datalink documents. Wouldn't a client still want to know whether to send the link to a spectral or a time series client?

This would be different if we expected generic "datalink clients". But this is becoming so speculative that I'd suggest we ought to wait until someone actually wants to do anything like that. And why they want that.

The other aspect where short #term and full http://ivoa.net/rdf/{vocab}#term comes into play for me is the VEP process. I had been (in semantics) using FQ uris for prototype terms, but VEP requires that the term be demonstrated in use. That's manageable for me because the terms are in s/w, not (eg) in the database directly. But I wonder: if using a new term is as simple as create VEP && start using term (and be prepared to change use, of course) then that removes one use of FQ uris. How

Right. That was the intent.

bad would it be if we said that any term in any ivoa vocab could be used in short form? That seems like it would cover > 98% of use

No, that won't work. A client cannot be expected to pull all the vocabularies to figure out a term's label, description, and relationships, and I certainly don't want to require that different vocabularies cannot use the same identifier.

cases. And I could see making a service to resolve #term to http://ivoa.net/rdf/{vocab}#term (which in principle would have to allow for multiple returns in some cases).

...in which case a client is totally in the rain. Would it show all the labels? Guess which relationships to use? Also, that service would again require clients to access network resources while doing semantics, which I'm sure we want to avoid if at all possible.
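The ambiguity can be sketched with two made-up vocabularies that happen to share an identifier. The vocabulary URIs, the term sets, and `resolve_bare_term` are all hypothetical; no such resolver service exists as far as this thread is concerned.

```python
# Two invented vocabularies that both define "spectrum".
VOCABS = {
    "http://example.org/rdf/vocab-a": {"spectrum", "image"},
    "http://example.org/rdf/vocab-b": {"spectrum", "container"},
}


def resolve_bare_term(value, vocabs):
    """Return every full concept URI a bare "#term" could stand for."""
    term = value.lstrip("#")
    return sorted(
        f"{uri}#{term}" for uri, terms in vocabs.items() if term in terms
    )


print(resolve_bare_term("#container", VOCABS))  # exactly one candidate
print(resolve_bare_term("#spectrum", VOCABS))   # two candidates
```

With two candidates for #spectrum, the client has no basis for picking a label or a set of relationships.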

Frankly: My impression is that this discussion is another instance of where we introduce a feature with the server side in mind, and as long as no client actually consumes the stuff, and there are hazy additional use cases in the air, it's really hard to pin down requirements and limitations. Which makes it really hard to know what will make the lives of future clients hard and what wouldn't.

Given that situation, I'd again say "let's concentrate on the use case we understand to a certain degree and make that work well".

That's the "find an appropriate SAMP client" use case, and for that, it's reasonable to recommend to clients "Get product-type and work with it; but be aware that there can be other stuff in that field". It's kind of working for semantics, and I've not yet seen a reason in this discussion why it shouldn't work for content_qualifier.
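For illustration, a client following that recommendation might do something like the sketch below. The mapping from product-type terms to SAMP MTypes is an assumption made up for this example, as is the function name `mtype_for`; since the thread has not settled whether content_qualifier carries #term or full URIs, the sketch accepts both.

```python
PRODUCT_TYPE = "http://www.ivoa.net/rdf/product-type"

# Assumed mapping from product-type terms to SAMP MTypes (illustrative).
MTYPE_FOR_TERM = {
    "spectrum": "spectrum.load.ssa-generic",
    "image": "image.load.fits",
}


def mtype_for(content_qualifier):
    """Return a SAMP MType for a content_qualifier value, or None.

    Both "#spectrum" and the full product-type URI are understood;
    anything else is "other stuff" and is left alone.
    """
    if content_qualifier.startswith("#"):
        term = content_qualifier[1:]
    elif content_qualifier.startswith(PRODUCT_TYPE + "#"):
        term = content_qualifier[len(PRODUCT_TYPE) + 1:]
    else:
        return None
    return MTYPE_FOR_TERM.get(term)


print(mtype_for("#spectrum"))
print(mtype_for("http://www.ivoa.net/rdf/product-type#spectrum"))
print(mtype_for("http://ivoa.net/rdf/vospace#container"))
```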