ivoa-std / DataLink

DataLink standard (DAL)

change the examples according to changes in service descriptors #34

Closed Bonnarel closed 4 years ago

Bonnarel commented 4 years ago

The examples now show the new features described in the pull request

msdemlei commented 4 years ago

Adding a name to datalink resources is perhaps a good idea, but "to help identifying the meaning" isn't telling implementors how they should pick names.

I guess the intent is that clients can show these names in UIs to let users choose between different datalink services on a single resource; in that case, advice on the typical length of such a name ("avoid names longer than 40 characters") will help sensible use. Now... to see where this is going, let's assume there's a datalink service ("{links}") and one of those unfortunate direct SODA blocks on a table.

The client knows which one the datalink service is because of the standard id, and it can tell the SODA block by the standard id as well. Hmmm... I am becoming uncertain whether there should be a dialog that gives a choice between the two in the first place... Ok -- can we be a bit more concrete here about what the names will be used for? If we can't, my suggestion would be to drop the name thing.

Dropping the name recommendation is also supported by the examples provided. Take "{links} for Obscore" as an example. First, it's a description, not a name, and second, it doesn't give any information that's not already there in other form (the links part is the standardId, the obscore part is in the metadata of the table this references). I can't really see how this would help anyone. Similarly, "My SIA service" sets an equally unfortunate precedent of mixing the obvious with the useless.

For recommending descriptions on service descriptors, I see a lot more potential, but again, it would be good if the examples shown set a good precedent. "Links resources to datasets" carries zero information and just further confuses the already muddled terms "resource" and "dataset". Can't we rather say "This datalink service gives access to the raw data for the discovered datasets as well as to catalogues of extracted sources" or something of the kind? Operators stating why they're even offering a datalink service sounds like something that might help users.

"This is service SIA for so and so", on the other hand, isn't helpful, as it doesn't tell users anything they couldn't see from other info. Unfortunately, I can't see the purpose of the example as such, and so I can't propose a better explanation (and would rather suggest to drop the whole example).

"A nice container to share space" at least is somewhat witty, but in order to set a good precedent I'd rather say something like "Datasets discovered here are automatically available in Example Institution's VOSpace under the URI produced here" (though I again have to admit that I don't really understand why one would do this).

Instead of "Custom APOADIMO spectrum recalibration service" I'd say "This service lets you retrieve the spectra discovered uncalibrated, with flux calibration, and continuum normalised (where some spectra are not available in continuum normalisation because the pipeline failed to identify a continuum)" or something like that (where the service this was taken from can't actually do that, but that's beside the point).

Bottom line here: let's nudge implementors away from useless and non-informative descriptions and towards descriptions that actually tell users things they didn't already know.

As to "RESOURCES these RESOURCES should be nested": I think this tries to solve a problem that doesn't exist -- clients need to pair up tables and service descriptors by following the reference(s) from the PARAM(s) in the service descriptor to the FIELDs and PARAMs in the table, and no nesting of RESOURCEs can help there. Also, if we are only talking should here, clients can't really optimise any parsing strategy based on any such rule. So: I claim this doesn't help anyone and is too vague anyway. Let's drop it.
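The ref/ID pairing described above can be sketched in a few lines. This is only an illustration, not text from the PR: the function name `resolve_input_params`, the example.org URL and the sample document are made up, and the sample omits the XML namespace that a real VOTable carries. The `adhoc:service` utype and the `inputParams` group follow the DataLink 1.0 service descriptor layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free VOTable with one results table and one
# service descriptor whose input PARAM points at a FIELD via ref/ID.
VOTABLE = """<VOTABLE>
  <RESOURCE type="results">
    <TABLE>
      <FIELD ID="primaryID" name="obs_publisher_did" datatype="char" arraysize="*"/>
      <FIELD ID="ra" name="s_ra" datatype="double"/>
    </TABLE>
  </RESOURCE>
  <RESOURCE type="meta" utype="adhoc:service">
    <PARAM name="standardID" datatype="char" arraysize="*"
           value="ivo://ivoa.net/std/DataLink#links-1.0"/>
    <PARAM name="accessURL" datatype="char" arraysize="*"
           value="http://example.org/links"/>
    <GROUP name="inputParams">
      <PARAM name="ID" datatype="char" arraysize="*" value="" ref="primaryID"/>
    </GROUP>
  </RESOURCE>
</VOTABLE>"""


def resolve_input_params(votable_xml):
    """Return [(accessURL, {input param name: referenced column name})]."""
    root = ET.fromstring(votable_xml)
    # Index every FIELD or PARAM that declares an ID, wherever it sits.
    by_id = {
        el.get("ID"): el
        for el in root.iter()
        if el.tag in ("FIELD", "PARAM") and el.get("ID")
    }
    result = []
    for res in root.iter("RESOURCE"):
        if res.get("utype") != "adhoc:service":
            continue
        access_url = None
        for param in res.findall("PARAM"):
            if param.get("name") == "accessURL":
                access_url = param.get("value")
        inputs = {}
        for group in res.findall("GROUP"):
            if group.get("name") != "inputParams":
                continue
            for param in group.findall("PARAM"):
                # Follow the ref to whatever element carries that ID.
                target = by_id.get(param.get("ref"))
                if target is not None:
                    inputs[param.get("name")] = target.get("name")
        result.append((access_url, inputs))
    return result
```

The lookup works the same whether or not the two RESOURCEs are nested, which is the point made above: the ref/ID link alone is enough to pair them.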

Similarly, I can't see what the "self-describing service descriptor" example is supposed to accomplish and would suggest to drop it again. Why would one even want to find that "self description", let alone "unambiguous"ly? And once one has found it, what's the intended use of the information obtained in this way?

If we do something like this "self description", it needs to be made clear what the standards status of this is: is it a MUST? Is having @name="this" a MUST in this kind of thing?

Summing up: The one thing I'd take over from the PR is the recommendation to have descriptions in service descriptors; but the examples for these descriptors ought to be changed so they nudge implementors in the right direction.

To the PR as it is: No, let's not merge it.

mbtaylor commented 4 years ago

@msdemlei,

To clarify the name/DESCRIPTION business: this came from a suggestion of mine in Victoria (page 5 of this presentation; see also DataLink-1_0-Next). The case I had in mind was when there are multiple service descriptors and the client wants to offer a choice of them to the user.

Consider the GUI in the screenshot here (which suggests I might have convinced you at some point in the past that this additional metadata was a reasonable idea). Without some of this semantic metadata, the user really doesn't in general know what they're going to get if they invoke this service for a given row (in this case, they get the epoch photometry table associated with the row's source). In the case of multiple sibling service descriptors it's even worse: if you're looking at a Gaia DR3 table there might be one service descriptor for retrieving per-source epoch photometry, and another for per-source RP spectrophotometry. If the service descriptors only have the kind of information shown in the DataLink 1.0 example, there's really no way the user can know which one, if any, she wants.

So the purpose of the name/DESCRIPTION metadata is to give the user enough information to have an idea what's being offered, and select a service accordingly. The name and DESCRIPTION are doing similar jobs, but one has a bit more space to say what it means than the other. This is useful in a user interface (e.g. name fits in a selector widget or can be used as an identifier in programmatic selection, but description is available in a tooltip or something if more detail is required).

With this in mind, the PR text could perhaps be amended to something like:

A short name attribute, and a more verbose DESCRIPTION subelement, MAY be added to the service descriptor RESOURCE to provide the user with information about the service's purpose or semantics. This SHOULD be done if the semantics are not obvious, and especially in the case of multiple sibling service descriptors.
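To make the proposed wording concrete, here is a rough sketch of how a client might turn `name` and `DESCRIPTION` into selector entries. It is an illustration under assumptions, not part of the PR: the function `selector_entries`, the sample names and descriptions, the example.org URLs, and the fallback to `accessURL` when no name is present are all invented here, and the sample again leaves out the VOTable namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical sibling service descriptors, loosely modelled on the
# Gaia case above; the third one has neither name nor DESCRIPTION.
SAMPLE = """<VOTABLE>
  <RESOURCE type="meta" utype="adhoc:service" name="Epoch photometry">
    <DESCRIPTION>Per-source epoch photometry table for the selected row.</DESCRIPTION>
    <PARAM name="accessURL" datatype="char" arraysize="*"
           value="http://example.org/epoch"/>
  </RESOURCE>
  <RESOURCE type="meta" utype="adhoc:service" name="RP spectra">
    <DESCRIPTION>Per-source RP spectrophotometry for the selected row.</DESCRIPTION>
    <PARAM name="accessURL" datatype="char" arraysize="*"
           value="http://example.org/rp"/>
  </RESOURCE>
  <RESOURCE type="meta" utype="adhoc:service">
    <PARAM name="accessURL" datatype="char" arraysize="*"
           value="http://example.org/other"/>
  </RESOURCE>
</VOTABLE>"""


def selector_entries(votable_xml):
    """Return [(label, tooltip)] for every service descriptor."""
    root = ET.fromstring(votable_xml)
    entries = []
    for res in root.iter("RESOURCE"):
        if res.get("utype") != "adhoc:service":
            continue
        params = {p.get("name"): p.get("value") for p in res.findall("PARAM")}
        # Assumed fallback: show the access URL when no name is given.
        label = res.get("name") or params.get("accessURL") or "(unnamed service)"
        tooltip = res.findtext("DESCRIPTION", default="").strip()
        entries.append((label, tooltip))
    return entries
```

With the sample above, the first two entries give the user something to choose between, while the third falls back to a bare URL, which is roughly the situation the DataLink 1.0 example leaves a client in.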

Regarding the nesting of results/service RESOURCE pairs, that also came from a request I made in the implementation feedback referenced above. However, I think Markus is right here: this can be addressed by looking at ref/ID correspondence (TOPCAT doesn't manage to do that right now, but that's my problem); sorry I didn't spot that in the first place. So I agree this nesting recommendation should be removed.

msdemlei commented 4 years ago

On Fri, Jan 10, 2020 at 02:28:35AM -0800, Mark Taylor wrote:

Consider the GUI in the screenshot here (which suggests I might have convinced you at some point in the past that this additional metadata was a reasonable idea). Without

Yeah, I concede that having a label for a UI sounds like a good idea when there are multiple datalink services. As usual, the difficult part is giving guidance as to what @name should contain such that you can actually build useful UIs. This includes things like some order of magnitude for the length ("less than 20 characters", "one or two words", "a short phrase", "a paragraph", where I think @name should be in the "one or two words" category, and description in "a paragraph"), but also, well, what to express in these words or paragraphs.

It is this "what to express" business for the examples given in the PR that made me doubt if @name can be sensibly defined and if perhaps we need to make do with description.

Perhaps you can try your hand at the @name attributes based on what you'd like to show in TOPCAT?
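Length guidance of this kind could be turned into a simple check on the publisher side. The sketch below is purely illustrative: `check_descriptor_labels` is a made-up helper, and the limits of 20 characters and two words are lifted from the orders of magnitude floated in this thread, not from any standard text.

```python
def check_descriptor_labels(name, description, max_name_chars=20, max_name_words=2):
    """Return a list of warnings for a service descriptor's name/description.

    The default thresholds are assumptions taken from this discussion.
    """
    warnings = []
    if not name or not name.strip():
        warnings.append("no name: clients have nothing to show in a selector")
    else:
        if len(name) > max_name_chars:
            warnings.append(
                f"name has {len(name)} characters (more than {max_name_chars})"
            )
        if len(name.split()) > max_name_words:
            warnings.append(
                f"name has {len(name.split())} words (more than {max_name_words})"
            )
    if not description or not description.strip():
        warnings.append("no description: users cannot tell what the service returns")
    return warnings
```

For instance, "{links} for Obscore" with an empty description would be flagged twice under these assumed limits (three words, no description), whereas a two-word name with a one-sentence description passes.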

With this in mind, the PR text could perhaps be amended to something like:

A short name attribute, and a more verbose DESCRIPTION subelement, MAY be added to the service descriptor RESOURCE to provide the user with information about the service's purpose or semantics. This SHOULD be done if the semantics are not obvious, and especially in the case of multiple sibling service descriptors.

Sounds good to me, except as I said I'd like to see a bit more guidance as to length and ideally information to be conveyed, and setting good patterns for such names in the examples.

mbtaylor commented 4 years ago

From TOPCAT's point of view, what I'd like to have is text for name that would fit on a button, and optionally one line of additional text for description. But I'm reluctant to be too prescriptive; people may have a lot to say in the description or may not, and other clients may want to use this information in different ways. Usage context also plays a role - it may be a good assumption that users of certain services have a good idea about what datalink services are provided, so that much in the way of description would be unnecessary or unhelpful. I could draft some text here, but I think it would add unnecessary verbosity to the content I've already suggested along with what people already understand by the terms "name" and "description".

Bonnarel commented 4 years ago

Again, this was discussed a bit last week.

I agree that the examples must be reworked in the direction Markus states.

Name should be kept. Apart from Mark's argument, it is also interesting for "non standard" services.

Description: I think Mark gave the right argument.

Nested resources: although the ref argument is technically correct for computer recognition, I think the nested resource idea helps human readability. Could be kept as a should.

Autodescription : this actually was in the previous spec and has been rewritten in another PR already accepted and merged into the main repository. Useful for ad hoc services and outside links discovery context.

Bonnarel commented 4 years ago

All the service descriptor section changes in this PR have been revisited according to the recent discussion. Mark Taylor has been added as an author (this accepted change had disappeared in the ivoa-std DataLink master). The Makefile has been modified. For the "self-described service", the valid text (already accepted) is in the master.