ivoa-std / DataLink

DataLink standard (DAL)

Separate out "Service Descriptor" link concerns #53

Open pahjbo opened 3 years ago

pahjbo commented 3 years ago

I think that it would make sense to factor out all of the service descriptor concerns into a separate document, and indeed into a separate response from a DataLink server, which would also separate the concerns for the client. The original DataLink response could then be effectively just a list of URLs. The "service descriptor" link (identified as such by the response semantics) could be a plain link to retrieve a full service description written, for the sake of argument, in PDL 2.0, or it could be something entirely new, a Service Description Language, which would additionally specify how to map the parameters onto the service call. This would not be constrained by what can be done in the existing VOTable schema and so could be rich enough to allow the client to build a suitable GUI.
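As a rough illustration of the split proposed here, the sketch below treats a links response as a plain list of rows and picks out the ones that point at a separate service description. The `#service-descriptor` semantics value, the example URLs, and the helper names are assumptions made up for this illustration; none of them come from the DataLink specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical semantics value marking a link to a standalone service description.
SERVICE_DESCRIPTOR = "#service-descriptor"

# Made-up, minimal links response: only ID, access_url and semantics columns.
SAMPLE = """<VOTABLE xmlns="http://www.ivoa.net/xml/VOTable/v1.3" version="1.3">
  <RESOURCE type="results">
    <TABLE>
      <FIELD name="ID" datatype="char" arraysize="*"/>
      <FIELD name="access_url" datatype="char" arraysize="*"/>
      <FIELD name="semantics" datatype="char" arraysize="*"/>
      <DATA><TABLEDATA>
        <TR><TD>ivo://example.org/data?obs1</TD><TD>https://example.org/obs1.fits</TD><TD>#this</TD></TR>
        <TR><TD>ivo://example.org/data?obs1</TD><TD>https://example.org/sdl/cutout</TD><TD>#service-descriptor</TD></TR>
      </TABLEDATA></DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""

NS = {"v": "http://www.ivoa.net/xml/VOTable/v1.3"}


def read_links(votable_xml):
    """Return the rows of the first table as dicts keyed by FIELD name."""
    root = ET.fromstring(votable_xml)
    table = root.find(".//v:TABLE", NS)
    names = [field.get("name") for field in table.findall("v:FIELD", NS)]
    rows = []
    for tr in table.findall(".//v:TR", NS):
        cells = [(td.text or "") for td in tr.findall("v:TD", NS)]
        rows.append(dict(zip(names, cells)))
    return rows


def descriptor_urls(rows):
    """Select the access URLs of rows flagged as service description links."""
    return [r["access_url"] for r in rows if r["semantics"] == SERVICE_DESCRIPTOR]


links = read_links(SAMPLE)
print(descriptor_urls(links))
```

With this shape the client never has to interpret descriptor markup inside the table itself; it only follows the flagged URL when it actually wants to build a call.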

Bonnarel commented 3 years ago

The idea of separating the service descriptor from the DataLink response table already exists as "Service self description". The example has been totally rewritten in the new version (already merged), and if the semantics is "proc", calling it (with an empty query) directly from the link in the DataLink response should work and not break the standard.

Second point in your message: use another description language, PDL 2 or whatever, instead of the VOTable service descriptor syntax. Hmm, if your service receives an empty or incomplete query, then it can send back such a description. But then clients have to interpret it, and we have to standardize this. Could you prototype something and show us?

pahjbo commented 3 years ago

I think that my reasons for wanting this are rather more fundamental than arguing about the exact mechanisms of which elements/attributes are used to represent things.

I have two main reasons for wanting the service specification to be separate:

  1. modularity - having a service specification is useful in other places, e.g. workflows (which was the main use-case behind PDL). A separate service specification would help with re-use and allows it to be optimized for the purpose of service specification. It has already been noted that "streamability" is desirable. Squashing everything into one VOTable does not help with this, whereas having links to separate service descriptions does help with streaming and parallelisation, especially in cases where multiple datasets are returned.

  2. over-use of VOTable - Although I know that the idea of using parts of the VOTable metadata for GUI description has been present as Appendix A2 of the VOTable standard for a long time, I do not think that it is a great idea. Fundamentally, VOTable is a table model for a set of related tables, so it should really be interpreted as such. When dealing with data models and software design I am keen on the https://en.wikipedia.org/wiki/Is-a approach: if VOTable is a good design for a relational table model, then it is unlikely to be an optimal design for GUI description, because GUI descriptions are not a kind of table.
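The streaming and parallelisation argument in point 1 can be sketched as follows: once the descriptions live behind their own links, a client can hand each link to a worker as soon as it reads the row, instead of waiting for one large document. In this sketch `fetch_description` is a stand-in that fabricates a result locally (a real client would issue an HTTP GET), and the identifiers and URLs are made up.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_description(url):
    """Stand-in for an HTTP GET; returns a fake description record."""
    return {"url": url, "params": ["ID"]}


def stream_rows():
    """Yield link rows one at a time, as a streaming parser would."""
    for n in range(3):
        yield {
            "ID": f"ivo://example.org/data?obs{n}",
            "access_url": f"https://example.org/sdl/obs{n}",
        }


def fetch_all(rows, max_workers=4):
    """Submit one fetch per row as rows arrive; return results in row order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fetch_description, row["access_url"]) for row in rows]
        return [future.result() for future in futures]


descriptions = fetch_all(stream_rows())
print([d["url"] for d in descriptions])
```

Each fetch is submitted while the row generator is still being consumed, so retrieval of the first description can overlap with reading later rows.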

Further examination of the use-cases for "Service Links" (numbered below as in the specification) suggests that many of the patterns can be simplified or made implicit, and that only the full "non-standard" service description is a good use-case.

4.2 {links} capability - this is implicit in the fact that it is a DataLink service - in fact this is really the killer feature of DataLink - globally unique ids for datasets - once I have used a discovery service to find a dataset, then I want to be able to find it again with its identifier, rather than having to "discover" it again. Tying the various services together by strengthening the guarantees on the dataset identifiers as you have already discussed in

https://github.com/ivoa-std/DataLink/pull/37#issuecomment-584609226

seems like a very good idea to me.

4.3 standard service - just give the ivorn of the registry entry - all usual rules apply from the registry and its definition - though if necessary it could be a url to the service, and VOSI rules apply. In general it is pretty difficult to specify a S*AP query that will return just the metadata of the particular dataset without it being one with only ID specified.

As an optional enhancement to DataLink it might be useful to add this metadata directly into the response in separate tables, but again this is not necessarily helpful for streaming and parallelism.

4.4 vospace reference - I think that this use case is bad practice and should not be done, as it subverts the intention of vospace - if referencing something in vospace, then the access_url should just point to the vos: URI for the reference to have any guaranteed long-term applicability.

4.5 Custom access data service - in my opinion this is the only really valid use case (with 4.6 just being a special case of such, unless I am misunderstanding the distinction), and as I have argued above I think that the service description deserves a more comprehensive and extensible language than what can be squeezed into the current VOTable elements.

I feel that there is something simpler and yet more powerful struggling to break through in the DataLink specification, and it seems to me that the "service specification" part is getting in the way of finding this simplicity.

pdowler commented 1 year ago

I agree that the current DataLink-1.1 (draft) contains two largely independent things: links and descriptors. So yes, in principle they can be split into separate specs, and that's probably a good idea so they can evolve at different speeds.

I also agree that service descriptors are severely constrained by VOTable, but the trade-off there is that one can embed a relatively simple service descriptor in any (not just links) tabular result and enable calling some service one or more times for one/some/all of the rows in the table. To be fair, if you compare embedding the descriptor vs providing a link to a descriptor (which opens up other SDLs), you might only save one call, or you might be saving one call per row in the table (if the SDL is expected to deliver row-specific metadata to help construct the UI and/or service call; current embedded descriptors can do that, in principle).
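To make the trade-off concrete, here is a small back-of-the-envelope sketch counting the extra requests a client makes to obtain descriptors for a table of a given size. The three mode names are labels invented for this illustration, not terms from any spec.

```python
def extra_descriptor_calls(n_rows, mode):
    """Extra requests needed before the client can build its service calls."""
    if mode == "embedded":        # descriptor travels inside the tabular result
        return 0
    if mode == "linked-shared":   # one linked description serves every row
        return 1
    if mode == "linked-per-row":  # description carries row-specific metadata
        return n_rows
    raise ValueError(f"unknown mode: {mode}")


for mode in ("embedded", "linked-shared", "linked-per-row"):
    print(mode, extra_descriptor_calls(1000, mode))
```

The gap between the last two cases is the "one call vs one call per row" difference described above.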

A plausible plan forward would be to finish the "bug-fix and clarify" DataLink-1.1, then split the docs and begin work on WD-DataLink-1.2 and WD-ServiceDescriptor-1.2... splitting would make it easier to pursue the latter.