ivoa-std / UWS

Universal Worker Service
Creative Commons Attribution Share Alike 4.0 International

Allow, or require, UWS job records to identify an IVOA "standardID" for the originating service #2

Open gpdf opened 1 year ago

gpdf commented 1 year ago

In the Rubin Science Platform (RSP), we are building a general capability to monitor UWS jobs. At any given time, a user may have a number of outstanding jobs of various types, originated through various RSP services. These might include TAP, where our Qserv back end is designed to support long-running full-table scans through the full Rubin catalog data, and where it is expected that large jobs may run for many hours; bulk cutout services; forced-photometry-on-demand services; and so on.

The same considerations apply in the interfaces being designed by IRSA for the archives for upcoming missions.

From a perspective of separation of concerns, as well as the "multi-Aspect" design of the RSP, it is pretty much essential for us to decouple the submission of jobs from the monitoring of jobs and the display of their status and results. Roughly speaking, this means that it would be very useful to be able to submit an asynchronous TAP query from one component, obtain the UWS "job resource" URL for that job (UWS v1.1 sections 2.2.1 and 2.2.2.2), and then, in a different component of the RSP, start purely from that URL and be able to understand that it represents a TAP query.

The natural solution would seem to be to allow the XML schema of a UWS Job (section 2.2.2.2) to include an IVOA standardID, with the same interpretation that it has in DataLink (PR v1.1, section 4.2), DALI (WD v1.2, section 5.4.3), and StandardsRegExt (REC v1.0).

Note that while DALI recommends that a standardID appear as an <INFO> element in a VOTable result, in general a "generic UWS monitor" cannot rely even on the existence of a VOTable somewhere in the job's results, or on knowing the name under which it might appear. Thus the DALI standardID does not meet the need described here.

I would strongly recommend that the <job> schema be extended to include something like a

<uws:standard standardID="ivo://ivoa.net/std/TAP" version="1.1"/>

element (whether version is useful can be debated). I would even suggest that future versions of UWS require its use when a service claiming to conform to a standard creates a UWS job.

We would have to talk about how to introduce it incrementally.
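
To make the proposal concrete, here is a minimal Python sketch of how a generic monitor might read such an element from a job record. The `<uws:standard>` element is the proposed extension, not part of the current UWS schema, and the sample job record and the function name `find_standard` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# UWS 1.1 keeps the v1.0 namespace URI.
UWS_NS = "http://www.ivoa.net/xml/UWS/v1.0"

# Illustrative job record. The <uws:standard> element is the PROPOSED
# extension discussed in this issue; it is not in the current schema.
JOB_XML = """<uws:job xmlns:uws="http://www.ivoa.net/xml/UWS/v1.0" version="1.1">
  <uws:jobId>abc123</uws:jobId>
  <uws:phase>EXECUTING</uws:phase>
  <uws:standard standardID="ivo://ivoa.net/std/TAP" version="1.1"/>
</uws:job>"""


def find_standard(job_xml):
    """Return (standardID, version) from the hypothetical <uws:standard>
    child of the job element, or None if the element is absent."""
    root = ET.fromstring(job_xml)
    elem = root.find(f"{{{UWS_NS}}}standard")
    if elem is None:
        return None
    return elem.get("standardID"), elem.get("version")


print(find_standard(JOB_XML))
```

A monitor that receives only the job URL could fetch the job record, call something like this, and fall back to generic handling when it returns None, which is also how incremental introduction could work for services that do not yet emit the element.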

gpdf commented 1 year ago

@Zarquan posted an alternative suggestion on #3 that covers providing the standardID.

mbtaylor commented 1 year ago

I don't know how much this suggestion is about specific requirements for Rubin/Firefly interaction, and how much about a general problem that you see emerging. But the requirement can be met in a non-standard (but standards-compliant) way in cases where you control both the client and server implementations by adding custom elements to the job/jobInfo element, which is allowed to contain arbitrary content. Even if these suggestions do end up in a UWS 1.2 at some point, doing it like that in the first instance would (a) allow you to solve your immediate problem in a standards-compliant way without having to wait on the standardisation process, and (b) provide a useful prototyping function to check that these features are the right way to address your issues, thus providing useful input to future standardisation activity.
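
As a rough sketch of this interim approach, the client side could look for a custom element inside `jobInfo`. The extension namespace `https://example.org/uws-ext`, the element name `standard`, and the function name below are all hypothetical; a real prototype would pick its own.

```python
import xml.etree.ElementTree as ET

UWS_NS = "http://www.ivoa.net/xml/UWS/v1.0"
# Hypothetical private namespace for the prototype's custom element.
EXT_NS = "https://example.org/uws-ext"

# Illustrative job record carrying the custom element inside uws:jobInfo,
# which the UWS schema allows to hold arbitrary content.
JOB_WITH_JOBINFO = """<uws:job xmlns:uws="http://www.ivoa.net/xml/UWS/v1.0"
         xmlns:ext="https://example.org/uws-ext" version="1.1">
  <uws:jobId>abc123</uws:jobId>
  <uws:phase>QUEUED</uws:phase>
  <uws:jobInfo>
    <ext:standard standardID="ivo://ivoa.net/std/TAP"/>
  </uws:jobInfo>
</uws:job>"""


def standard_id_from_jobinfo(job_xml):
    """Return the standardID carried in the custom jobInfo element,
    or None if jobInfo or the custom element is missing."""
    root = ET.fromstring(job_xml)
    job_info = root.find(f"{{{UWS_NS}}}jobInfo")
    if job_info is None:
        return None
    elem = job_info.find(f"{{{EXT_NS}}}standard")
    return None if elem is None else elem.get("standardID")


print(standard_id_from_jobinfo(JOB_WITH_JOBINFO))
```

Because the element lives in its own namespace, clients that do not know about it can ignore it, so the job record stays standards-compliant.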

gpdf commented 1 year ago

I think it's pretty generally relevant, since in most cases, across all the standards, we have been striving to have data be self-describing. I'd just been assuming that this would be the case with UWS results, without having looked closely at it in advance, so when we actually got around to implementation it was a surprise to find this missing.

We are definitely going to prototype something in exactly the way you described, using <uws:jobInfo>.

The things I'm discussing in #3 are, I think, more narrowly driven by service concepts for the Rubin Science Platform and, potentially, for future IRSA mission-specific services and applications, which is why I labeled that one as likely to be controversial. If the decision were to be "you can have standardID but not the service URL", that would be fine.

The standardID seems to me to be more of a clear gap in the original conception of the job record.


Having said all this, I should note for the sake of a more complete discussion that the issue I'm trying to address with the standardID could also be addressed, in part, with richer metadata on the individual results. I note that UWS v1.1 already "Added optional size and mime-type attributes to the ResultReference", which helps a client, a little, to deal with results in a more generic way. This path could be followed further by allowing UWS services, optionally, to describe the individual results with the same sort of metadata we use elsewhere for things we consider "data products" (e.g., data product type, semantics, coordinates on various axes, ...). While this is somewhat appealing in its own right, I don't think it's a great replacement for providing the standardID. They overlap somewhat, but neither one completely covers the other.
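
For comparison, here is a small sketch of what a generic client can already learn from the per-result metadata. It assumes the optional attributes are serialized as `size` and `mime-type` on each `uws:result` element; the sample job record, URLs, and function name are made up for illustration.

```python
import xml.etree.ElementTree as ET

UWS_NS = "http://www.ivoa.net/xml/UWS/v1.0"
XLINK_NS = "http://www.w3.org/1999/xlink"

# Illustrative completed job with two results; only the first one
# carries the optional size and mime-type attributes.
SAMPLE_JOB = """<uws:job xmlns:uws="http://www.ivoa.net/xml/UWS/v1.0"
         xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1">
  <uws:jobId>abc123</uws:jobId>
  <uws:phase>COMPLETED</uws:phase>
  <uws:results>
    <uws:result id="result" xlink:href="https://example.org/jobs/abc123/results/result"
                mime-type="application/x-votable+xml" size="2048"/>
    <uws:result id="log" xlink:href="https://example.org/jobs/abc123/results/log"/>
  </uws:results>
</uws:job>"""


def describe_results(job_xml):
    """Collect id, href, mime type, and size for each result reference.
    Attributes the service did not supply come back as None."""
    root = ET.fromstring(job_xml)
    described = []
    for res in root.iterfind(f"{{{UWS_NS}}}results/{{{UWS_NS}}}result"):
        size = res.get("size")
        described.append({
            "id": res.get("id"),
            "href": res.get(f"{{{XLINK_NS}}}href"),
            "mime_type": res.get("mime-type"),
            "size": int(size) if size is not None else None,
        })
    return described


for entry in describe_results(SAMPLE_JOB):
    print(entry)
```

This tells the client how to render or download each result, but nothing here says which kind of service produced the job, which is the gap the standardID would fill.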