Open pdowler opened 2 weeks ago
The restriction may be too limiting for users who have existing identifiers with more structure (e.g. multiple path components) that they need to capture. This could in principle be an issue for collection names (e.g. Survey/DRi vs Survey/DRj).
see #170
The purpose of the restriction was to enable code to convert the fields to a URI and parse it back into the individual field values. That is currently possible because ObservationURI has exactly 2 components and PlaneURI has exactly 3 components. It would not be possible to parse and extract the values if any component other than the productID could have multiple path components.
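To make the parsing argument concrete, here is a minimal Python sketch. The function name parse_plane_uri is hypothetical and not taken from the CAOM java or python libraries; it assumes that only the last field (productID) may contain extra path components.

```python
def parse_plane_uri(uri: str) -> tuple[str, str, str]:
    """Split caom:{collection}/{observationID}/{productID} into its fields.

    Hypothetical helper: assumes only productID may contain '/' characters.
    """
    scheme, sep, path = uri.partition(":")
    if sep != ":" or scheme != "caom":
        raise ValueError(f"not a caom URI: {uri}")
    # Split on the first two separators only; any remainder stays in productID.
    parts = path.split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected 3 components: {uri}")
    collection, observation_id, product_id = parts
    return collection, observation_id, product_id
```

With this approach, caom:Survey/DR1/obs1/prod1 is read as collection Survey and observationID DR1, which is wrong if the collection was meant to be Survey/DR1. That is the ambiguity the restriction was designed to avoid.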
Current code only uses the restrictions to implement validation of members and inputs and to assign consistent values in the observation and plane tables to support joins via these relationships. It would be feasible to lift the restriction and make "valid URI content" a metadata curation issue.
Currently Observation.collection, Observation.observationID, and Plane.productID are the fields in the model. It would be more explicit to drop those and have the model include the URIs directly:
Observation.uri
Plane.uri
I would like to retain the basic form (caom:{collection}/...) so the scheme means "CAOM entity" and a prefix on the Observation.uri is a namespace that can be used to reference collections (usage: permissions, metadata-sync). Being able to reliably extract the collection name from these URIs also means that data providers can consistently generate their own publisherID (see below).
Plane.creatorID already exists and should be an ivorn (ivo://{auth}/{collection}?...).
Data providers still need to inject their own Plane.publisherID (same as creatorID for the original publisher) and that needs to be clarified/documented (but probably in a CAOM+TAP standard). In the current style of usage, a data provider would register a "data collection" in the IVOA registry with a resource identifier like ivo://{authority}/{collection} (e.g. ivo://cadc.nrc.ca/CFHT). From there, a publisherID (of some "data" from that collection) can be created by appending a query string with the logical Plane identifier. Current practice at CADC is to construct the publisherID as ivo://{authority}/{collection}?{Observation.observationID}/{Plane.productID}. As long as the collection name can be unambiguously extracted from the ObservationURI (and PlaneURI), this is easy to do. More structure (path components in the observationID and/or productID) would be OK: those would become opaque strings that could be read (by a human) but not parsed by any generic code.
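The construction described above could be sketched in Python as follows. The function name publisher_id is hypothetical, and the sketch assumes the collection is the first path component of the caom URI and contains no "/" itself; everything after it is treated as an opaque string.

```python
def publisher_id(authority: str, plane_uri: str) -> str:
    """Build ivo://{authority}/{collection}?{observationID}/{productID}.

    Hypothetical helper: assumes the collection name is the first path
    component of the caom URI and does not itself contain '/'.
    """
    scheme, sep, path = plane_uri.partition(":")
    if sep != ":" or scheme != "caom":
        raise ValueError(f"not a caom URI: {plane_uri}")
    # Only the collection is extracted; the rest is carried over unparsed.
    collection, sep, rest = path.partition("/")
    if not collection or not rest:
        raise ValueError(f"cannot extract collection: {plane_uri}")
    return f"ivo://{authority}/{collection}?{rest}"
```

For example, publisher_id("cadc.nrc.ca", "caom:CFHT/obs1/prod1") gives ivo://cadc.nrc.ca/CFHT?obs1/prod1. Extra path components in the observationID or productID pass through unchanged.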
For validation purposes, it would be good to require that Observation.uri plus a separator (/) character is a prefix of Plane.uri. Anything else is probably a mistake (caused by a bug).
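A minimal Python sketch of that prefix check is shown below. The function name plane_belongs_to is hypothetical, and the extra requirement that something follows the separator is an assumption added here, not part of the proposal.

```python
def plane_belongs_to(observation_uri: str, plane_uri: str) -> bool:
    """Return True if observation_uri + '/' is a proper prefix of plane_uri."""
    prefix = observation_uri + "/"
    # Including the separator prevents caom:C/obs1 from matching caom:C/obs10/...
    # Assumption: an empty remainder after the separator is also rejected.
    return plane_uri.startswith(prefix) and len(plane_uri) > len(prefix)
```

Including the separator in the prefix matters: without it, an Observation.uri of caom:CFHT/obs1 would incorrectly match a Plane.uri of caom:CFHT/obs10/prod1.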
In the code (java and python) these strings are restricted to being "valid path components".
These fields are used to generate several URIs:
ObservationURI of the form caom:{collection}/{observationID} for use as a reference in DerivedObservation.members
PlaneURI of the form caom:{collection}/{observationID}/{productID} for use as a reference in Plane.provenance.inputs
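The two URI forms above can be generated from the fields roughly as in the following Python sketch. All function names are hypothetical. The check in _check_component is an assumption about what "valid path component" means (non-empty, no "/" and no whitespace); the actual java and python libraries may apply different rules.

```python
def _check_component(name: str, value: str) -> None:
    # Assumed rule only: non-empty, no '/' and no whitespace.
    if not value or "/" in value or any(ch.isspace() for ch in value):
        raise ValueError(f"invalid {name}: {value!r}")


def observation_uri(collection: str, observation_id: str) -> str:
    """Build caom:{collection}/{observationID}."""
    _check_component("collection", collection)
    _check_component("observationID", observation_id)
    return f"caom:{collection}/{observation_id}"


def plane_uri(collection: str, observation_id: str, product_id: str) -> str:
    """Build caom:{collection}/{observationID}/{productID}."""
    _check_component("productID", product_id)
    return f"{observation_uri(collection, observation_id)}/{product_id}"
```

Under this assumed rule, a collection such as Survey/DR1 is rejected, which is the limitation this issue is about.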
It is likely that Plane.creatorID is also assigned by using these values. Although not part of the model, the CADC implementation (at least) uses these fields to generate Plane.publisherID, which is the primary external reference to a Plane (product) and is used as the input ID by the caom DataLink service.