Closed pdowler closed 1 year ago
+1
While I agree that UR[I]dentifiers would benefit from a standardised xtype (#16), I wonder if standardising one for uu[id]entifiers is needed. I guess it is the usual balance/threshold issue of reserving a word versus fixing it in a standard...
UUID values have rules on what they can contain and a standard (RFC 4122) for serializing them, so an xtype would mean the values can be validated.
@pdowler would this xtype apply to datatype=long[4], char[36], or either?
I use datatype="char" arraysize="36" xtype="uuid" because it makes reading a query result in text and using uuid values in queries much simpler (cut&paste, etc.).
And yes, using the canonical ASCII form as described in the above RFC.
In response to Marco: for my usage of xtype="uuid" it allows my VOTable parser to convert the chars into a UUID object rather than String. Without it, I would have to do the string-to-uuid in more code, potentially every piece of client code that encounters UUIDs.... and try to consistently deal with failure of the implicit validation (detect errors). So it makes my code better and if someone else cares about UUID it makes their code better. And as usual, if some other software doesn't know or care about xtype="uuid" they still get a string.
Without that xtype, I'd have to use something like xtype="opencadc:uuid" and no one else would gain the benefit.
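To make the parser behaviour described above concrete, here is a minimal sketch in Python, assuming a hypothetical `convert_field` hook that a VOTable reader would call for each char value. The function name and its signature are illustrative only and not part of any existing library. It converts values tagged xtype="uuid" into UUID objects, rejects anything that is not in the canonical 36-character form, and leaves values with an unknown xtype as plain strings.

```python
import uuid


def convert_field(value, xtype=None):
    """Hypothetical per-field hook: turn a char value into a richer object.

    Values with xtype="uuid" become uuid.UUID instances; anything else is
    returned unchanged, so software that ignores the xtype still sees a string.
    """
    if xtype != "uuid":
        return value

    # uuid.UUID raises ValueError for malformed input, which gives one
    # central place to detect bad values.
    parsed = uuid.UUID(value)

    # uuid.UUID also accepts braces, "urn:uuid:" prefixes and unhyphenated
    # hex, so compare against the canonical hyphenated form explicitly.
    if str(parsed) != value.lower():
        raise ValueError(f"not a canonical UUID string: {value!r}")
    return parsed


if __name__ == "__main__":
    text = "e0b895ca-2ee4-4f0f-b595-cbd83be40b04"
    print(type(convert_field(text, xtype="uuid")).__name__)
    print(type(convert_field(text)).__name__)
```

Doing the conversion once in the parser means client code never has to repeat the string-to-UUID step or its error handling.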
Aside from our internal usage (caom2, storage-inventory, vospace, numericid for users), I'm seeing UUID more and more in other systems, for example OIDC sub.
Silly question, but are xtypes explicitly tied to a particular representation (so you couldn't have the same xtype for a UUID for char[36] and datatype=long[4], or if there were to be a 128bit int type), or is that something that might be assumed by parsers?
In the case of xtype="uuid", one could unambiguously allow for any of:
datatype="char" arraysize="36" aka canonical hex representation
datatype="long" arraysize="2" aka lower and upper bits in that order
datatype="byte" arraysize="16" aka raw bytes
Clients could in principle parse any such values into a UUID object, so as output (in a VOTable column) all of those are usable and could co-exist.
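As an illustration of how a client could accept all three forms, the following Python sketch builds the same UUID object from each serialization. The helper names are hypothetical. It assumes the two longs are signed 64-bit values given as lower bits then upper bits (as listed above), and that the raw bytes are in the same big-endian order as the hex string.

```python
import uuid

MASK64 = (1 << 64) - 1


def _to_signed64(n):
    """Reinterpret an unsigned 64-bit integer as a signed 64-bit long."""
    return n - (1 << 64) if n >= (1 << 63) else n


def uuid_from_chars(text):
    """char[36]: canonical hex representation."""
    return uuid.UUID(text)


def uuid_from_longs(lower, upper):
    """long[2]: lower and upper 64 bits, in that order (assumed signed)."""
    return uuid.UUID(int=((upper & MASK64) << 64) | (lower & MASK64))


def uuid_from_bytes(raw):
    """byte[16]: raw bytes, most significant byte first (assumed)."""
    return uuid.UUID(bytes=bytes(raw))


def uuid_to_longs(u):
    """Inverse of uuid_from_longs, returning (lower, upper)."""
    return _to_signed64(u.int & MASK64), _to_signed64(u.int >> 64)


if __name__ == "__main__":
    u = uuid_from_chars("e0b895ca-2ee4-4f0f-b595-cbd83be40b04")
    lower, upper = uuid_to_longs(u)
    print(uuid_from_longs(lower, upper) == u)
    print(uuid_from_bytes(u.bytes) == u)
```

Both longs come out negative for this example value, which hints at why the numeric forms are harder to read and cut&paste than the char form.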
As input (via an HTTP parameter or in an ADQL query) a service can only really say that it accepts one of those (eg datalink service descriptor) or that the column in the tap_schema is one of those types. In that context I always found char to be the most convenient: no url encoding needed for params, simple quoted string used in ADQL queries, and can generally cut&paste values without strange failures.
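The convenience of the char form as input can be checked with a short Python sketch. The parameter name `ID` and the table and column names in the ADQL string are made up for illustration. The canonical string contains only hex digits and hyphens, so percent-encoding leaves it unchanged, while the raw 16-byte form needs escaping.

```python
import uuid
from urllib.parse import quote, urlencode

value = "e0b895ca-2ee4-4f0f-b595-cbd83be40b04"

# The canonical form passes through URL encoding untouched.
encoded_text = quote(value, safe="")

# The raw bytes of the same UUID need percent-escapes.
encoded_bytes = quote(uuid.UUID(value).bytes, safe="")

# Hypothetical HTTP parameter name.
query_string = urlencode({"ID": value})

# Hypothetical table and column names; the value is a simple quoted string.
adql = f"SELECT * FROM example.things WHERE id = '{value}'"

if __name__ == "__main__":
    print(encoded_text == value)
    print(encoded_bytes)
    print(query_string)
    print(adql)
```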
So, to actually answer the question: xtypes do specify a serialization that applies to values in VOTable, HTTP parameters, etc and we usually pick one serialization that works well in those places. We could allow for other serializations but it would have to add something useful.
A secondary concern is that the values should in principle be usable by s/w that doesn't grok the xtype. For example, if I have a tap service and a column is a primary key of xtype="uuid" I think datatype="char" makes that PK column more or less equally usable to a client that doesn't know what a uuid is - it's a little less type safe but that's all. Arrays of byte or long as a PK would be more complex for minimal benefit.
PR #24
a common unique identifier value type with canonical ascii serialization, eg
e0b895ca-2ee4-4f0f-b595-cbd83be40b04
main use case at CADC: primary key in databases with TAP access