Open dr-shorthair opened 3 years ago
Thank you @dr-shorthair . I'll add it as a possible enhancement. It does not look particularly straightforward to me. Maybe we can discuss this next time we meet.
Here you are.
👍 Thank you @dr-shorthair
Added SQUM
P06-ucum.ttl
Update on this: adding UCUM codes as an extra SKOS element would require us to extend our schema, but we think it would be a good move if we were to replace our existing altLabel with one (our preferred) UCUM symbol for each P06 unit. We will circulate this proposal to key users of the P06 vocab. This would enable us to harmonise against the impressive work done by the UCUM team.
When you say 'extend our schema' do you mean (a) the Oracle schema used for maintenance and point-of-truth, or (b) the RDF output schema? If the latter, then I don't think an extension is needed: `skos:notation` would be appropriate.
I see you are already using `skos:notation` for the SDN identifier (duplicated in `dc:identifier` and `dce:identifier`). This should not be a problem: `skos:notation` can be repeated. However, to make it useable I'd recommend adding an `rdfs:Datatype` for UCUM - e.g. see https://github.com/qudt/qudt-public-repo/blob/master/schema/SCHEMA_QUDT-v2.1.ttl#L1577 which is used in https://github.com/qudt/qudt-public-repo/blob/master/vocab/unit/VOCAB_QUDT-UNITS-ALL-v2.1.ttl
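To make the repeated-notation suggestion concrete, here is a minimal sketch that renders one concept with two `skos:notation` values, the second carrying a UCUM datatype. The function name `concept_turtle`, the SDN identifier format and the datatype name `qudt:UCUMcs` are assumptions (the datatype is my reading of the linked QUDT schema and should be checked against it); this is not how NVS generates its RDF.

```python
# Namespace IRIs for the two prefixes used below.
PREFIXES = {
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "qudt": "http://qudt.org/schema/qudt/",
}


def concept_turtle(iri, sdn_id, ucum_code):
    """Render one concept with two skos:notation values as Turtle text."""
    # Escape backslashes and double quotes so the literal stays valid Turtle.
    escaped = ucum_code.replace("\\", "\\\\").replace('"', '\\"')
    lines = [f"@prefix {p}: <{ns}> ." for p, ns in PREFIXES.items()]
    lines.append("")
    lines.append(f"<{iri}>")
    # Plain literal: the existing SDN identifier (format assumed).
    lines.append(f'    skos:notation "{sdn_id}" ;')
    # Typed literal: the UCUM code, tagged with the assumed QUDT datatype.
    lines.append(f'    skos:notation "{escaped}"^^qudt:UCUMcs .')
    return "\n".join(lines)


print(concept_turtle(
    "http://vocab.nerc.ac.uk/collection/P06/current/AMPB/",
    "SDN:P06::AMPB",
    "A",
))
```

The datatype is what lets a consumer tell the two notations apart without guessing from the string shape.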
Fix a few errors P06-ucum.ttl
@dr-shorthair I meant (a) because, unless we use one of the existing fields (in this case altLabel is an option), we need to add a new field to store this information somewhere in the Oracle schema before outputting it into the RDF schema. We will look at options.
There is now some work going on in OBO to re-boot UO as a cross-walk for existing units vocabs. They are using UCUM codes as the key, so it would be helpful if the UCUM codes got incorporated into P06 so that P06 could be added to the OBO mappings.
Thanks @dr-shorthair! That work is being supported (on my end) as part of BCODMO's revamped data management vocabulary. As we have alignment/mapping with NERC in mind (cc @ashepherd @jaclynsaunders @DanieK), I'd love to see this happen. The new OBO unit vocab effort has already managed to leverage @dr-shorthair's extensive work making UCUM mappings to bridge to QUDT and OM. I'd love to see the same happen with NERC P06.
Hi @kaiiam thank you for your enthusiasm and support! This is definitely the plan and the update is ready to go, just waiting for a quiet window of time to push it through to production. With regards to alignments between BCODMO and NVS vocabs, we submitted an abstract a year ago for the IMDIS conference that was postponed by 6 months due to COVID (see https://imdis.seadatanet.org/files/IMDIS2021_143_abstract.pdf). I am putting the slides together at the moment. It'd be great to have some examples from you too.
@gwemon thanks. Maybe once we have the new unit system ready to show, and we've imported the NERC P06 - UCUM mappings, I could prepare a slide showing the new OBO unit vocab linking off to NERC P06, QUDT, and OM.
@dr-shorthair We've hit an issue with the proposal to replace P06 alternative labels with the UCUM notation: I was made aware of the fact that thousands of our ODV and netCDF files use the P06 alternative label to refer to the units (a legacy issue). There is therefore a risk in implementing what I was proposing because this field would change for two thirds of the units held in P06. So, instead, we are proposing to capture the UCUM notation in structured XML in the definition field by adding
@gwemon You can't embed XML like that into the definition element of the XML output because it will break the schema. That is why the XML in the structured description fields in bodccodes is translated into JSON by the NVS software.
Have a look at http://vocab.nerc.ac.uk/collection/C19/current/UKMDN025/ You will see that the definition element contains the following JSON:
{"Spatial_Coverage": { "Southernmost_latitude": "53.245121", "Northernmost_latitude": "53.455889", "Westernmost_longitude": "-3.068628", "Easternmost_longitude": "-2.680371" }}
However, if you select description from bodccodes where codval='UKMDN025' you will see that it is an XML snippet, not JSON.
When I suggested the solution of using a structured definition element I was envisioning it containing the existing text plus the UCUM code encoded in JSON along the lines of:
{"notes": "existing_text", "ucum": "ucum_code"}
This would be produced by setting bodccodes.description to:
[notes]existing_text[/notes][ucum]ucum_code[/ucum] where '[' is an opening chevron and ']' is a closing chevron: I can't work out how to escape chevrons so they don't upset GitHub.
(if description is currently null then there would be no notes element).
Of course, you could simplify the whole process by encoding bodccodes.description in JSON.
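As a rough illustration of the XML-to-JSON translation described above, the sketch below converts a flat run of elements into a JSON object. It is not the NVS software: the function name `description_to_json` and the wrapping `root` element are assumptions, and it only handles one level of elements, not nested structures such as the `Spatial_Coverage` example.

```python
import json
import xml.etree.ElementTree as ET


def description_to_json(description):
    """Turn a flat run of XML elements into a JSON object string."""
    # Wrap the snippet in a hypothetical root element so that a run of
    # sibling elements parses as one well-formed document.
    root = ET.fromstring(f"<root>{description}</root>")
    # Each child becomes one key; an empty element maps to an empty string.
    return json.dumps({child.tag: (child.text or "") for child in root})


snippet = "<notes>existing_text</notes><ucum>ucum_code</ucum>"
print(description_to_json(snippet))
# → {"notes": "existing_text", "ucum": "ucum_code"}
```

If the description held only the UCUM element, the result would contain no `notes` key, matching the null-description case mentioned above.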
Regarding mappings to the new unit interchange system our goal is to make valid ttl IDs from UCUM strings. E.g.,
We certainly don't want to interfere with how the NERC P06 is built/functions. In order to be compatible with our system all that would be required is a basic mapping between P06 IRI's and UCUM strings, which could be in whatever format (csv json ttl etc) that we could parse.
Something like:
NERC IRI | UCUM |
---|---|
http://vocab.nerc.ac.uk/collection/P06/current/AMPB/ | A |
http://vocab.nerc.ac.uk/collection/P06/current/BQ11/ | Bq |
In principle this mapping could even just be pulled from @dr-shorthair's P06-ucum.ttl, assuming it's correct and up to date.
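For illustration, a mapping in the shape of the table above could be parsed like this if it were delivered as CSV. The column names `nerc_iri` and `ucum` and the function name `load_mapping` are hypothetical; the two rows are the ones from the table.

```python
import csv
import io


def load_mapping(text):
    """Parse CSV text with 'nerc_iri' and 'ucum' columns into a dict."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["nerc_iri"]: row["ucum"] for row in reader}


# Sample rows copied from the table above; the header names are assumed.
SAMPLE = """nerc_iri,ucum
http://vocab.nerc.ac.uk/collection/P06/current/AMPB/,A
http://vocab.nerc.ac.uk/collection/P06/current/BQ11/,Bq
"""

mapping = load_mapping(SAMPLE)
for iri, code in mapping.items():
    print(iri, "->", code)
```

Using the `csv` module rather than splitting on commas keeps the parser safe if a code ever needs quoting.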
@gwemon Am I missing something or could the necessary be achieved through the URLMAP_EXT mechanism? To explain to @kaiiam this is an easy to implement mechanism that causes the UCUM code encoded as a URI (I assume that this is possible) to be included as a mapping in the RDF output in the P06 code.
To see what I mean, look at:
http://vocab.nerc.ac.uk/collection/P07/current/EHEBBEHE/
The URL
http://mmisw.org/ont/cf/parameter/sea_water_preformed_salinity
in the output is delivered from the back office through the URLMAP_EXT mechanism.
@roy-lowry UCUM notations are not available as URIs so we cannot do this.
@gwemon Thought it was too good to be true. So, back to structured definitions....
> @roy-lowry UCUM notations are not available as URIs so we cannot do this.

Yes, exactly; hence the whole point of our new units interchange system is to make resolvable IRIs based on the SI and UCUM. Some UCUM strings, e.g. `{#}.g-1`, can't just be cast to an IRI ID and be valid `ttl` syntax, so we're working on workarounds for this.
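One possible workaround, shown only as a sketch and not necessarily the approach the units interchange system takes, is to percent-encode the UCUM code before appending it to a base IRI. The base `https://example.org/unit/` and both function names are hypothetical.

```python
from urllib.parse import quote, unquote

# Hypothetical base IRI, used for illustration only.
BASE = "https://example.org/unit/"


def ucum_to_iri(code):
    """Percent-encode a UCUM code and append it to the base IRI."""
    # safe="" also encodes "/" so that codes like "m/s" stay in one segment.
    return BASE + quote(code, safe="")


def iri_to_ucum(iri):
    """Recover the original UCUM code from an IRI built by ucum_to_iri."""
    if not iri.startswith(BASE):
        raise ValueError(f"not a unit IRI under {BASE}: {iri}")
    return unquote(iri[len(BASE):])


print(ucum_to_iri("{#}.g-1"))
# → https://example.org/unit/%7B%23%7D.g-1
```

The encoded form is valid inside angle brackets in Turtle and round-trips back to the original code, at the cost of being less readable than the raw string.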
For now I'll parse @dr-shorthair's P06-ucum.ttl file (like I did with his work on QUDT and OM) so that the new system can have an initial mapping to P06. Later, once this is resolved, I can update the mapping to pull from something the NERC team is producing and/or updating.
> So, back to structured definitions....

Another goal of our system is to auto-produce labels, definitions and mappings based on the input UCUM/SI codes, e.g. given the input code `mA` the script creates the following ttl:
Although BCODMO is supporting this work (thanks again @ashepherd) the idea is to make it general enough for anyone to use and it will be under something like a CC0 license (anyone can use for any purpose). Happy to have this be a collaboration point with NERC if there is interest. We've still got lots to do with this new unit system.
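The Turtle produced by the actual script is not reproduced here. Purely to illustrate the idea of deriving a label from a code, the sketch below handles only a single optional SI prefix followed by a unit symbol; the lookup tables and the function name `label_for` are assumptions and cover a tiny subset of UCUM.

```python
# Tiny illustrative subsets; a real implementation would cover all of UCUM.
PREFIXES = {"k": "kilo", "m": "milli", "u": "micro"}
UNITS = {"A": "ampere", "Bq": "becquerel", "g": "gram", "m": "metre"}


def label_for(code):
    """Build an English label for a bare unit or a prefix-plus-unit code."""
    # A bare unit symbol wins, so "m" is read as metre, not as a prefix.
    if code in UNITS:
        return UNITS[code]
    prefix, unit = code[:1], code[1:]
    if prefix in PREFIXES and unit in UNITS:
        return PREFIXES[prefix] + UNITS[unit]
    raise ValueError(f"unsupported code in this sketch: {code!r}")


print(label_for("mA"))
# → milliampere
```

Compound expressions, exponents and annotations such as `{#}` are outside what this sketch attempts.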
Thanks for the comments @kaiiam @roy-lowry. @dr-shorthair, do you have a dedicated datatype for UCUM units?
I've implemented the mapping from @dr-shorthair's P06-ucum.ttl into the beta units interchange system.
Great. Thank you @kaiiam. @alko-k and I are looking at our options from the NVS side.
Great @gwemon, no pressure from us. I just wanted to make sure to leverage @dr-shorthair's extensive mapping work so that the new system actually serves as a mapping between as many unit systems as possible, including NERC P06.
Suggest adding the UCUM codes, as a `skos:notation`, maybe with a datatype. UCUM is a very sound set of unit symbols generally matching the ones already used.
UCUM spec is here: https://ucum.org/ucum.html
And there is a UCUM-based quantity conversion service API here: https://ucum.nlm.nih.gov/ucum-service.html (and a UI here: https://ucum.nlm.nih.gov/ucum-lhc/demo.html )