w3c-ccg / traceability-vocab

A traceability vocabulary for describing relevant Verifiable Credentials and their contents.
https://w3id.org/traceability

Align to the production UN Vocab #536

Closed nissimsan closed 1 year ago

nissimsan commented 2 years ago

The UN CEFACT LD vocab should be bumped to version 1.0, expected during fall of 2022.

The main updates that will be required on our side include:

I advise that we don't start this work until the UN vocab is published.

nissimsan commented 2 years ago

Current draft version: https://service.unece.org/trade/uncefact/vocabulary/uncefact/
Repo: https://github.com/uncefact/spec-jsonld

nissimsan commented 2 years ago

@VladimirAlexiev, @brownoxford, FYI ^

BenjaminMoe commented 2 years ago

@nissimsan any updates?

nissimsan commented 2 years ago

Yes, it's basically done. We're currently waiting for the vocabulary.uncefact.org DNS to propagate. It is taking oddly long, so something must have gone wrong.

This is what v1 will look like, though: dmvc7xzscpizo.cloudfront.net

nissimsan commented 1 year ago

The much improved production vocabulary.uncefact.org is live now. We should switch our pointers from the draft URIs to this. For example: https://service.unece.org/trade/uncefact/vocabulary/uncefact/#consigneeParty should be changed to https://vocabulary.uncefact.org/consigneeParty
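A minimal sketch of that pointer switch, assuming every draft term URI uses the prefix shown in the example above. The function name `to_production_uri` is hypothetical and is not part of either vocabulary's tooling:

```python
# Prefixes taken from the example URIs in this comment.
DRAFT_PREFIX = "https://service.unece.org/trade/uncefact/vocabulary/uncefact/#"
PRODUCTION_PREFIX = "https://vocabulary.uncefact.org/"


def to_production_uri(uri: str) -> str:
    """Rewrite a draft UN/CEFACT term URI to its production form.

    URIs that do not start with the draft prefix are returned unchanged.
    """
    if uri.startswith(DRAFT_PREFIX):
        return PRODUCTION_PREFIX + uri[len(DRAFT_PREFIX):]
    return uri


print(to_production_uri(DRAFT_PREFIX + "consigneeParty"))
```

Leaving unrelated URIs untouched means the same function could be run over a whole context file without first filtering its values.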

nissimsan commented 1 year ago

We did this!

mgh128 commented 1 year ago

@nissimsan - re "We did this!", I'm wondering what exactly we did.

I've just been looking at https://vocabulary.uncefact.org/UnitMeasureCode

Hyperlinks such as https://vocabulary.uncefact.org/UnitMeasureCode#KGM go nowhere / provide no further details and I don't see any details about conversion factors when I check the source code for the page.

I also tried reloading the page after setting the HTTP header Accept: application/ld+json but that just produced a 404 error page with this rather unfriendly message:

404 Not Found Code: NoSuchKey Message: The specified key does not exist. Key: UnitMeasureCode.jsonld RequestId: KQDKKWY3RJQ844FQ HostId: CFF2nAkAhmrgaZgiqdJx8hrd7djYmRIsn9XamQ/2YY3MrwyjNkyG1tt43OxWM+p5kbp9hyRNIiY=
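For anyone wanting to repeat that check, here is a rough sketch using only the Python standard library. The helper names `build_jsonld_request` and `fetch_status` are hypothetical, and the status code returned depends on how the server behaves at the time:

```python
import urllib.error
import urllib.request


def build_jsonld_request(url: str) -> urllib.request.Request:
    """Build a GET request that asks for a JSON-LD representation."""
    return urllib.request.Request(url, headers={"Accept": "application/ld+json"})


def fetch_status(url: str) -> int:
    """Send the request and return the HTTP status code, including error codes."""
    try:
        with urllib.request.urlopen(build_jsonld_request(url), timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx responses; the code is still useful to report.
        return err.code


# Example (performs a network call, so it is left commented out):
# print(fetch_status("https://vocabulary.uncefact.org/UnitMeasureCode"))
```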

Conversion factors are still present in the older JSON-LD file at https://service.unece.org/trade/uncefact/vocabulary/rec20.jsonld

However that does not use the 2-3 character alphanumeric codes for its @id values, so you can find details for https://service.unece.org/trade/uncefact/vocabulary/rec20#kilogram but not for a URI ending in /KGM (or #KGM, though ideally /KGM )

In comparison, https://qudt.org/vocab/unit/KiloGM provides plenty of data about kilograms and a triple that links via qudt:uneceCommonCode to "KGM". It would be even better if each UN ECE Rec20 unit code had a corresponding URI such as https://vocabulary.uncefact.org/UnitMeasureCode/KGM that provided similar information about conversion factors, so that QUDT could link to such a Web URI within https://vocabulary.uncefact.org rather than a dumb string such as "KGM".

nissimsan commented 1 year ago

Hi @mgh128,

What we did was switch from the draft to the production UN/CEFACT term definitions (https://github.com/w3c-ccg/traceability-vocab/pull/726). So we now reference, for example, https://vocabulary.uncefact.org/consigneeParty.

Good catch that the conversion factors are now missing from https://vocabulary.uncefact.org/UnitMeasureCode#KGM. Clearly that data has been available, so we must have dropped it along the way. Note that this is work done on the UN side, not on this repo. I will bring it up with the team - your clear requirements are a great help.

I agree completely that QUDT should link with a real URI. Might that be something you can bring up there, changing KGM to https://vocabulary.uncefact.org/UnitMeasureCode#KGM?

mgh128 commented 1 year ago

Hi @nissimsan

Many thanks in advance for alerting the UN CEFACT team about the missing conversion factors. Unlike many other code lists, the code list(s) for unit of measure do require more than a code value and a description - so either there should be more 'columns' in the displayed table - or clicking on a link such as https://vocabulary.uncefact.org/UnitMeasureCode#KGM would result in a different page view with further details (including conversion factors) or perhaps expand an 'accordion' (e.g. using HTML <details> and <summary> to show further details without switching to a different page view).

I also noticed that within the list of code lists at https://vocabulary.uncefact.org/code-lists there is not only the main unit of measure code list https://vocabulary.uncefact.org/UnitMeasureCode but also some additional code lists such as:

unece:AirFlowUnitMeasureCode
unece:DurationUnitMeasureCode
unece:FileSizeUnitMeasureCode
unece:LinearUnitMeasureCode
unece:TemperatureUnitMeasureCode
unece:VolumeUnitMeasureCode
unece:WeightUnitMeasureCode

Unfortunately, this means that a unit of measure such as KGM for kilogram now appears in more than one code list, e.g.

unece:WeightUnitMeasureCode#KGM AND unece:UnitMeasureCode#KGM

or

unece:LinearUnitMeasureCode#MTR AND unece:UnitMeasureCode#MTR

Furthermore, the specialised code lists for unit of measure (e.g. https://vocabulary.uncefact.org/LinearUnitMeasureCode ) do not contain the complete set of units for that dimension or type of measurement.
For example, https://vocabulary.uncefact.org/LinearUnitMeasureCode includes values for centimetre (CMT), foot (FOT), inch (INH) and metre (MTR) but excludes values for yard (YRD), millimetre (MMT), micrometre (micron) (4H), kilometre (KMT), nautical mile (NMI), etc.

Similarly, https://vocabulary.uncefact.org/TemperatureUnitMeasureCode includes code values for degree Celsius (CEL) and degree Fahrenheit (FAH) but does not even use the SI base unit - kelvin (KEL), which only appears in the main code list as unece:UnitMeasureCode#KEL
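Both problems (duplicates across lists and gaps in the specialised lists) can be checked mechanically. The sketch below uses only the codes mentioned in this comment, not the full code lists, and assumes the main list contains every code named here. The function names are hypothetical:

```python
from collections import defaultdict

# Partial membership, limited to the examples given above (an assumption,
# not a copy of the published lists).
code_lists = {
    "UnitMeasureCode": {"KGM", "MTR", "CMT", "FOT", "INH", "YRD", "MMT",
                        "4H", "KMT", "NMI", "CEL", "FAH", "KEL"},
    "WeightUnitMeasureCode": {"KGM"},
    "LinearUnitMeasureCode": {"CMT", "FOT", "INH", "MTR"},
    "TemperatureUnitMeasureCode": {"CEL", "FAH"},
}


def duplicated_codes(lists: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each code that appears in more than one list to those list names."""
    seen = defaultdict(list)
    for list_name, codes in lists.items():
        for code in codes:
            seen[code].append(list_name)
    return {code: sorted(names) for code, names in seen.items() if len(names) > 1}


def missing_codes(specialised: set[str], expected: set[str]) -> list[str]:
    """Return the expected codes that the specialised list does not contain."""
    return sorted(expected - specialised)


# Length units named above, whether included or excluded.
linear_units = {"CMT", "FOT", "INH", "MTR", "YRD", "MMT", "4H", "KMT", "NMI"}

print(duplicated_codes(code_lists)["KGM"])
print(missing_codes(code_lists["LinearUnitMeasureCode"], linear_units))
```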

I hope that you can also raise this issue with the UN CEFACT team.

Of course I'd be happy to discuss with QUDT folks and prepare a pull request when we've agreed what the QUDT property should be for pointing to corresponding Web URIs based on https://vocabulary.uncefact.org/UnitMeasureCode as a URI stem. Before I spend any time on that, though, I'd want to see https://vocabulary.uncefact.org/UnitMeasureCode updated to show the conversion factors that are already present in the older dataset at https://service.unece.org/trade/uncefact/vocabulary/rec20.jsonld. I'd also like to see a visible link to the RDF dataset (in Turtle and JSON-LD) for https://vocabulary.uncefact.org/UnitMeasureCode and/or have content negotiation working rather than generating a 404 Not Found error.

If I could actually see the RDF dataset behind https://vocabulary.uncefact.org/UnitMeasureCode then I could (1) easily detect whether the conversion factors are missing from the dataset or just not shown in the user interface and (2) offer to add the potentially missing triples for conversion factors to the dataset (using a SPARQL query using that dataset and https://service.unece.org/trade/uncefact/vocabulary/rec20.jsonld as the data sources).

We certainly appreciate the efforts of your team and the UN CEFACT team in making the code list for units of measure finally available as Linked Data rather than just an Excel spreadsheet. With the suggested improvements noted above, I think it will be a useful resource for everyone, including everyone in the GS1 community.

nissimsan commented 1 year ago

Excellent @mgh128 - cheers!

We can confirm the conversion factors were missing as we switched from Excel to a newer JSON Schema data source. The issue above is the first step, getting it included from upstream.

The term duplication you point out is the result of how the source data is modeled: it uses endless extensions rather than inheritance. This has been the main challenge of the project; there was no way around case-by-case decisions and rules.

Tagging @kshychko re. our conversation on Slack yesterday. There are two things here: a) adding conversions, b) fixing duplicates. The former has a dependency and is IMO the most critical, as we need those conversions no matter how and when we might change the modeling in the future.

@mgh128, zooming out, I can't help pondering whether the world actually needs two code lists. UN/CEFACT has traditionally defined everything liberally. In the modern world, this has led to significant term duplication, which is an anti-pattern (my opinion). Units seem like another case of this, and as much as I love and am proud of the QUDT-UN cross-linking, I feel like in an ideal world the UN would just adopt all of QUDT's terms where there is overlap. I'm curious whether you see any arguments against this - is there a reason why the world needs both? And how should I be thinking about choosing a QUDT over a UN unit URI?

mgh128 commented 1 year ago

Hi @nissimsan Do you really mean CQRS or did you actually mean QUDT in your previous comment? If CQRS, please provide a link to where it specifies unit of measure codes because I think you might have picked the wrong 4-letter abbreviation containing a Q.

Regarding two systems for units of measure, I'd note that the UN CEFACT Rec20 unit codes are widely referenced throughout GS1 standards, so as a result, they are widely used in EDI messages, traceability data and master data, at least in the fast moving consumer goods sector and other industry sectors that GS1 supports, including healthcare, apparel and technical industries.

Having said that, in addition to UN/CEFACT Rec20 and QUDT, there is also UCUM - Unified Code for Units Of Measure ( https://ucum.org/ucum ). Unlike both QUDT and UN/CEFACT Rec20, it attempts to take a highly systematic approach to how its unit codes are created, rather than making choices that appear to be somewhat arbitrary. However, not all UCUM unit codes are URI-friendly, especially when using square brackets for units outside the SI system and forward-slash even in SI units such as m/s (metres per second), so that's a downside for a semantic ontology of units of measure and as far as I'm aware, UCUM is not yet published as a semantic ontology or Web vocabulary, whereas QUDT and UN/CEFACT Rec20 are. There is a corresponding dataset for UCUM - see https://ucum.nlm.nih.gov/ucum-lhc/
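To illustrate the URI-friendliness point, here is a small sketch that percent-encodes a few UCUM codes so they could be used as a single URI path segment. The helper name `ucum_path_segment` is hypothetical, and `[in_i]` is used as an example of a square-bracketed non-SI code:

```python
from urllib.parse import quote


def ucum_path_segment(code: str) -> str:
    """Percent-encode a UCUM code for use as one URI path segment.

    safe="" makes quote() escape "/" as well, which it leaves alone by default.
    """
    return quote(code, safe="")


for code in ("m/s", "[in_i]", "kg"):
    print(code, "->", ucum_path_segment(code))
```

The encoded forms are valid in a URI but much harder to read than the original codes, which is the downside described above.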

I'm fairly sure that QUDT provides links to UCUM unit codes but unfortunately only as string values because UCUM doesn't publish a Linked Data ontology as far as I know.

I am aware that in some cases, UCUM code values have been used within GS1's GDSN data model to fill in gaps in the coverage offered by UN/CEFACT Rec20 unit codes. I'm not convinced that using two distinct UoM code lists to populate a single unitCode property is good practice but they didn't ask my advice before taking that decision!

nissimsan commented 1 year ago

Argh - QUDT!! 🤦‍♂️ ... The other four letter acronym with a Q in it! Updated.

mgh128 commented 1 year ago

@nissimsan - yes, QUDT, not to be confused with the much older system SPQR which definitely didn't publish a Linked Data ontology ;-)