uncefact / spec-jsonld

Exposing the UN/CEFACT vocabulary as web semantics
https://service.unece.org/trade/uncefact/vocabulary/uncefact/

Suggestion of a follow-up use-case(s) #156

Open svanteschubert opened 1 year ago

svanteschubert commented 1 year ago

To me, the first possible definition of done for this project is reached when every piece of UN/CEFACT data is directly accessible.

In other words, today's data, which might be available as zipped "spreadsheet data" (like the CCL in the old binary XLS Excel format, or Rec20 as OOXML) or as an HTML text blob (like UN/EDIFACT 5153 Duty or tax or fee type name code), is being transformed into a directly accessible web representation, where each data item can be referenced via a URL pointing (in)to a structured HTML file using fragment identifiers (#). In addition, the same data URL will be reused within RDF graphs to provide context information.
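
As a rough illustration of the fragment-identifier idea, the sketch below splits a term URL into the document it lives in and the fragment naming the term, and then reuses the unchanged URL as a node identifier in a JSON-LD-style dictionary. The helper name `split_term_url` is hypothetical, and the example URL is the Rec20 pascal entry mentioned further down in this issue; this is not how the project itself generates its pages.

```python
import json
from typing import Tuple
from urllib.parse import urldefrag


def split_term_url(term_url: str) -> Tuple[str, str]:
    """Split a term URL into (document URL, fragment identifier)."""
    document_url, fragment = urldefrag(term_url)
    return document_url, fragment


term_url = "https://vocabulary.uncefact.org/rec20#PAL"
document_url, code = split_term_url(term_url)
print(document_url)  # https://vocabulary.uncefact.org/rec20
print(code)          # PAL

# The very same URL can act as the identifier of a node in an RDF graph.
# Only "@id" is set here; any further properties would be assumptions.
node = {"@id": term_url}
print(json.dumps(node))
```

The point of the sketch is only that one URL serves both purposes: a browser resolves it to a position in the HTML page, and a graph uses it as the subject identifier.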

This seems to be the case with https://vocabulary.uncefact.org/

It would certainly be recommended that this is not only a proof of concept, but that UN/CEFACT library maintenance is able to maintain this site by doing bi-yearly releases (taking advantage of this project) with as much automation as possible.

For this, it is sufficient if the data set stays close to the text/syntax of the data previously written in cells or text. But as shown by Bret Victor in "Inventing on Principle" (although he focuses on visuals), we often transfer/map the data (like from paper and pencil to a computer picture) into a new, advanced environment (like we do from text to RDF graph), yet we often do not use the new environment's full potential!

Our high-level goal is digitalization, which to me means accessing and working with the data without further human interaction.

The following list of proposals is just a quick draft, and the work might as well be outsourced to a user crowd, for example to universities as student/PhD work.

  1. The most common reason to improve semantics is likely to improve search. Can we search the UN/CEFACT data and get back everything that is related to Volume? Not yet; we likely need to enhance the RDF data set manually. Sometimes there are sub-semantics, like the atoms of a molecule, which are not yet present, for example in "fifty-five gallon (US) drum". See also #70. It gets more difficult to make comment text blobs semantically accessible, like "Local tax for construction." or the even more complex "Duty paid and held on deposit, by Customs, during an investigation period prior to a final decision being made on any aspect related to imported goods (except valuation) by Customs."
  2. Of course, the more advantages the semantic annotation provides to the end user, the better the ROI! Can we feed this data to software? How about a software library that takes the UN/CEFACT unit-of-measure data and offers automatic conversion? The data always offers a group of different units of measure (e.g. for volume as a quantity) and an SI base unit for normalization, or a combination of base units (e.g. speed = length/time).
  3. We should not be too surprised if there are other semantic definitions that overlap with ours. But there might be some ground truth in adding a Wikipedia URL as a common equivalent URL identifier. For instance, our https://vocabulary.uncefact.org/rec20#PAL would be equal to https://en.wikipedia.org/wiki/Pascal_(unit). The more projects do this, the more the graphs become connected and reusable!
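
To make proposal 2 a bit more tangible, here is a minimal sketch of what such a conversion helper could look like. It assumes a small hand-written table that maps Rec20 common codes to a quantity and a factor towards the SI unit; the table `TO_SI` and the function `convert` are hypothetical, and a real library would presumably read these values from the published vocabulary rather than hard-code them.

```python
# Hypothetical, hand-written table: Rec20 common code -> (quantity, factor to SI unit).
# These entries are assumptions for illustration, not read from the vocabulary.
TO_SI = {
    "MTQ": ("volume", 1.0),             # cubic metre (SI unit for volume)
    "LTR": ("volume", 0.001),           # litre
    "GLL": ("volume", 0.003785411784),  # gallon (US)
    "PAL": ("pressure", 1.0),           # pascal (SI unit for pressure)
    "KPA": ("pressure", 1000.0),        # kilopascal
}


def convert(value: float, from_code: str, to_code: str) -> float:
    """Convert a value between two units of the same quantity via the SI unit."""
    from_quantity, from_factor = TO_SI[from_code]
    to_quantity, to_factor = TO_SI[to_code]
    if from_quantity != to_quantity:
        raise ValueError(
            f"cannot convert {from_quantity} ({from_code}) to {to_quantity} ({to_code})"
        )
    # Normalize to the SI unit first, then scale to the target unit.
    return value * from_factor / to_factor


# A fifty-five gallon (US) drum expressed in litres: roughly 208.2
print(convert(55, "GLL", "LTR"))
```

Normalizing through the SI unit keeps the table linear in the number of units instead of needing a factor for every pair. Units with an offset (such as temperature scales) or combined base units would need more than a single factor.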

Just as an initial pointer into a direction...

nissimsan commented 1 year ago

@svanteschubert, would you mind summarizing the concrete action you are suggesting we take on this? Or is this rather a discussion starter?

What's needed to close this issue?