uncefact / spec-jsonld

Exposing the UN/CEFACT vocabulary as web semantics
https://service.unece.org/trade/uncefact/vocabulary/uncefact/

Suggestion to (temporarily) clean the output data due to incorrect input data... #155

Closed svanteschubert closed 1 year ago

svanteschubert commented 1 year ago

The XLS generated from GEFEG-FX already contains incorrect data, which...

  1. Contains whitespace at the end of the data. For instance, in the JSON-LD, rec20:TAN has "unece:conversionFactor": "mg KOH/g "; because of the trailing whitespace, the string is different. The same occurs at the end of the file, at rec20:MTZ.
  2. Uses more than one hyphen character. The one I get by default on my keyboard is not always the one in the files; please take a look at the very end of rec20:Z9. I might be mistaken here, though. Just requesting a review! :)

These problems should also be reported and fixed upstream (in the GEFEG FX export or the dataset), but in the meantime we could clean the data with a simple trim function and character replacement, possibly as a post-processing step.
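
To make the suggestion concrete, here is a minimal sketch of such a post-step in Python. It is not taken from the project's pipeline: the names `DASH_LOOKALIKES`, `clean_string` and `clean_document` are hypothetical, and the set of dash characters to replace is an assumption that would need checking against the actual data (for example, mapping the Unicode minus sign to a plain hyphen may not always be wanted).

```python
# Dash-like Unicode characters that are easily confused with the ASCII
# hyphen-minus. ASSUMPTION: this list is illustrative only; review it
# against the real rec20 data before using it.
DASH_LOOKALIKES = (
    "\u2010",  # HYPHEN
    "\u2011",  # NON-BREAKING HYPHEN
    "\u2012",  # FIGURE DASH
    "\u2013",  # EN DASH
    "\u2212",  # MINUS SIGN
)

# Translation table for str.translate: each look-alike maps to "-".
_DASH_TABLE = {ord(ch): "-" for ch in DASH_LOOKALIKES}


def clean_string(value: str) -> str:
    """Replace dash look-alikes and trim leading/trailing whitespace."""
    return value.translate(_DASH_TABLE).strip()


def clean_document(node):
    """Recursively clean every string value in a parsed JSON(-LD) structure.

    Keys are left untouched; only values are cleaned.
    """
    if isinstance(node, str):
        return clean_string(node)
    if isinstance(node, list):
        return [clean_document(item) for item in node]
    if isinstance(node, dict):
        return {key: clean_document(value) for key, value in node.items()}
    return node  # numbers, booleans and None pass through unchanged
```

Applied to the example above, `clean_document({"@id": "rec20:TAN", "unece:conversionFactor": "mg KOH/g "})` would return the same object with `"unece:conversionFactor": "mg KOH/g"`. Note that `strip()` also removes leading whitespace, which matches a plain trim but goes slightly beyond the trailing-whitespace case reported here.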

nissimsan commented 1 year ago

The data source pipeline has been significantly refactored since this was filed, so this seems stale. I suggest we close it and reopen if the issue (or a similar one) persists.

nissimsan commented 1 year ago

The JSON Schema (new data source) no longer has these details. It only has value and title. That (sort of) means the problem goes away. :) If you need these detailed descriptions of units added back, please raise a separate issue for it. Including them will have to start from the JSON Schema output.