uncefact / spec-jsonld

Exposing the UN/CEFACT vocabulary as web semantics
https://service.unece.org/trade/uncefact/vocabulary/uncefact/

Property Datatypes #45

Open VladimirAlexiev opened 2 years ago

VladimirAlexiev commented 2 years ago

Currently UNCEFACT uses only two literal datatypes: xsd:string (791 props) and xsd:token (159 props).

UNCEFACT prop names follow ISO/IEC 11179 Metadata Registry (MDR), part 5:2015 Naming and identification principles. The last word of each prop name (let's call it the "kind") suggests many other datatypes.

Surely trade involves some numbers and some dates?!?

I checked that all props with kind Id are xsd:token (good). This query counts xsd:string props by "kind":

PREFIX schema: <http://schema.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select ?kind (count(*) as ?c) {
  ?prop schema:rangeIncludes xsd:string
  bind(replace(str(?prop),".*([A-Z][a-z]*)","$1") as ?kind)
  filter(regex(?kind,"^[A-Z]"))
} group by ?kind order by ?kind
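For readers who want to try the "kind" extraction outside a SPARQL endpoint, here is a minimal Python sketch of the same regex logic. The base IRI and the property names are hypothetical placeholders, not taken from the vocabulary; only the pattern itself comes from the query above.

```python
import re
from collections import Counter
from typing import Optional

# Same pattern as the SPARQL replace(): the greedy ".*" pushes the capture
# group to the last capitalised word of the IRI.
KIND_PATTERN = re.compile(r".*([A-Z][a-z]*)")


def kind_of(prop_iri: str) -> Optional[str]:
    """Return the last capitalised word of a property IRI, or None."""
    kind = KIND_PATTERN.sub(r"\1", prop_iri)
    # Mirrors the SPARQL filter: if nothing matched, the string is unchanged
    # and does not start with a capital letter, so it is dropped.
    return kind if re.match(r"[A-Z]", kind) else None


# Hypothetical IRIs, for illustration only.
BASE = "https://example.org/vocab/"
props = [BASE + n for n in
         ("invoiceAmount", "taxAmount", "issueDateTime", "copyIndicator", "identifier")]

counts = Counter(k for k in map(kind_of, props) if k is not None)
for kind, c in sorted(counts.items()):
    print(kind, c)
```

Note that a name ending in DateTime is counted under kind Time, not Date, because only the last capitalised word is kept.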
Count and tentative proposed changes:

kind            c     change to
"Access"        1
"Agency"        1
"Amount"        89    numeric
"Basis"         2
"Box"           1
"Charge"        1
"Code"          154   xsd:token
"Conditions"    1
"Criteria"      1
"Date"          3     xsd:date
"Description"   21
"Dimension"     1
"Five"          1
"Four"          1
"Indicator"     73    xsd:boolean
"Information"   21
"Instructions"  2
"Limit"         2
"List"          2
"Means"         1
"Measure"       66
"Name"          47
"Number"        4     numeric
"Numeric"       15    IndexNumeric, SequenceNumeric -> xsd:integer
"Object"        7
"Of"            2
"One"           1
"Pattern"       1
"Percent"       16    numeric
"Phrase"        1
"Point"         1
"Procedure"     1
"Quantity"      91    numeric
"Rate"          4
"Reason"        7
"Reference"     6
"Remark"        2
"Remarks"       1
"Restriction"   3
"Result"        1
"Status"        1
"Three"         1
"Time"          79    xsd:dateTime
"Title"         1
"Two"           1
"Type"          9
"Use"           1
"Value"         1
"Zone"          1
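The proposed changes in the table could be applied mechanically along these lines. This is only a sketch: the function name and the sample property names are hypothetical, "numeric" is kept as the unspecific label used in the table (no concrete XSD type is chosen here), and kinds without a proposal return None.

```python
import re
from typing import Optional

# Proposed datatype per kind, copied from the table above.
# "numeric" is left as-is: the table does not name a concrete XSD type.
PROPOSED = {
    "Amount": "numeric",
    "Code": "xsd:token",
    "Date": "xsd:date",
    "Indicator": "xsd:boolean",
    "Number": "numeric",
    "Percent": "numeric",
    "Quantity": "numeric",
    "Time": "xsd:dateTime",
}

# For kind "Numeric" the table only covers these two name endings.
INTEGER_SUFFIXES = ("IndexNumeric", "SequenceNumeric")


def proposed_datatype(prop_name: str) -> Optional[str]:
    """Return the tentatively proposed datatype for a property name, or None."""
    if prop_name.endswith(INTEGER_SUFFIXES):
        return "xsd:integer"
    words = re.findall(r"[A-Z][a-z]*", prop_name)  # capitalised words, in order
    kind = words[-1] if words else None            # the "kind" is the last one
    return PROPOSED.get(kind)


# Hypothetical property names, for illustration only.
for name in ("invoiceAmount", "issueDateTime", "lineSequenceNumeric", "goodsDescription"):
    print(name, "->", proposed_datatype(name))
```

Returning None rather than a default keeps the undecided kinds (Measure, Name, Description and so on) visible instead of silently leaving them as xsd:string.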

Examples:

Fak3 commented 2 years ago

Edi3 issue: https://github.com/edi3/edi3-json-ld-ndr/issues/51

nissimsan commented 2 years ago

It's very hard to disagree on this! :)

@kshychko , we did some work on this - can you double check if this was fixed already, pls?

nissimsan commented 2 years ago

@Fak3 , how are you doing? We'd love to have you back and attend the calls!!! ❤️

VladimirAlexiev commented 11 months ago

Currently, all Id properties are rendered as token (good!) and all other data properties as string (not good):

grep rangeIncludes uncefact.ttl |sort|uniq -c|sort -rn|less
    791         schema:rangeIncludes            xsd:string ;  
    159         schema:rangeIncludes            xsd:token ;   

In particular: