uncefact / spec-jsonld

Exposing the UN/CEFACT vocabulary as web semantics
https://service.unece.org/trade/uncefact/vocabulary/uncefact/

feat: UNLOCODE to JSON-LD transformer #105

Closed kshychko closed 2 years ago

kshychko commented 2 years ago

This PR introduces the transformer from UN/LOCODE to JSON-LD. The transformer produces one JSON-LD file per UN/LOCODE, which comes to more than 100k files in total. The zip file is ~35 MB.

Below is the example for CHGVA:


{
    "@context": {
        "unece": "https://service.unece.org/trade/uncefact/vocabulary/unece#",
        "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
        "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
        "unlocode": "https://service.unece.org/trade/uncefact/vocabulary/unlocode/"
    },
    "@graph": [
        {
            "@id": "unlocode:CHGVA",
            "@type": "unece:UNLOCODE",
            "rdfs:comment": "Genève",
            "rdf:value": "CHGVA",
            "unece:region": "GE",
            "unece:country": "SWITZERLAND"
        }
    ]
}
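As an illustration of the per-code output described above, here is a minimal Python sketch that builds one such document. It is not the transformer from this PR: the function name `locode_to_jsonld` and the input keys (`country_code`, `location_code`, `name`, `subdivision`, `country_name`) are assumptions made for the example; only the context and the CHGVA values come from the example above.

```python
import json

# Context copied from the CHGVA example above.
CONTEXT = {
    "unece": "https://service.unece.org/trade/uncefact/vocabulary/unece#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "unlocode": "https://service.unece.org/trade/uncefact/vocabulary/unlocode/",
}


def locode_to_jsonld(row: dict) -> dict:
    """Build one JSON-LD document for a single UN/LOCODE row.

    `row` is a hypothetical, already-parsed record; its key names are
    assumptions for this sketch, not the PR's actual input format.
    """
    code = row["country_code"] + row["location_code"]
    return {
        "@context": CONTEXT,
        "@graph": [
            {
                "@id": f"unlocode:{code}",
                "@type": "unece:UNLOCODE",
                "rdfs:comment": row["name"],
                "rdf:value": code,
                "unece:region": row["subdivision"],
                "unece:country": row["country_name"],
            }
        ],
    }


doc = locode_to_jsonld(
    {
        "country_code": "CH",
        "location_code": "GVA",
        "name": "Genève",
        "subdivision": "GE",
        "country_name": "SWITZERLAND",
    }
)

# ensure_ascii=False keeps "Genève" readable; write the text out as UTF-8.
text = json.dumps(doc, indent=4, ensure_ascii=False)
```

Writing `text` to a file named after the code (for example `CHGVA.jsonld`, an assumed naming scheme) would give one file per UN/LOCODE.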

My questions:

  1. Is it okay to use the unece namespace for additional attributes like region and country?
  2. Should country be the name or just the code? If the name, should it be in camel case? At the moment it is in upper case.
  3. There is a CSV file with subdivision codes. Do we need to parse it and use it somehow, or can this data just be ignored?

nissimsan commented 2 years ago
            "unece:region": "GE",
            "unece:country": "SWITZERLAND"

... Use unlocode namespace here

nissimsan commented 2 years ago

"unece:country": "SWITZERLAND" ... Just keep this capitalized

nissimsan commented 2 years ago

@cmsdroff, we'd appreciate your input on subdivisions here. I assume they are a requirement?

cmsdroff commented 2 years ago

Subdivision is required. It is the differentiator between places with the same name in different states. For example:

There are 3 places called Fair Oaks in the US, so we need the subdivision to know which one is meant.
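A small Python sketch of why the subdivision matters as part of the key. The records are illustrative only (two of the Fair Oaks entries, in California and Indiana, plus Geneva), and the field names and the `index_by` helper are assumptions for the example.

```python
from collections import defaultdict

# Illustrative records only; field names are assumptions for this sketch.
places = [
    {"name": "Fair Oaks", "country": "US", "subdivision": "CA"},
    {"name": "Fair Oaks", "country": "US", "subdivision": "IN"},
    {"name": "Geneva", "country": "CH", "subdivision": "GE"},
]


def index_by(records, *keys):
    """Group records by the given keys; a group with more than one entry is ambiguous."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[k] for k in keys)].append(rec)
    return groups


without_subdivision = index_by(places, "name", "country")
with_subdivision = index_by(places, "name", "country", "subdivision")

# Keys that still map to more than one place.
ambiguous = [k for k, v in without_subdivision.items() if len(v) > 1]
still_ambiguous = [k for k, v in with_subdivision.items() if len(v) > 1]
```

With name and country alone, `("Fair Oaks", "US")` maps to two records; once the subdivision is part of the key, no collisions remain in this sample.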


My important attributes are

In the JSON-LD, I would suggest we define the countries and subdivisions, and then link to them from the UN/LOCODE. This way the JSON-LD can show which LOCODEs are in which country and which state/subdivision when visualised.
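One way to picture this linking, as a rough Python sketch rather than the PR's actual output: countries and subdivisions become their own nodes, and the UN/LOCODE node points at them with `@id` references instead of plain strings. The property names `unlocode:country` and `unlocode:subdivision`, the `subdiv` prefix and its example.org URL, and the `CHGE` identifier scheme are all placeholders assumed for illustration; `unlcdc` is the countries prefix that appears later in this thread.

```python
CONTEXT = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "unece": "https://service.unece.org/trade/uncefact/vocabulary/unece#",
    "unlocode": "https://service.unece.org/trade/uncefact/vocabulary/unlocode/",
    "unlcdc": "https://service.unece.org/trade/uncefact/vocabulary/unlocode-countries/",
    # Placeholder namespace for subdivisions (assumption, not a published URL).
    "subdiv": "https://example.org/unlocode-subdivisions/",
}


def ref(compact_iri: str) -> dict:
    """A JSON-LD node reference: links to a node instead of embedding a string."""
    return {"@id": compact_iri}


graph = [
    {"@id": "unlcdc:CH", "@type": "unlcdc:Country", "rdfs:label": "SWITZERLAND"},
    {
        "@id": "subdiv:CHGE",
        "@type": "subdiv:Subdivision",
        "rdfs:label": "GE",
        "unlocode:country": ref("unlcdc:CH"),
    },
    {
        "@id": "unlocode:CHGVA",
        "@type": "unece:UNLOCODE",
        "rdfs:label": "Genève",
        "unlocode:country": ref("unlcdc:CH"),
        "unlocode:subdivision": ref("subdiv:CHGE"),
    },
]

doc = {"@context": CONTEXT, "@graph": graph}


def linked_to(nodes, prop, target_id):
    """Return the @id of every node whose `prop` points at `target_id`."""
    return [n["@id"] for n in nodes if n.get(prop) == ref(target_id)]


in_switzerland = linked_to(graph, "unlocode:country", "unlcdc:CH")
in_ge = linked_to(graph, "unlocode:subdivision", "subdiv:CHGE")
```

Because the links are node references, a graph visualiser can draw edges from each LOCODE to its subdivision and country, which is the effect described above.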

cmsdroff commented 2 years ago

@kshychko Here's the Turtle example I showed today, covering Geneva and Fair Oaks from above, to show why state/subdivision is needed.

So I would suggest we have 3 levels

We should allow a choice of things that are well defined, like Country and State. I referenced these from schema.org in the example below rather than "rolling our own" as is done in today's publication.

The example below is a rough-draft POC to illustrate the above, so please be forgiving ;). You can convert it from Turtle to JSON-LD.

@prefix dbr:    <http://dbpedia.org/resource/> .
@prefix dbo:    <http://dbpedia.org/ontology/> .
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf:   <http://xmlns.com/foaf/0.1/> .
@prefix geo:    <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix xsd:    <http://www.w3.org/2001/XMLSchema#> .
@prefix schema: <http://schema.org/> .

# Countries
dbr:USA
    a schema:Country ;
    rdfs:label "United States of America"@en ;
    geo:lat "31.8830464"^^xsd:float ;
    geo:long "-132.0211978"^^xsd:float .

dbr:Switzerland
    a schema:Country ;
    rdfs:label "Switzerland"@en ;
    rdfs:label "Schweiz"@de ;
    geo:lat "46.8131873"^^xsd:float ;
    geo:long "8.22421"^^xsd:float .

# States / subdivisions
dbr:California
    a schema:State ;
    rdfs:label "California"@en ;
    rdfs:label "CA"@en ;
    geo:lat "36.9457393"^^xsd:float ;
    geo:long "-128.3011471"^^xsd:float ;
    dbo:country dbr:USA .

dbr:Indiana
    a schema:State ;
    rdfs:label "Indiana"@en ;
    rdfs:label "IN"@en ;
    geo:lat "39.7491898"^^xsd:float ;
    geo:long "-88.6843581"^^xsd:float ;
    dbo:country dbr:USA .

# Cities
dbr:Geneva
    a schema:City ;
    rdfs:label "Geneva"@en ;
    rdfs:label "Genève"@fr ;
    geo:lat "46.2050836"^^xsd:float ;
    geo:long "6.1090692"^^xsd:float ;
    dbo:country dbr:Switzerland .

dbr:Lausanne
    a schema:City ;
    rdfs:label "Lausanne"@en ;
    geo:lat "46.52848"^^xsd:float ;
    geo:long "6.652495"^^xsd:float ;
    dbo:country dbr:Switzerland .

dbr:FairOaksCA
    a schema:City ;
    rdfs:label "Fair Oaks"@en ;
    geo:lat "38.6434976"^^xsd:float ;
    geo:long "-121.3124502"^^xsd:float ;
    dbo:state dbr:California ;
    dbo:country dbr:USA .

dbr:FairOaksIN
    a schema:City ;
    rdfs:label "Fair Oaks"@en ;
    geo:lat "41.0750666"^^xsd:float ;
    geo:long "-87.2750379"^^xsd:float ;
    dbo:state dbr:Indiana ;
    dbo:country dbr:USA .

If you copy this into https://issemantic.net/rdf-visualizer, you will get a nice graph.

nissimsan commented 2 years ago

@kshychko , please update the code with what produced this and let's merge and move on.

cmsdroff commented 2 years ago

Some observations, and apologies if these were covered before I joined.

On this: https://github.com/uncefact/codes-locode/tree/main/vocab

"subdivisions" is spelt incorrectly, so it would be an issue to link to from the JSON-LD.

The subdivisions, for example, only go up to countries beginning with "E". Is this a sample or a problem? I can't find subdivisions for GB, for example.

Other than this, I agree it looks good so far.

kshychko commented 2 years ago

@cmsdroff, the outputs have been updated. @nissimsan, the PR is ready for review.

cmsdroff commented 2 years ago

@kshychko is "unlcdc" correct, or is it a typo?

It was in the countries file:


{
    "@context": {
        "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
        "unlcdc": "https://service.unece.org/trade/uncefact/vocabulary/unlocode-countries/",
        "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    },
    "@graph": [
        {
            "rdf:value": "GB",
            "@id": "unlcdc:GB",
            "@type": "unlcdc:Country",
            "rdfs:label": "UNITED KINGDOM"
        }
    ]
}

Apologies, I've been through the files and the namespaces for country, subdivision and the others now make sense.

I think I was expecting to see something like these:
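For readers who hit the same question: `unlcdc` is a prefix declared in the `@context`, so `unlcdc:GB` is shorthand for a full IRI. The Python sketch below shows the idea with a hypothetical helper, `expand`. It covers only the simple prefix case and is not a full JSON-LD expansion algorithm.

```python
# Context copied from the countries file quoted above.
context = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "unlcdc": "https://service.unece.org/trade/uncefact/vocabulary/unlocode-countries/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def expand(compact_iri: str, ctx: dict) -> str:
    """Expand 'prefix:suffix' using the context; leave anything else untouched.

    Simplified on purpose: real JSON-LD processing has more rules.
    """
    prefix, sep, suffix = compact_iri.partition(":")
    if sep and prefix in ctx:
        return ctx[prefix] + suffix
    return compact_iri


country_iri = expand("unlcdc:GB", context)
type_iri = expand("unlcdc:Country", context)
```

Here `country_iri` becomes `https://service.unece.org/trade/uncefact/vocabulary/unlocode-countries/GB`, which shows that the short prefix is only a label for the countries namespace.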

https://github.com/uncefact/spec-jsonld/pull/105#issuecomment-1184144437