italia / daf-ontologie-vocabolari-controllati

A list of ontologies and controlled vocabularies. For more information, see the main readme and the readme of each individual vocabulary/ontology, where present, and the following documentation
https://github.com/italia/daf-ontologie-vocabolari-controllati/wiki
Creative Commons Attribution 4.0 International

Issues identified in Semantic Assets during initial NDC harvesting #122

Closed SrinivasanTarget closed 1 year ago

SrinivasanTarget commented 2 years ago

@giorgialodi @ioggstream @spuliz FYI and for action: we need to course-correct the data above.

ioggstream commented 2 years ago

@SrinivasanTarget not sure that dcat:distribution applies to ontologies, since:

dcat:distribution
    rdfs:range dcat:Distribution ;
    skos:definition "An available distribution of the dataset."@en .

dcat:Distribution
    rdfs:comment "A specific representation of a dataset..."@en .

giorgialodi commented 2 years ago

@ioggstream it applies, do not worry :) That is how it works in ADMS, the metadata ontology we use in OntoPiA (which in turn, according to ADMS-AP, is based on DCAT-AP).

giorgialodi commented 2 years ago

Dear @SrinivasanTarget @ioggstream I reconstructed the modelling using ADMS. In essence:

1) an Ontology is a SemanticAsset, which is in turn a Dataset according to DCAT-AP_IT;
2) a SemanticAsset has a distribution;
3) to define point 2, in the ontology we introduced a property that says :SemanticAsset :hasSemanticAssetDistribution :SemanticAssetDistribution;
4) the SemanticAssetDistribution is a Distribution according to DCAT-AP_IT;
5) given the points above, since a SemanticAsset is a Dataset and a SemanticAssetDistribution is a Distribution, we can say that :hasSemanticAssetDistribution is a specialization of dcat:distribution.

So, from a semantic perspective there are no issues for the ontologies. From a technical perspective you have to look for hasSemanticAssetDistribution rather than dcat:distribution directly.
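
A minimal sketch of the technical point above, assuming triples are held as plain Python tuples and that a harvester treats any sub-property of dcat:distribution as a distribution link. The `SUBPROPERTY_OF` table, both helper functions and the `ex:` identifiers are hypothetical and are not taken from the OntoPiA or NDC code.

```python
# Hypothetical, hand-written sub-property table (child -> parent).
SUBPROPERTY_OF = {
    ":hasSemanticAssetDistribution": "dcat:distribution",
}


def is_specialization(prop, target):
    """Return True if prop is target or a (transitive) sub-property of it."""
    seen = set()
    while prop is not None and prop not in seen:
        if prop == target:
            return True
        seen.add(prop)  # guard against accidental cycles in the table
        prop = SUBPROPERTY_OF.get(prop)
    return False


def distributions_of(asset, triples):
    """Collect objects linked to asset by dcat:distribution or a specialization."""
    return [o for s, p, o in triples
            if s == asset and is_specialization(p, "dcat:distribution")]


# Illustrative data only: "ex:onto" and "ex:onto-ttl" are made-up identifiers.
triples = [
    ("ex:onto", "rdf:type", ":SemanticAsset"),
    ("ex:onto", ":hasSemanticAssetDistribution", "ex:onto-ttl"),
]
print(distributions_of("ex:onto", triples))
```

Looking the property up through the table, rather than matching dcat:distribution literally, is what lets the specialised property be picked up.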

The other points are, I think, typos, and we need to correct them. Thanks for spotting them!

SrinivasanTarget commented 2 years ago

@giorgialodi Thanks for the clarification. Your suggestions are now incorporated. Below is a summary of the issues we still need to fix:

- ./daf-ontologie-vocabolari-controllati/vocs-deprecated/ or daf-ontologie-vocabolari-controllati/deprecated/ contains deprecated assets
- <Several views as CSVs for a single Semantic Asset>

SrinivasanTarget commented 2 years ago

@ioggstream @giorgialodi do we have a date for fixing the above issues in the repo?

giorgialodi commented 2 years ago

@SrinivasanTarget working on it

giorgialodi commented 2 years ago

@SrinivasanTarget this "Unable to extract node summary from resource 'http://dati.gov.it/onto/covapit/' using 'http://purl.org/dc/terms/rightsHolder'": where does it originate?

giorgialodi commented 2 years ago

@SrinivasanTarget this InvalidModelException: Cannot load RDF model from '/VocabolariControllati/classifications-for-her/erc-panel-h2020-fp/erc-panel-h2020-fp.ttl' check here --> what's the problem? I validated the file and it seems correct.

giorgialodi commented 2 years ago

@SrinivasanTarget @ioggstream @spuliz @mfortini I fixed some issues here, and the PRs must be merged. In particular the following ones:

1) Unable to extract node summary from resource 'https://w3id.org/italia/controlled-vocabulary/theme-subtheme-mapping' using 'http://purl.org/dc/terms/rightsHolder'
2) InvalidModelException: Unable to extract node summary from resource 'http://dati.gov.it/onto/dcatapit' using 'http://purl.org/dc/terms/rightsHolder'
3) Invalid xsd date format in /classifications-for-universities/academic-disciplines/academic-disciplines.ttl

For the following ones:

1) InvalidModelException: Cannot load RDF model from '/VocabolariControllati/classifications-for-her/erc-panel-h2020-fp/erc-panel-h2020-fp.ttl' check here
2) Unable to extract node summary from resource 'http://dati.gov.it/onto/covapit/' using 'http://purl.org/dc/terms/rightsHolder'

I need more details.

Finally, Caused by: org.apache.jena.rdf.model.LiteralRequiredException: https://raw.githubusercontent.com/italia/daf-ontologie-vocabolari-controllati/master/Ontologie/AtlasOfPaths/v0.1/AtlasOfPaths-AP_IT.png check here for more details --> for me this is fine, not an error, since dct:description can be used with non-literal values. Can you skip the value of this property?

SrinivasanTarget commented 2 years ago

Finally, Caused by: org.apache.jena.rdf.model.LiteralRequiredException: master/Ontologie/AtlasOfPaths/v0.1/AtlasOfPaths-AP_IT.png (raw) check here for more details --> for me this is fine not an error since dct:description can be used with non literal values. Can you skip the value of this property?

@giorgialodi this is fixed at our end today.

SrinivasanTarget commented 2 years ago

InvalidModelException: Cannot load RDF model from '/VocabolariControllati/classifications-for-her/erc-panel-h2020-fp/erc-panel-h2020-fp.ttl' check here Unable to extract node summary from resource 'http://dati.gov.it/onto/covapit' using 'http://purl.org/dc/terms/rightsHolder'

@giorgialodi these issues seem to be fixed too.

SrinivasanTarget commented 2 years ago

@giorgialodi Is it a valid semantic asset?

Also for cities, we are getting Caused by: org.apache.jena.shared.PropertyNotFoundException: http://purl.org/dc/terms/identifier?

giorgialodi commented 2 years ago

@giorgialodi Is it a valid semantic asset?

No, that is an example of data. It is data, not a controlled vocabulary or an ontology.

Also for cities, we are getting Caused by: org.apache.jena.shared.PropertyNotFoundException: http://purl.org/dc/terms/identifier?

I will check, but this vocabulary was produced automatically

giorgialodi commented 2 years ago

@SrinivasanTarget are you sure about cities? Because in that vocabulary I see that property http://purl.org/dc/terms/identifier "https://w3id.org/italia/controlled-vocabulary/territorial-classifications/cities" ;

SrinivasanTarget commented 2 years ago

@giorgialodi

<http://purl.org/dc/terms/rightsHolder> <https://w3id.org/italia/data/public-organization/ISTAT> ;

........
<https://w3id.org/italia/data/resource/organization/public-organization/ISTAT> a <http://dati.gov.it/onto/dcatapit#Agent> , <http://xmlns.com/foaf/0.1/Agent> ;
    <http://purl.org/dc/terms/identifier> "ISTAT" ;
    <http://xmlns.com/foaf/0.1/name> "Istituto Nazionale di Statistica"@it , "Italian National Institute of Statistics"@en .

The rightsHolder URL seems to be wrong.
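
A minimal sketch of how this kind of mismatch could be detected, assuming the triples of a file are available as plain Python tuples: it reports every rightsHolder object that is never described as a subject in the same file. The `dangling_rights_holders` helper and the `urn:example:asset` subject are hypothetical; the two ISTAT IRIs are the ones quoted above.

```python
RIGHTS_HOLDER = "http://purl.org/dc/terms/rightsHolder"


def dangling_rights_holders(triples):
    """Return rightsHolder objects that never appear as a subject."""
    subjects = {s for s, _, _ in triples}
    return sorted({o for _, p, o in triples
                   if p == RIGHTS_HOLDER and o not in subjects})


triples = [
    # "urn:example:asset" is a placeholder; the affected asset is not named here.
    ("urn:example:asset", RIGHTS_HOLDER,
     "https://w3id.org/italia/data/public-organization/ISTAT"),
    ("https://w3id.org/italia/data/resource/organization/public-organization/ISTAT",
     "http://purl.org/dc/terms/identifier", "ISTAT"),
]
print(dangling_rights_holders(triples))
```

With the snippet above, the IRI used by rightsHolder is reported because the agent is described under a different IRI (the one containing /resource/organization/).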

SrinivasanTarget commented 2 years ago

@giorgialodi Since we process assets only in leaf-level folders, the folders below prevent us from processing the actual assets.

VocabolariControllati/territorial-classifications/provinces/scriptR2RML
VocabolariControllati/territorial-classifications/geographical-distribution/scriptR2RML
VocabolariControllati/territorial-classifications/cities/scriptR2RML

Can we please remove them? Thoughts, please.

giorgialodi commented 2 years ago

@giorgialodi

<http://purl.org/dc/terms/rightsHolder> <https://w3id.org/italia/data/public-organization/ISTAT> ;

........
<https://w3id.org/italia/data/resource/organization/public-organization/ISTAT> a <http://dati.gov.it/onto/dcatapit#Agent> , <http://xmlns.com/foaf/0.1/Agent> ;
  <http://purl.org/dc/terms/identifier> "ISTAT" ;
  <http://xmlns.com/foaf/0.1/name> "Istituto Nazionale di Statistica"@it , "Italian National Institute of Statistics"@en .

The rightsHolder URL seems to be wrong.

You are right @SrinivasanTarget, where is it? In which voc or ontology?

SrinivasanTarget commented 2 years ago

@giorgialodi

<http://purl.org/dc/terms/rightsHolder> <https://w3id.org/italia/data/public-organization/ISTAT> ;

........
<https://w3id.org/italia/data/resource/organization/public-organization/ISTAT> a <http://dati.gov.it/onto/dcatapit#Agent> , <http://xmlns.com/foaf/0.1/Agent> ;
    <http://purl.org/dc/terms/identifier> "ISTAT" ;
    <http://xmlns.com/foaf/0.1/name> "Istituto Nazionale di Statistica"@it , "Italian National Institute of Statistics"@en .

The rightsHolder URL seems to be wrong.

You are right @SrinivasanTarget, where is it? In which voc or ontology?

In CV.

giorgialodi commented 2 years ago

@giorgialodi Since we process assets only in leaf-level folders, the folders below prevent us from processing the actual assets.

VocabolariControllati/territorial-classifications/provinces/scriptR2RML
VocabolariControllati/territorial-classifications/geographical-distribution/scriptR2RML
VocabolariControllati/territorial-classifications/cities/scriptR2RML

Can we please remove them? Thoughts, please.

Hmmm, removing them would really be a shame. Those are the scripts that transform the data into the published RDF vocabulary. It is important knowledge to provide to end users. Can't you skip them?

giorgialodi commented 2 years ago

@giorgialodi

<http://purl.org/dc/terms/rightsHolder> <https://w3id.org/italia/data/public-organization/ISTAT> ;

........
<https://w3id.org/italia/data/resource/organization/public-organization/ISTAT> a <http://dati.gov.it/onto/dcatapit#Agent> , <http://xmlns.com/foaf/0.1/Agent> ;
  <http://purl.org/dc/terms/identifier> "ISTAT" ;
  <http://xmlns.com/foaf/0.1/name> "Istituto Nazionale di Statistica"@it , "Italian National Institute of Statistics"@en .

The rightsHolder URL seems to be wrong.

You are right @SrinivasanTarget, where is it? In which voc or ontology?

In CV.

Do you remember which one, so I can correct it directly?

SrinivasanTarget commented 2 years ago

Hmmm, removing them would really be a shame. Those are the scripts that transform the data into the published RDF vocabulary. It is important knowledge to provide to end users. Can't you skip them?

@giorgialodi what's your suggestion on how to skip them? Should every folder named scriptR2RML be skipped? Any other suggestions for skip rules?
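
A minimal sketch of one possible skip rule, assuming the harvester walks the repository with Python's `os.walk` and treats a folder as a leaf when it has no sub-folders. The `SKIP_DIRS` set and the `leaf_asset_folders` helper are hypothetical names, not part of the NDC harvester.

```python
import os

# Assumed skip list: names of helper directories that hold no semantic assets.
SKIP_DIRS = {"scriptR2RML"}


def leaf_asset_folders(root, skip=SKIP_DIRS):
    """Yield folders with no sub-folders once helper directories are ignored."""
    for dirpath, dirnames, _filenames in os.walk(root):
        # Prune in place so os.walk never descends into skipped directories.
        dirnames[:] = [d for d in dirnames if d not in skip]
        if not dirnames:
            yield dirpath
```

With this rule, a folder such as VocabolariControllati/territorial-classifications/cities would still count as a leaf even though it contains scriptR2RML. Further directory names can be added to the set if other helper folders turn up.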

SrinivasanTarget commented 2 years ago

@giorgialodi Since changing the folder-search algorithm is time-consuming, can we move all the scriptR2RML folders under /VocabolariControllati? Is it possible to rearrange them as below?

/VocabolariControllati/scriptR2RML/cities/*.ttl
/VocabolariControllati/scriptR2RML/geographical-distribution/*.ttl

/VocabolariControllati/territorial-classifications/cities/cities.ttl
/VocabolariControllati/territorial-classifications/geographical-distribution/geographical-distribution.ttl

giorgialodi commented 2 years ago

@SrinivasanTarget it is not only that: there is another case in which a CV has a directory named sparql. From my point of view, these moves make the repo less clear for its users; however, since the priority is NDC and the files within scriptR2RML carry the names of the vocs, I think we can do it. I am not sure about the sparql directory; I am going to check. I am also thinking about how to deal with the CV that has more than one Turtle file.

giorgialodi commented 2 years ago

@SrinivasanTarget @ioggstream I have the following proposals:

1) the sparql directory in transparency-titulus is moved to the root
2) scriptR2RML in the three territorial classifications is moved to the root
3) transparency-obligation has more than one voc in the directory. Let's create another directory, transparency-obligation-organisation, and move the second Turtle file and all the rest into it

ioggstream commented 2 years ago

If it's ok for Giorgia, it is for me. Probably shaping the repo on the temporary algorithm is suboptimal, but we're all committed to achieving the goal :)

ioggstream commented 2 years ago

@simonetw how many of the above issues still stand? Can you mark the fixed issues pls?

giorgialodi commented 1 year ago

I think this has been fixed. I will close the issue