monarch-initiative / dipper

Data Ingestion Pipeline for Monarch
https://dipper.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Disambiguate dcat:Distribution from dcat:distribution #979

Closed: nicholsn closed this 4 years ago

nicholsn commented 4 years ago

Currently, the dataset descriptions created by dipper use only the class IRI dcat:Distribution, even in predicate position, where the property IRI dcat:distribution is needed.

This PR adds the property to the term map in GLOBAL_TERMS.yaml and updates the code to use the property wherever a predicate is expected.
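
To illustrate the distinction, here is a minimal sketch in plain Python. It is not dipper's actual code: the `TERMS` dictionary only stands in for the label-to-IRI map in GLOBAL_TERMS.yaml, and its keys and the helper `describe_distribution` are hypothetical. The property `dcat:distribution` links a dataset to its distribution, while the class `dcat:Distribution` appears only as the object of `rdf:type`.

```python
DCAT = "http://www.w3.org/ns/dcat#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical stand-in for the term map; the keys are illustrative only
# and are not claimed to match the entries in GLOBAL_TERMS.yaml.
TERMS = {
    "Distribution": DCAT + "Distribution",  # class IRI (type of a resource)
    "distribution": DCAT + "distribution",  # property IRI (dataset -> distribution)
}


def describe_distribution(dataset_iri, file_iri):
    """Return (subject, predicate, object) triples linking a dataset to a file."""
    return [
        # The lowercase property goes in predicate position.
        (dataset_iri, TERMS["distribution"], file_iri),
        # The capitalized class is used only as the object of rdf:type.
        (file_iri, RDF_TYPE, TERMS["Distribution"]),
    ]


triples = describe_distribution(
    "https://archive.monarchinitiative.org/20200908/#ensembl",
    "https://archive.monarchinitiative.org/20200908/rdf/ensembl.ttl",
)
for s, p, o in triples:
    print(s, p, o)
```

Keeping both entries in the map under distinct keys means the two IRIs can no longer be confused by a case-insensitive lookup or a shared label.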

Fun fact: I detected this with the kgx package while validating the output of dipper-etl.py for ensembl. Here is the error, followed by the invalid dataset descriptor.

[ERROR][INVALID_EDGE_LABEL] https://archive.monarchinitiative.org/20200908/#ensembl-https://archive.monarchinitiative.org/20200908/rdf/ensembl.ttl - Edge label 'DistributionLevel' is not in snake_case form
@prefix biolink: <https://w3id.org/biolink/vocab/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix dctypes: <http://purl.org/dc/dcmitype/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix pav: <http://purl.org/pav/> .
@prefix schema: <http://schema.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://www.ensembl.org/biomart/martservice?> pav:retrievedOn "2020-09-08"^^xsd:date .

<https://archive.monarchinitiative.org/#ensembl> a dctypes:Dataset,
        owl:Ontology ;
    dcterms:Publisher <https://monarchinitiative.org/> ;
    dcterms:identifier "MonarchArchive:#ensembl" ;
    dcterms:source <http://uswest.ensembl.org> ;
    dcterms:title "ENSEMBL" ;
    schema:logo <https://github.com/monarch-initiative/monarch-ui/blob/master/public/img/sources/source-ensembl.png> ;
    owl:versionIRI <https://archive.monarchinitiative.org/20200908/#ensembl> .

<https://archive.monarchinitiative.org/20200908/#ensembl> a dctypes:Dataset ;
    dcterms:Publisher <https://monarchinitiative.org/> ;
    dcterms:created "2020-09-08"^^xsd:date ;
    dcterms:creator <https://monarchinitiative.org/> ;
    dcterms:isVersionOf <https://archive.monarchinitiative.org/#ensembl> ;
    dcterms:source <http://www.ensembl.org/biomart/martservice?> ;
    dcterms:title "ENSEMBL Monarch version 20200908" ;
    pav:version "2020-09-08"^^xsd:date ;
    dcat:Distribution <https://archive.monarchinitiative.org/20200908/rdf/ensembl.ttl> ;
    biolink:category biolink:DataSetVersion .

<https://archive.monarchinitiative.org/20200908/rdf/ensembl.ttl> a dctypes:Dataset,
        dcat:Distribution ;
    dcterms:Publisher <https://monarchinitiative.org/> ;
    dcterms:created "2020-09-08"^^xsd:date ;
    dcterms:creator <https://monarchinitiative.org/> ;
    dcterms:downloadURL <https://archive.monarchinitiative.org/20200908/rdf/ensembl.ttl> ;
    dcterms:format <https://www.w3.org/TR/turtle/> ;
    dcterms:license <https://project-open-data.cio.gov/unknown-license/> ;
    dcterms:title "ENSEMBL distribution ttl" ;
    pav:createdWith <https://github.com/monarch-initiative/dipper> ;
    pav:version "2020-09-08"^^xsd:date .
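
A quick way to spot this kind of mix-up without a full validator is sketched below. It is a naming-convention heuristic, not how kgx performs its checks: RDF vocabularies conventionally capitalize class names and lowercase property names, so a predicate whose local name starts with an uppercase letter is suspicious. The function names `local_name` and `class_like_predicates` are hypothetical, and triples are modeled as plain tuples of IRI strings.

```python
import re

DCAT = "http://www.w3.org/ns/dcat#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def local_name(iri):
    """Return the part of an IRI after the last '#' or '/'."""
    return re.split(r"[#/]", iri)[-1]


def class_like_predicates(triples):
    """Return the sorted predicates whose local name starts with an uppercase letter."""
    return sorted({p for _, p, _ in triples if local_name(p)[:1].isupper()})


version = "https://archive.monarchinitiative.org/20200908/#ensembl"
ttl = "https://archive.monarchinitiative.org/20200908/rdf/ensembl.ttl"

sample = [
    (version, DCAT + "Distribution", ttl),   # class IRI misused as a predicate
    (ttl, RDF_TYPE, DCAT + "Distribution"),  # class IRI used correctly as a type
]

flagged = class_like_predicates(sample)
print(flagged)
```

Only the first triple is flagged, because `rdf:type` has a lowercase local name and the class IRI in object position is never inspected.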