vliz-be-opsci / py-trav-harv

Python module that allows an end user to perform link traversal on a triple store.

Fix/issue 41 and some refactoring in web_discovery and path_assertion #54

Closed: cedricdcc closed this pull request 5 months ago

cedricdcc commented 6 months ago

Besides resolving issue 41 (adding some form of reporting), this PR further decouples web_discovery from travharv so that it can be used standalone.

Path assertion has been revised so that it now runs correctly for a given assertion path.
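To make "asserting a path" concrete, here is a minimal sketch, assuming the triples are already loaded into an in-memory dict keyed by (subject, predicate). The function name `assert_path` and this graph layout are hypothetical, not travharv's API; the real implementation also retrieves further resources (through web_discovery) before giving up on a step. The sketch follows the path one predicate at a time and, on failure, returns the steps up to and including the predicate that could not be followed.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical in-memory graph: (subject, predicate) -> set of objects.
Graph = Dict[Tuple[str, str], Set[str]]


def assert_path(graph: Graph, start: str, path: List[str]) -> Tuple[bool, List[str]]:
    """Follow `path` from `start`, one predicate per step.

    Returns (True, path) when at least one node is reached at the end of the
    path; otherwise (False, steps), where `steps` is the prefix of the path up
    to and including the first predicate that yielded no objects.
    """
    frontier: Set[str] = {start}
    for i, predicate in enumerate(path):
        next_frontier: Set[str] = set()
        for node in frontier:
            # Collect every object reachable from the current nodes via this predicate.
            next_frontier |= graph.get((node, predicate), set())
        if not next_frontier:
            # Nothing reachable: report the path up to the failing predicate.
            return False, path[: i + 1]
        frontier = next_frontier
    return True, list(path)
```

For example, with `g = {("ex:a", "ex:p"): {"ex:b"}}`, the call `assert_path(g, "ex:a", ["ex:p", "ex:q"])` returns `(False, ["ex:p", "ex:q"])`.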

github-actions[bot] commented 6 months ago

Test coverage table for commit c013752.

pycoverage

Name                          Stmts  Miss   Cover  Missing
travharv/config_build.py        154    17  88.96%  51, 219, 249-250, 337-338, 370, 375-378, 428-429, 436, 451
travharv/execution_report.py     76     0  100.0%
travharv/executor.py             41     3  92.68%  76-77, 83
travharv/helper.py               28     1  96.43%  32
travharv/path_assertion.py      112    16  85.71%  51-52, 83-84, 115-117, 120-125, 130
travharv/service.py              46     3  93.48%  74, 89, 92
travharv/store.py                57     5  91.23%  120, 130-131, 144
travharv/web_discovery.py        92    17  81.52%  21, 49-50, 58, 120, 125-126, 154, 170-171, 178, 185
TOTAL                           606    62  89.77%
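The Cover column follows from the other two, Cover = (Stmts - Miss) / Stmts, and the TOTAL row sums both counts before dividing rather than averaging the per-module percentages. A small check, assuming plain statement coverage (no branch coverage), which is consistent with every row of the table:

```python
# Stmts and Miss per module, copied from the coverage table above.
ROWS = {
    "travharv/config_build.py": (154, 17),
    "travharv/execution_report.py": (76, 0),
    "travharv/executor.py": (41, 3),
    "travharv/helper.py": (28, 1),
    "travharv/path_assertion.py": (112, 16),
    "travharv/service.py": (46, 3),
    "travharv/store.py": (57, 5),
    "travharv/web_discovery.py": (92, 17),
}


def cover_percent(stmts: int, miss: int) -> float:
    """Percentage of statements executed, rounded to two decimals."""
    if stmts == 0:
        return 100.0
    return round(100 * (stmts - miss) / stmts, 2)


# The TOTAL row is computed from summed counts, not from the per-module percentages.
total_stmts = sum(stmts for stmts, _ in ROWS.values())
total_miss = sum(miss for _, miss in ROWS.values())

if __name__ == "__main__":
    for name, (stmts, miss) in ROWS.items():
        print(f"{name:30} {cover_percent(stmts, miss):6.2f}%")
    print(f"{'TOTAL':30} {cover_percent(total_stmts, total_miss):6.2f}%")
```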
cedricdcc commented 6 months ago

A small example of what an execution_report looks like now:

@prefix prov: <http://www.w3.org/ns/prov#>.
@prefix sh: <http://www.w3.org/ns/shacl#>.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
@prefix void: <http://rdfs.org/ns/void#>.
@prefix schema: <http://schema.org/>.
@prefix travharv: <http://example.org/travharv/ns/>.

<urn:travharv:dbd467e6-3a6d-4f48-a4b7-ba5ab3dec81b> 
    a prov:Entity, sh:ValidationReport ;

    prov:generatedAtTime "2024-05-17 12:34:47.827136+00:00" ; 
    travharv:fromContext "base_test.yml" ;
    sh:result <urn:travharv:assertionresult-6cbe2989-5e08-4dda-bb7f-7bcfd0992479>, <urn:travharv:assertionresult-084e333a-9ce3-461f-8726-ca3945037e65>;
.

<urn:travharv:assertionresult-6cbe2989-5e08-4dda-bb7f-7bcfd0992479>
    a prov:Entity, sh:ValidationResult ; 

    prov:generatedAtTime "2024-05-17 12:34:49.055321+00:00" ; 

    sh:focusNode "http://marineregions.org/mrgid/63523"^^xsd:dateTime ; 
    sh:resultPath "<http://marineregions.org/ns/ontology#hasGeometry>"^^xsd:string ; 
    sh:resultMessage "Assertion failed, last path: <http://marineregions.org/ns/ontology#hasGeometry>"^^xsd:string ;
    travharv:lastRetrievableResource <urn:travharv:assertionresource-d356cccf-820d-4d26-8174-48b301f39e61> ;  
    .

<urn:travharv:assertionresource-d356cccf-820d-4d26-8174-48b301f39e61>
    a schema:DataDownload, void:Dataset ; 
    schema:contentUrl "http://marineregions.org/mrgid/63523" ; 
    schema:encodingFormat "application/ld+json" ; 
    void:triples "12"^^xsd:integer ;
.
<urn:travharv:assertionresult-084e333a-9ce3-461f-8726-ca3945037e65>
    a prov:Entity, sh:ValidationResult ; 

    prov:generatedAtTime "2024-05-17 12:34:50.870950+00:00" ; 

    sh:focusNode "http://marineregions.org/mrgid/63523"^^xsd:dateTime ; 
    sh:resultPath "<http://marineregions.org/ns/ontology#isPartOf>/<https://schema.org/geo>/<https://schema.org/latitude>"^^xsd:string ; 
    sh:resultMessage "Assertion failed, last path: <http://marineregions.org/ns/ontology#isPartOf>/<https://schema.org/geo>/<https://schema.org/latitude>"^^xsd:string ;
    travharv:lastRetrievableResource <urn:travharv:assertionresource-efef8479-0e46-4f71-a698-834584710aca> ;  
    .

<urn:travharv:assertionresource-efef8479-0e46-4f71-a698-834584710aca>
    a schema:DataDownload, void:Dataset ; 
    schema:contentUrl "http://marineregions.org/mrgid/17595" ; 
    schema:encodingFormat "application/ld+json" ; 
    void:triples "34"^^xsd:integer ;
.

I noticed that sh:focusNode had the wrong datatype (xsd:dateTime) attached to it; this was already fixed in a follow-up push.
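The follow-up push itself is not shown here. As one option, a minimal sketch assuming the report is assembled from string templates: since the focus node is the URL of a resource, it can be written as an IRI (the usual form for a focus node in SHACL results) instead of a literal carrying a datatype. The helpers `iri`, `literal` and `render_result` are hypothetical, not travharv's code, and the `sh:`/`prov:` prefixes are assumed to be declared as in the report header.

```python
import uuid
from typing import List


def iri(value: str) -> str:
    """Serialize an absolute IRI as a Turtle IRI reference."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Serialize a plain Turtle string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def render_result(focus_node: str, path: List[str], message: str) -> str:
    """Render one sh:ValidationResult block in the shape of the example report.

    The focus node is written as an IRI rather than an ^^xsd:dateTime literal.
    """
    subject = f"urn:travharv:assertionresult-{uuid.uuid4()}"
    # The path is kept as a string of <iri>/<iri> steps, as in the example report.
    result_path = "/".join(iri(step) for step in path)
    lines = [
        iri(subject),
        "    a prov:Entity, sh:ValidationResult ;",
        f"    sh:focusNode {iri(focus_node)} ;",
        f"    sh:resultPath {literal(result_path)} ;",
        f"    sh:resultMessage {literal(message)} ;",
        ".",
    ]
    return "\n".join(lines)
```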

cedricdcc commented 6 months ago

Currently, test_service.py::test_travharv fails because the config path still contains the full local path. This will be fixed in the next push, and the pull request title and description will be adjusted accordingly.
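One way to keep the test independent of where the repository is checked out is to record only the path relative to the config folder in travharv:fromContext, as the example report already does with "base_test.yml". A minimal sketch, assuming POSIX-style paths; `context_label` is a hypothetical helper, not part of travharv:

```python
from pathlib import PurePosixPath


def context_label(config_path: str, config_folder: str) -> str:
    """Label for travharv:fromContext that does not leak the local checkout path.

    Returns the config path relative to the config folder, or just the file
    name when the file lies outside that folder.
    """
    path = PurePosixPath(config_path)
    try:
        return path.relative_to(PurePosixPath(config_folder)).as_posix()
    except ValueError:
        # relative_to raises ValueError when path is not under config_folder.
        return path.name
```

For example, `context_label("/some/checkout/tests/config/base_test.yml", "/some/checkout/tests/config")` returns `"base_test.yml"` regardless of where `/some/checkout` lives.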

marc-portier commented 6 months ago

Some remarks on the example in "A small example of how an execution_report looks now":


<urn:travharv:dbd467e6-3a6d-4f48-a4b7-ba5ab3dec81b> 
    a prov:Entity, sh:ValidationReport ;
    travharv:fromContext "base_test.yml" ;
    sh:result <urn:travharv:assertionresult-6cbe2989-5e08-4dda-bb7f-7bcfd0992479>, <urn:travharv:assertionresult-084e333a-9ce3-461f-8726-ca3945037e65>;
.
  1. I am not checking the expected shape of things or the use of vocabularies; I trust @laurianvm to check those details. We might, however, consider opening an issue about creating a SHACL description that we make available and use for validation, even in pytest scenarios?
  2. Regarding human readability, though, I wonder whether we could slightly tune and optimise these UUID identifiers?
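On point 2, a minimal sketch of one possible scheme (not something decided in this PR): keep a short random id for the report and derive the result identifiers from it with sequence numbers, so it is obvious which report a result belongs to. The URN layout and the helper names `short_id` and `report_identifiers` are hypothetical. An 8-character hex prefix carries only 32 bits of randomness, so collisions become plausible once many reports land in the same store; the length is a tunable trade-off between readability and uniqueness.

```python
import uuid
from typing import List, Optional


def short_id(length: int = 8) -> str:
    """First `length` hex characters of a random UUID4."""
    return uuid.uuid4().hex[:length]


def report_identifiers(n_results: int, report_id: Optional[str] = None) -> List[str]:
    """Return the report IRI followed by one IRI per result.

    Results reuse the report's id plus a 1-based sequence number.
    """
    rid = report_id or short_id()
    base = f"urn:travharv:report:{rid}"
    return [base] + [f"{base}:result:{i}" for i in range(1, n_results + 1)]
```

For example, `report_identifiers(2, "dbd467e6")` yields `urn:travharv:report:dbd467e6`, `urn:travharv:report:dbd467e6:result:1` and `urn:travharv:report:dbd467e6:result:2`.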

observations on current state:

proposal:

None of these are blocking or essential, so it is fine to move them to a later improvement.