netwerk-digitaal-erfgoed / dataset-register

Components (API and crawler) for the NDE Dataset Register
https://datasetregister.netwerkdigitaalerfgoed.nl/api/
European Union Public License 1.2

suspicious 406 on validate #69

Closed by coret 3 years ago

coret commented 3 years ago
# curl -i -X PUT https://demo.netwerkdigitaalerfgoed.nl/register-api/datasets/validate \
   -H 'link: <http://www.w3.org/ns/ldp#RDFSource>; rel="type",<http://www.w3.org/ns/ldp#Resource>; rel="type"' \
   -H 'content-type: application/ld+json' \
   --data-binary '{"@id":"https://data.stad.gent/explore/dataset/vondelingen/information/"}'

HTTP/2 406
date: Sat, 20 Feb 2021 00:28:44 GMT
content-type: text/plain; charset=utf-8
content-length: 45
strict-transport-security: max-age=15724800; includeSubDomains

Unexpected "\n" at position 193 in state STOP

# shacl validate --shapes shapes.jsonld --data data.jsonld

@prefix schema: <http://schema.org/> .
@prefix void:  <http://rdfs.org/ns/void#> .
@prefix dct:   <http://purl.org/dc/terms/> .
@prefix owl:   <http://www.w3.org/2002/07/owl#> .
@prefix sh:    <http://www.w3.org/ns/shacl#> .
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dctype: <http://purl.org/dc/dcmitype/> .
@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dcat:  <http://www.w3.org/ns/dcat#> .
@prefix foaf:  <http://xmlns.com/foaf/0.1/> .
@prefix dc:    <http://purl.org/dc/elements/1.1/> .

[ a            sh:ValidationReport ;
  sh:conforms  false ;
  sh:result    [ a                             sh:ValidationResult ;
                 sh:focusNode                  []  ;
                 sh:resultMessage              "minCount[1]: Invalid cardinality: expected min 1: Got count = 0" ;
                 sh:resultPath                 schema:creator ;
                 sh:resultSeverity             sh:Violation ;
                 sh:sourceConstraintComponent  sh:MinCountConstraintComponent ;
                 sh:sourceShape                []
               ]
] .

ddeboer commented 3 years ago

This is an internal Comunica error, which you can reproduce by querying that URL with Comunica directly:

comunica-sparql https://data.stad.gent/explore/dataset/vondelingen/information/  "SELECT * WHERE { ?s ?p ?o }"
[(node:20605) UnhandledPromiseRejectionWarning: Error: Unexpected "\n" at position 193 in state STOP
    at Function.newErrorCoded (/usr/local/lib/node_modules/@comunica/actor-init-sparql/node_modules/@comunica/actor-rdf-parse-html-script/lib/HtmlScriptListener.js:33:23)
    at JsonLdParser.<anonymous> (/usr/local/lib/node_modules/@comunica/actor-init-sparql/node_modules/@comunica/actor-rdf-parse-html-script/lib/HtmlScriptListener.js:86:26)
    at JsonLdParser.emit (events.js:327:22)
    at Parser.jsonParser.onError (/usr/local/lib/node_modules/@comunica/actor-init-sparql/node_modules/jsonld-streaming-parser/lib/JsonLdParser.js:370:18)
    at Parser.proto.charError (/usr/local/lib/node_modules/@comunica/actor-init-sparql/node_modules/jsonparse/jsonparse.js:90:8)
    at Parser.proto.write (/usr/local/lib/node_modules/@comunica/actor-init-sparql/node_modules/jsonparse/jsonparse.js:199:23)
    at JsonLdParser._transform (/usr/local/lib/node_modules/@comunica/actor-init-sparql/node_modules/jsonld-streaming-parser/lib/JsonLdParser.js:110:25)
    at JsonLdParser.Transform._read (internal/streams/transform.js:205:10)
    at JsonLdParser.Transform._write (internal/streams/transform.js:193:12)
    at doWrite (internal/streams/writable.js:377:12)
(Use `node --trace-warnings ...` to show where the warning was created)
(node:20605) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 27)
(node:20605) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

Is Comunica tripping over the ogp properties?

npx rdf-dereference https://data.stad.gent/explore/dataset/vondelingen/information/
[
{"subject":"https://data.stad.gent/explore/dataset/vondelingen/information/","predicate":"http://ogp.me/ns#type","object":"\"website\"@en","graph":""},
{"subject":"https://data.stad.gent/explore/dataset/vondelingen/information/","predicate":"http://ogp.me/ns#title","object":"\"Vondelingen register Gent\"@en","graph":""},
{"subject":"https://data.stad.gent/explore/dataset/vondelingen/information/","predicate":"http://ogp.me/ns#description","object":"\"De registers van de vondelingen maken deel uit van \nde archieven van de Commissie van Burgerlijke Godshuizen. Elke bladzijde\n in deze registers is een soort persoonlijk dossier van de vondeling \nwaarop verschillende gegevens terug te vinden zijn: naam, \n(vermoedelijke) geboortedatum en leeftijd, vindplaats... Er stond ook \nmeestal wat er met het kind gebeurd was nadat het gevonden \nwerd: overleden, geplaatst bij een voedster, terug naar de moeder... \"@en","graph":""},
{"subject":"https://data.stad.gent/explore/dataset/vondelingen/information/","predicate":"http://ogp.me/ns#url","object":"\"https://data.stad.gent/explore/dataset/vondelingen/information/\"@en","graph":""},
{"subject":"https://data.stad.gent/explore/dataset/vondelingen/information/","predicate":"http://ogp.me/ns#image","object":"\"https://data.stad.gent/static/ods/imgv4/social-images/social_media_image_information.png\"@en","graph":""},
{"subject":"https://data.stad.gent/explore/dataset/vondelingen/information/","predicate":"http://ogp.me/ns#image:width","object":"\"200\"@en","graph":""},
{"subject":"https://data.stad.gent/explore/dataset/vondelingen/information/","predicate":"http://ogp.me/ns#image:height","object":"\"200\"@en","graph":""}(node:20635) UnhandledPromiseRejectionWarning: Error: Unexpected "\n" at position 193 in state STOP
    at Function.newErrorCoded (/Users/david/src/netwerk-digitaal-erfgoed/register/node_modules/@comunica/actor-rdf-parse-html-script/lib/HtmlScriptListener.js:33:23)
    at JsonLdParser.<anonymous> (/Users/david/src/netwerk-digitaal-erfgoed/register/node_modules/@comunica/actor-rdf-parse-html-script/lib/HtmlScriptListener.js:86:26)
    at JsonLdParser.emit (events.js:327:22)
    at Parser.jsonParser.onError (/Users/david/src/netwerk-digitaal-erfgoed/register/node_modules/jsonld-streaming-parser/lib/JsonLdParser.js:370:18)
    at Parser.proto.charError (/Users/david/src/netwerk-digitaal-erfgoed/register/node_modules/jsonparse/jsonparse.js:90:8)
    at Parser.proto.write (/Users/david/src/netwerk-digitaal-erfgoed/register/node_modules/jsonparse/jsonparse.js:199:23)
    at JsonLdParser._transform (/Users/david/src/netwerk-digitaal-erfgoed/register/node_modules/jsonld-streaming-parser/lib/JsonLdParser.js:110:25)
    at JsonLdParser.Transform._read (internal/streams/transform.js:205:10)
    at JsonLdParser.Transform._write (internal/streams/transform.js:193:12)
    at doWrite (internal/streams/writable.js:377:12)
(Use `node --trace-warnings ...` to show where the warning was created)
(node:20635) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 10)
(node:20635) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

This is not something we can fix on our side, so please report it to Comunica.

coret commented 3 years ago

Done https://github.com/comunica/comunica/issues/785

coret commented 3 years ago

(Part of the) response from the Technical Application Specialist Web at District09:

Correct, apparently our portal does not strip those newlines from the JSON-LD info. Thanks for passing this on; I will take it up with the vendor so they can look into it. In the meantime, I have edited the description for this dataset and removed all newlines from it.
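
To illustrate the point about newlines: JSON does not allow a raw line break inside a string value, so a strict parser gives up at that character. The following is a minimal Python sketch, not Comunica's code. Python's `json` module stands in for Comunica's streaming parser, and the two-line description is a made-up sample rather than the real page content.

```python
import json

# Hypothetical stand-in for the embedded JSON-LD: the description value
# contains a raw (unescaped) line break, as the portal output reportedly did.
RAW_NEWLINE_DOC = '{"name": "Vondelingen register Gent", "description": "line one\nline two"}'

# The same document with the line break escaped as the two characters \ and n.
ESCAPED_NEWLINE_DOC = '{"name": "Vondelingen register Gent", "description": "line one\\nline two"}'


def parses_as_strict_json(text: str) -> bool:
    """Return True if text is accepted by Python's strict JSON parser."""
    try:
        # strict=True is the default: control characters inside strings are rejected.
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def strip_raw_newlines(text: str) -> str:
    """Mirror the publisher's workaround: replace raw line breaks with spaces."""
    return text.replace("\r\n", " ").replace("\n", " ")
```

Under these assumptions, `parses_as_strict_json(RAW_NEWLINE_DOC)` fails while the escaped and the stripped variants both parse, which matches the publisher's workaround of removing the newlines from the description.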

So, the JSON-LD of the dataset is syntactically valid (content-wise, a creator is missing), yet the API returns a 406 (without an error message)?

ddeboer commented 3 years ago

406 means no dataset found, which is correct because the JSON-LD contains no IRI (@id):

    {
        "@context": "http://schema.org/",
        "@type": "Dataset",
        "name": "Vondelingen register Gent",
        "description": "De registers van de vondelingen maken deel uit van de archieven van de Commissie van Burgerlijke Godshuizen. Elke bladzijde in deze registers is een soort persoonlijk dossier van de vondeling waarop verschillende gegevens terug te vinden zijn: naam, (vermoedelijke) geboortedatum en leeftijd, vindplaats... Er stond ook meestal wat er met het kind gebeurd was nadat het gevonden werd: overleden, geplaatst bij een voedster, terug naar de moeder... ",
        "url": "https://data.stad.gent/explore/dataset/vondelingen/",
        "dateModified": "2020-06-15T15:49:53.187480+00:00",
        "keywords": ["vondelingen"],
        "distribution": [
            {
                "@type": "DataDownload",
                "encodingFormat": "CSV",
                "contentUrl": "https://data.stad.gent/explore/dataset/vondelingen/download?format=csv"
            },
            {
                "@type": "DataDownload",
                "encodingFormat": "JSON",
                "contentUrl": "https://data.stad.gent/explore/dataset/vondelingen/download?format=json"
            },
            {
                "@type": "DataDownload",
                "encodingFormat": "Excel",
                "contentUrl": "https://data.stad.gent/explore/dataset/vondelingen/download?format=xls"
            }
        ],
        "license": "https://overheid.vlaanderen.be/modellicentie-gratis-hergebruik"
    }

I do see url, but that is just a predicate and not the same as @id (which becomes the subject of the other triples).
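
The difference between url and @id can be sketched in a few lines of Python. This is a deliberately simplified illustration, not a JSON-LD processor: it ignores @context (so predicates stay unexpanded), handles only top-level string values, and the helper names `subject_for` and `to_triples` are hypothetical.

```python
from itertools import count

_blank_ids = count()


def subject_for(node: dict) -> str:
    """Return the node's @id if present, otherwise a fresh blank node label."""
    node_id = node.get("@id")
    if isinstance(node_id, str) and node_id:
        return node_id
    return f"_:b{next(_blank_ids)}"


def to_triples(node: dict) -> list:
    """Simplified flattening: one triple per top-level, non-keyword string property."""
    subject = subject_for(node)
    return [
        (subject, key, value)
        for key, value in node.items()
        if not key.startswith("@") and isinstance(value, str)
    ]


# The published node has a url property but no @id.
without_id = {
    "@type": "Dataset",
    "name": "Vondelingen register Gent",
    "url": "https://data.stad.gent/explore/dataset/vondelingen/",
}

# Hypothetical corrected node: the same data, now identified by an IRI.
with_id = {"@id": "https://data.stad.gent/explore/dataset/vondelingen/", **without_id}
```

In this sketch every triple from `without_id` gets a blank node subject such as `_:b0`, with url appearing only as a predicate, whereas every triple from `with_id` has the dataset IRI as its subject.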

coret commented 3 years ago

We (now :-) require an IRI, but https://data.stad.gent/explore/dataset/vondelingen/information/ does contain a dataset; it's just that one of our requirements is not met.

Why say it's not valid because schema:creator is missing, but say "no dataset found" when an IRI is missing? If we also add the IRI requirement to the SHACL, will we then get a validation error for the missing IRI?

ddeboer commented 3 years ago

I don’t think you can add the IRI requirement to the SHACL (because it’s about the subject, not about the predicates/objects), but you can try.

> but say "no dataset found" when an IRI is missing?

We also return this when we cannot find a dataset at all. In fact, we cannot find a dataset here, because the JSON-LD properties have no subject except for some blank node.
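
One way to read this explanation is that datasets are looked up by IRI first and only the ones found are validated, so a missing IRI never reaches the validation step. The Python sketch below models that ordering. It is an assumption about the behaviour described in this thread, not the register's actual code; the names `Outcome`, `has_iri_subject` and `classify` are hypothetical, and the required-property check is a crude stand-in for SHACL validation.

```python
from enum import Enum
from urllib.parse import urlparse


class Outcome(Enum):
    NO_DATASET_FOUND = "no dataset found"    # the case answered with 406 in this thread
    VALIDATION_REPORT = "validation report"  # e.g. schema:creator is missing
    VALID = "valid"


def has_iri_subject(node: dict) -> bool:
    """Crude check (assumption): the node's @id is an absolute http(s) IRI."""
    parsed = urlparse(str(node.get("@id", "")))
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def classify(nodes: list, required: tuple = ("creator",)) -> Outcome:
    """Find datasets by IRI first; only datasets that were found get validated."""
    datasets = [n for n in nodes if n.get("@type") == "Dataset" and has_iri_subject(n)]
    if not datasets:
        return Outcome.NO_DATASET_FOUND
    if any(prop not in dataset for dataset in datasets for prop in required):
        return Outcome.VALIDATION_REPORT
    return Outcome.VALID
```

With this ordering, a Dataset node without @id falls into the same branch as a page with no dataset at all, while a Dataset with an IRI but no creator gets a validation report.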