Open jamsden opened 4 years ago
I did some more investigating, with debug, and found that the errors are associated with dcterms references when checking resource shapes.
The typical error is: Error on http://purl.org/dc/terms/source: The target resource cannot be fetched or parsed as RDF. (bad value org.apache.jena.riot.RiotException: [line: 2, col: 36] {E202} Expecting XML start or end element(s). String data "308 Permanent Redirect" not allowed. Maybe there should be an rdf:parseType='Literal' for embedding mixed XML content in RDF. Maybe a striping error.)
when processing an oslc:Property with oslc:propertyDefinition dcterms:source.
All the references to dcterms properties do this.
Using debug shows:
Parsing https://www.dublincore.org/2012/06/14/dcterms.rdf#source
[main] WARN org.apache.jena.riot - [line: 1, col: 7 ] {W104} Unqualified typed nodes are not allowed. Type treated as a relative URI.
[main] WARN org.apache.jena.riot - [line: 1, col: 7 ] {W136} Relative URIs are not permitted in RDF: specifically
[main] WARN org.apache.jena.riot - [line: 2, col: 7 ] {W104} Unqualified property elements are not allowed. Treated as a relative URI.
[main] WARN org.apache.jena.riot - [line: 2, col: 7 ] {W136} Relative URIs are not permitted in RDF: specifically
[main] WARN org.apache.jena.riot - [line: 2, col: 14] {W104} Unqualified typed nodes are not allowed. Type treated as a relative URI.
[main] WARN org.apache.jena.riot - [line: 2, col: 14] {W136} Relative URIs are not permitted in RDF: specifically
So it appears that http://purl.org/dc/terms/ is being redirected somehow to https://www.dublincore.org/2012/06/14/dcterms.rdf. That resource does exist, but accessing it with no Accept header returns HTML, while accessing it with Accept: text/turtle gives RDF.
Maybe Dublin Core has changed how it handles redirects and the Accept header?
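To make the content negotiation point concrete, here is a minimal sketch of a request that asks explicitly for RDF. It uses the JDK's java.net.http API rather than the Apache HttpClient that ShapeChecker uses, and the class and method names are hypothetical. The Accept value is the one from the curl request quoted later in this thread. The sketch only builds the request and does not send it.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class AcceptHeaderSketch {
    // Accept value copied from the curl request quoted later in this thread.
    static final String RDF_ACCEPT =
            "text/turtle;q=1.0,application/rdf+xml;q=0.9,"
            + "application/n-triples;q=0.8,application/ld+json;q=0.3";

    // Hypothetical helper: builds a GET request that explicitly asks for RDF.
    static HttpRequest rdfRequest(String uri) {
        return HttpRequest.newBuilder(URI.create(uri))
                .header("Accept", RDF_ACCEPT)
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = rdfRequest("http://purl.org/dc/terms/source");
        // Print what would be sent; no network access happens here.
        System.out.println(request.method() + " " + request.uri());
        System.out.println("Accept: "
                + request.headers().firstValue("Accept").orElse("(none)"));
    }
}
```

Sending this request with and without the Accept header would be one way to check how the server currently negotiates content.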
Yes, Dublin Core has strange redirects and non-standard handling of content negotiation. ShapeChecker already has a work-around in place for that, including doing the explicit redirection you mention. It appears that is not working for you. To avoid the issue, at least temporarily, add the command line option
-x 'https?://purl.org/dc/terms.*'
That would need to be done in the CircleCI build in order for the pull request to pass its checks.
And that might miss some errors if the dcterms references are incorrect.
Is that -x above a regular expression?
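The thread does not confirm how ShapeChecker interprets the -x value. If it is treated as a Java regular expression that must match the whole URI (an assumption), the pattern above would behave as in this sketch. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;

public class ExcludePatternSketch {
    // Pattern copied from the -x option suggested above.
    static final Pattern EXCLUDE = Pattern.compile("https?://purl.org/dc/terms.*");

    // Assumption: the whole URI must match; ShapeChecker may apply it differently.
    static boolean isExcluded(String uri) {
        return EXCLUDE.matcher(uri).matches();
    }

    public static void main(String[] args) {
        List<String> uris = List.of(
                "http://purl.org/dc/terms/source",
                "https://purl.org/dc/terms/",
                "http://purl.org/dc/elements/1.1/title");
        for (String uri : uris) {
            System.out.println(uri + " -> " + isExcluded(uri));
        }
    }
}
```

Under that assumption the pattern covers both the http and https forms of the dcterms namespace and leaves other Dublin Core namespaces alone. Note that the unescaped dots match any character, which is harmless here but looser than a literal match.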
This issue has been fixed. Users of ShapeChecker should remove uses of the workaround using -x to suppress loading of Dublin Core.
@ndjc I am hitting this problem again. I don't think DCTerms vocab can be fetched in Turtle any more from PURL. Could you please point me to your fix? For now I am bringing back -x in some scripts.
See this code in HttpHandler:
// Seems like Jena has a bug of ignoring the RDFParserBuilder Accept header,
// and Dublin Core uses an arcane set of redirects including 308, not handled by Apache by default,
// so we need to configure our HttpClient very carefully!
Header rdfHeader = new BasicHeader(HttpHeaders.ACCEPT, RDF_CONTENT_TYPES);
HttpClientBuilder builder = HttpClientBuilder
    .create()
    .setRedirectStrategy(redirect308())
    .setDefaultHeaders(Collections.singletonList(rdfHeader))
    .addInterceptorFirst((HttpRequestInterceptor) (request, context) ->
        request.addHeader(HttpHeaders.ACCEPT, RDF_CONTENT_TYPES));
and look at the redirect308() method.
Note that you can run with debug levels > 2 (-D -D -D) to get more info about the http requests being sent and responses returned.
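For readers unfamiliar with what handling a 308 involves, the following is a minimal sketch of the decision a redirect strategy has to make. It is not the redirect308() method from HttpHandler, which is configured on Apache HttpClient. It uses only the JDK, and the class and method names are hypothetical. The redirect target in main is the one from the debug output earlier in this thread.

```java
import java.net.URI;
import java.util.Optional;
import java.util.Set;

public class RedirectSketch {
    // Status codes treated as redirects in this sketch, including 308 Permanent Redirect.
    static final Set<Integer> REDIRECT_CODES = Set.of(301, 302, 303, 307, 308);

    // Returns the next URI to request, or empty if the response is not a usable redirect.
    static Optional<URI> nextLocation(int status, URI current, String locationHeader) {
        if (!REDIRECT_CODES.contains(status)
                || locationHeader == null
                || locationHeader.isBlank()) {
            return Optional.empty();
        }
        // Location may be relative, so resolve it against the current URI.
        return Optional.of(current.resolve(locationHeader.trim()));
    }

    public static void main(String[] args) {
        URI start = URI.create("http://purl.org/dc/terms/source");
        Optional<URI> next = nextLocation(
                308, start, "https://www.dublincore.org/2012/06/14/dcterms.rdf#source");
        System.out.println(next.map(URI::toString).orElse("no redirect"));
    }
}
```

A client that follows the redirect would also need to resend the same Accept header on the next request, which is what the interceptor in the snippet above is for.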
Thanks Nick! My plan is to intercept calls to the URIs listed on this page and fetch the Turtle from a completely different location: https://www.dublincore.org/schemas/rdfs/
Notably, we will fetch the Turtle representation for the http://purl.org/dc/terms/ namespace from https://www.dublincore.org/specifications/dublin-core/dcmi-terms/dublin_core_terms.ttl
To expand on why: the original URI no longer supports content negotiation and seems to serve HTML no matter what, causing "RiotException: Triples not terminated by DOT". I will reply back here if I find a less intrusive workaround.
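The interception plan described above could look roughly like the following. This is a sketch of the idea only, not the code that was eventually written. The class and method names are hypothetical, and the single mapping is the one named in this comment.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class VocabularyOverrideSketch {
    // Namespace prefix -> location to fetch the Turtle from instead.
    static final Map<String, String> OVERRIDES = new LinkedHashMap<>();
    static {
        OVERRIDES.put(
                "http://purl.org/dc/terms/",
                "https://www.dublincore.org/specifications/dublin-core/dcmi-terms/dublin_core_terms.ttl");
    }

    // Returns the URL to fetch for a given resource URI; unmapped URIs are returned unchanged.
    static String fetchLocation(String uri) {
        for (Map.Entry<String, String> entry : OVERRIDES.entrySet()) {
            if (uri.startsWith(entry.getKey())) {
                return entry.getValue();
            }
        }
        return uri;
    }

    public static void main(String[] args) {
        System.out.println(fetchLocation("http://purl.org/dc/terms/source"));
        System.out.println(fetchLocation("http://open-services.net/ns/core#"));
    }
}
```

Because every term in the namespace maps to the same Turtle document, the caller would still need to look up the specific term (for example dcterms:source) inside the parsed model.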
Here is the response for posterity:
<html>
<head>
<title>INetSim default HTML page</title>
</head>
<body>
<p></p>
<p align="center">This is the default HTML page for INetSim HTTP server fake mode.</p>
<p align="center">This file is an HTML document.</p>
</body>
</html>
Request export from Postman:
curl --location --request GET 'http://purl.org/dc/terms/' \
--header 'Accept: text/turtle;q=1.0,application/rdf+xml;q=0.9,application/n-triples;q=0.8,application/ld+json;q=0.3'
Suggestions from https://gitter.im/linkeddata/chat:
I've submitted a pull request for oslc_am and core changes (https://github.com/oslc-op/oslc-specs/pull/317) that is failing with these errors; see https://app.circleci.com/pipelines/github/oslc-op/oslc-specs/309/workflows/a473724f-3155-447c-b668-ff4c061bb2ef/jobs/305
#!/bin/bash -eo pipefail
cd tools/ShapeChecker && ./check-cm.sh
[main] WARN org.apache.jena.riot - [line: 1, col: 7 ] {W104} Unqualified typed nodes are not allowed. Type treated as a relative URI.
[main] WARN org.apache.jena.riot - [line: 1, col: 7 ] {W136} Relative URIs are not permitted in RDF: specifically
[main] WARN org.apache.jena.riot - [line: 2, col: 7 ] {W104} Unqualified property elements are not allowed. Treated as a relative URI.
[main] WARN org.apache.jena.riot - [line: 2, col: 7 ] {W136} Relative URIs are not permitted in RDF: specifically
[main] WARN org.apache.jena.riot - [line: 2, col: 14] {W104} Unqualified typed nodes are not allowed. Type treated as a relative URI.
[main] WARN org.apache.jena.riot - [line: 2, col: 14] {W136} Relative URIs are not permitted in RDF: specifically
This looks like an attempt to read HTML source as RDF source.
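One way to surface this situation more directly than through parser warnings is to check whether a response looks like HTML before handing it to the RDF parser. The following is a minimal sketch of such a guard. It is not part of ShapeChecker, and the class and method names are hypothetical.

```java
import java.util.Locale;

public class HtmlGuardSketch {
    // Returns true if the Content-Type or the start of the body suggests HTML rather than RDF.
    static boolean looksLikeHtml(String contentType, String body) {
        String type = contentType == null ? "" : contentType.toLowerCase(Locale.ROOT);
        if (type.startsWith("text/html") || type.startsWith("application/xhtml+xml")) {
            return true;
        }
        // Fall back to sniffing the body when the header is missing or unhelpful.
        String head = body == null ? "" : body.stripLeading().toLowerCase(Locale.ROOT);
        return head.startsWith("<!doctype html") || head.startsWith("<html");
    }

    public static void main(String[] args) {
        System.out.println(looksLikeHtml("text/html; charset=utf-8", ""));
        System.out.println(looksLikeHtml(null, "<html>\n<head>"));
        System.out.println(looksLikeHtml("text/turtle",
                "@prefix dcterms: <http://purl.org/dc/terms/> ."));
    }
}
```

With a check like this, a tool could report "server returned HTML" for the offending URI instead of a series of W104 and W136 warnings.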
check-cm.sh:
build/install/ShapeChecker/bin/ShapeChecker \
  -x http://open-services.net/ns/core ${comment# See https://github.com/oslc-op/oslc-specs/issues/40} \
  -x http://open-services.net/ns/cm ${comment# See https://github.com/oslc-op/oslc-specs/issues/40} \
  -v ../../specs/core/vocab/core-vocab.ttl \
  -v ../../specs/cm/change-mgt-vocab.ttl \
  -s ../../specs/cm/change-mgt-shapes.ttl
These .ttl files look OK. I see core doesn't have @base, but change-mgt-vocab.ttl does, and it's old. Should these be removed?