w3c / csvw

Documents produced by the CSV on the Web Working Group

CSV on the Web metadata files are not valid JSON-LD #849

Open Conal-Tuohy opened 7 years ago

Conal-Tuohy commented 7 years ago

At https://www.w3.org/ns/csvw#datatype-definitions, the URI Template data type, used for example by the valueUrl property, is defined to be a subclass of xsd:anyURI:

csvw:uriTemplate a rdfs:Datatype;
  rdfs:label "uri template"@en;
  rdfs:comment """"""@en;
  rdfs:subClassOf xsd:anyURI;
  rdfs:isDefinedBy csvw: .

However, URI Templates can contain { and } characters which are not syntactically valid in various places in a URI. I believe this rdfs:subClassOf statement is mistaken and renders some csvw annotations which use URI templates syntactically invalid.

I discovered this while attempting to deposit a csvw metadata file in JSON-LD format into a Fedora repository, which attempted to parse it as RDF, and gave me the following stack trace:

java.lang.IllegalArgumentException: Illegal character in path at index 7: medium-{photographic_media_type}
    at java.net.URI.create(URI.java:852)
    at java.net.URI.resolve(URI.java:1036)
    at com.github.jsonldjava.utils.JsonLdUrl.resolve(JsonLdUrl.java:274)
    at com.github.jsonldjava.core.Context.expandIri(Context.java:538)
    at com.github.jsonldjava.core.Context.expandValue(Context.java:1099)
    at com.github.jsonldjava.core.JsonLdApi.expand(JsonLdApi.java:979)
    at com.github.jsonldjava.core.JsonLdApi.expand(JsonLdApi.java:819)
    at com.github.jsonldjava.core.JsonLdApi.expand(JsonLdApi.java:517)
    at com.github.jsonldjava.core.JsonLdApi.expand(JsonLdApi.java:819)
    at com.github.jsonldjava.core.JsonLdApi.expand(JsonLdApi.java:819)
    at com.github.jsonldjava.core.JsonLdApi.expand(JsonLdApi.java:997)
    at com.github.jsonldjava.core.JsonLdProcessor.expand(JsonLdProcessor.java:146)
    at com.github.jsonldjava.core.JsonLdProcessor.toRDF(JsonLdProcessor.java:482)
    at org.apache.jena.riot.lang.JsonLDReader.read$(JsonLDReader.java:143)
    at org.apache.jena.riot.lang.JsonLDReader.read(JsonLDReader.java:83)
    at org.apache.jena.riot.RDFDataMgr.process(RDFDataMgr.java:859)
    at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:259)
    at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:245)
    at org.apache.jena.riot.adapters.RDFReaderRIOT.read(RDFReaderRIOT.java:69)
    at org.apache.jena.rdf.model.impl.ModelCom.read(ModelCom.java:305)
    at org.fcrepo.http.api.ContentExposingResource.replaceResourceWithStream(ContentExposingResource.java:627)
    at org.fcrepo.http.api.FedoraLdp.createOrReplaceObjectRdf(FedoraLdp.java:364)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:144)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:161)
    at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:160)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:99)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:389)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:347)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:102)
    at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:326)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:267)
    at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317)
    at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:305)
    at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1154)
    at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:473)
    at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:427)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:341)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:228)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:292)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:207)
    at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:240)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:207)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:502)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:141)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
    at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:616)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:522)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1095)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:672)
    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1504)
    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1460)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.URISyntaxException: Illegal character in path at index 7: medium-{photographic_media_type}
    at java.net.URI$Parser.fail(URI.java:2848)
    at java.net.URI$Parser.checkChars(URI.java:3021)
    at java.net.URI$Parser.parseHierarchical(URI.java:3105)
    at java.net.URI$Parser.parse(URI.java:3063)
    at java.net.URI.<init>(URI.java:588)
    at java.net.URI.create(URI.java:850)
    ... 68 more
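
The failure above can be reproduced in isolation from Fedora and jsonld-java. The following is a minimal sketch, assuming a simple character-level check against the characters RFC 3986 allows in a URI (it is not a full URI grammar validator, and the function name `first_illegal_uri_char` is hypothetical). It reports the same offending position as the stack trace, index 7, where the `{` appears.

```python
import re

# Characters RFC 3986 permits in a URI reference: unreserved, reserved
# (gen-delims and sub-delims), and "%" introducing a percent-encoding.
# "{" and "}" are in none of these groups.
_URI_CHARS = re.compile(r"[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]")


def first_illegal_uri_char(value: str) -> int:
    """Return the index of the first character not allowed in a URI, or -1."""
    for index, char in enumerate(value):
        if not _URI_CHARS.fullmatch(char):
            return index
    return -1


print(first_illegal_uri_char("medium-{photographic_media_type}"))  # 7
print(first_illegal_uri_char("medium-daguerreotype"))              # -1
```

Any processor that resolves the value as an IRI reference is likely to reject it at this character, which is consistent with the exception message above.
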
Conal-Tuohy commented 7 years ago

A bit more investigation revealed that the problem is actually that the properties aboutUrl, propertyUrl, and valueUrl are declared to have a @type of @id in the CSVW JSON-LD context file. In RDF terms this makes them object properties and requires that their values be valid URIs, though in general they are not valid URIs because they contain URI Template markup such as {, :, } and other reserved characters.

By patching the csvw context to change these properties to have a @type of xsd:string, and changing my CSV metadata file to refer to that patched context instead of https://www.w3.org/ns/csvw, I was able to make my metadata file into valid JSON-LD, and I was still able to use it with CSV2RDF software to interpret CSV.

    "aboutUrl": {
      "@id": "csvw:aboutUrl",
      "@type": "xsd:string"
    },
    "propertyUrl": {
      "@id": "csvw:propertyUrl",
      "@type": "xsd:string"
    },
    "valueUrl": {
      "@id": "csvw:valueUrl",
      "@type": "xsd:string"
    }
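
The patch described above could be scripted roughly as follows. This is only a sketch: the `original` dictionary is a trimmed, hypothetical stand-in for the published context (which would need to be downloaded and saved separately), and `patch_context` is an illustrative helper name, not part of any CSVW tooling.

```python
import json

URI_TEMPLATE_PROPERTIES = ("aboutUrl", "propertyUrl", "valueUrl")


def patch_context(context_doc: dict) -> dict:
    """Return a copy of a JSON-LD context document in which the three
    URI template properties are typed as xsd:string instead of @id."""
    patched = json.loads(json.dumps(context_doc))  # deep copy via JSON round trip
    terms = patched["@context"]
    for name in URI_TEMPLATE_PROPERTIES:
        terms[name] = {"@id": f"csvw:{name}", "@type": "xsd:string"}
    return patched


# Hypothetical, heavily trimmed stand-in for the real context document.
original = {
    "@context": {
        "csvw": "http://www.w3.org/ns/csvw#",
        "xsd": "http://www.w3.org/2001/XMLSchema#",
        "valueUrl": {"@id": "csvw:valueUrl", "@type": "@id"},
    }
}

patched = patch_context(original)
print(json.dumps(patched["@context"]["valueUrl"]))
```

The metadata file then has to reference wherever the patched copy is hosted in place of https://www.w3.org/ns/csvw, as described above.
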
gkellogg commented 7 years ago

@Conal-Tuohy Yes, you're right; not sure how we missed this. It would seem that changing the context and RDFS definitions for URI Template Properties would do the trick, as I don't see any specific references to this in the metadata document itself. However, I'd like to be sure that doing this doesn't break something else.

gkellogg commented 7 years ago

@iherman I'm not sure this rises to the level of an Erratum, as no recommendation will change, just the context and RDFS definitions, which aren't normative. Still, no harm in adding this to the Erratum document.

A previous version of the JSON-LD Context erroneously defined csvw:uriTemplate as being a subclass of xsd:anyURI, and specified the @type of aboutUrl, propertyUrl, and valueUrl as being @id. As a URI Template includes the characters { and }, which are not valid in a URI, the context has been updated to change the subclass of csvw:uriTemplate to xsd:string and to define the @type of the affected properties as csvw:uriTemplate. This should have no effect on CSVW Processors which treat CSVW Metadata documents as JSON, rather than RDF.

Alternatively, we could eliminate csvw:uriTemplate, and just make the values xsd:string, but it really shouldn't affect processors in any case.

Conal-Tuohy commented 7 years ago

@gkellogg good to check, of course, but I don't expect changing the ontology will break anything; my guess is that existing implementations of the CSV2RDF spec are ignoring the RDF semantics and processing the metadata files as JSON. Otherwise someone else would surely have raised my issue already.

I think it would be good to retain the data type csvw:uriTemplate (rather than just using xsd:string, as I did, above) so as to be able to constrain the lexical space of URI templates with a regex. This could catch unmatched { and } characters, at least.
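
As a rough illustration of the kind of regex constraint meant here, the sketch below checks only that braces are paired and non-empty. It is an assumption about what such a pattern might look like, not the full RFC 6570 grammar and not a pattern taken from the CSVW ontology; `has_balanced_braces` is a hypothetical name.

```python
import re

# Either literal text without braces, or a non-empty {expression} that
# itself contains no braces (URI Template expressions do not nest).
_BALANCED = re.compile(r"(?:[^{}]|\{[^{}]+\})*")


def has_balanced_braces(template: str) -> bool:
    """Return True if every "{" is closed by a matching "}" and vice versa."""
    return _BALANCED.fullmatch(template) is not None


print(has_balanced_braces("medium-{photographic_media_type}"))  # True
print(has_balanced_braces("medium-{photographic_media_type"))   # False
print(has_balanced_braces("medium-photographic_media_type}"))   # False
```

A stricter pattern could also validate operators and variable names inside each expression, at the cost of a considerably longer regex.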

iherman commented 7 years ago

@gkellogg I think it is good to record this as an erratum; it also gives a stronger historical point why the context and ontology files have changed.

Conal-Tuohy commented 7 years ago

@gkellogg in your proposed erratum, the second occurrence of "subclass" should be "superclass". Cheers!

iherman commented 7 years ago

Summary: in the ontology, the csvw:uriTemplate (data)type is defined as a subclass of xsd:anyURI, although the text says that URI patterns can also be used. This is a bug in the context file and the ontology (not in the written recommendations, though). The ontology files have been modified to refer to xsd:string instead.

The files have been updated on the W3C site, see the PR https://github.com/w3c/csvw/issues/850.

Conal-Tuohy commented 7 years ago

Thanks for the very speedy fix

gkellogg commented 7 years ago

@iherman I think we can close this now, if not, it's in your court.

iherman commented 7 years ago

I left it open so that it appears in the errata document.

Ivan

On 7 Jun 2017, at 17:05, Gregg Kellogg wrote:

> @iherman I think we can close this now, if not, it's in your court.
