w3c / json-ld-syntax

JSON-LD 1.1 Specification
https://w3c.github.io/json-ld-syntax/

Clarification: prefix definition in @context using relative IRI? #363

Closed alexkreidler closed 3 years ago

alexkreidler commented 3 years ago

I have a document that looks like this at /home:

{
    "@context": "/context",
    "id": "/home",
    "title": "Example People API Server",
    "description": "Using the people Hydra ontology",
    "people": {
        "@id": "/people"
    }
}

That fetches the following at /context

{
        "@context": [
            {
                "ppl": "/onto/",
                "schema": "http://schema.org/",
                "hydra": "http://www.w3.org/ns/hydra/core#",
                "people": {
                    "@id": "ppl:people",
                    "@type": "@id"
                }
            },
            "https://www.w3.org/ns/hydra/context.jsonld"
        ]
}

I then got an error from the jsonld library: Invalid JSON-LD syntax; a @context @id value must be an absolute IRI, a blank node identifier, or a keyword.

@context itself can be a remote resource as illustrated in the first example.

I did see this note:

Properties, values of @type, and values of properties with a term definition that defines them as being relative to the vocabulary mapping, may have the form of a relative IRI reference, but are resolved using the vocabulary mapping, and not the base IRI.

Thus it makes sense that there might be an error if trying to use the ppl prefix in a property, like ppl:knows.

I looked through the spec though and couldn't find any explicit answer to whether relative prefixes are allowed in the context, so I'm hoping one of you here could answer that.

It likely would be a helpful option for those serving ontologies from test/local servers.

pchampin commented 3 years ago

The paragraph you quote means that properties (among other things) are "resolved" against @vocab instead of @base. You have a default @base in your context (the URL from which it was retrieved) but no @vocab, hence the error. Add a @vocab and the error disappears. And actually the @vocab can be relative (to the @base, either implicit or explicit), so you can use "@vocab": "" to force ppl to resolve against the base. Well, sort of!...

Sort of, because in JSON-LD, "resolved using the vocabulary mapping" (i.e. using @vocab) works differently from resolving using a base IRI: you simply concatenate the "relative" term to the @vocab IRI. See in this example how the ../ in ppl is kept in the expanded IRI. So that does not really solve your problem, I'm afraid.
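To make the difference concrete, here is a minimal Python sketch that contrasts ordinary base-IRI resolution (RFC 3986, as done by `urllib.parse.urljoin`) with plain concatenation onto a vocabulary IRI. It is not a JSON-LD processor, and the base, vocabulary and term values are hypothetical, chosen only for illustration.

```python
from urllib.parse import urljoin

# Hypothetical values for illustration; not taken from a real deployment.
BASE = "http://localhost:1234/test/home"
VOCAB = "http://localhost:1234/test/"
TERM = "../onto/"


def resolve_against_base(base: str, reference: str) -> str:
    """RFC 3986 reference resolution: dot segments such as ../ are removed."""
    return urljoin(base, reference)


def expand_against_vocab(vocab: str, term: str) -> str:
    """Vocabulary-relative expansion: the term is appended to the vocab IRI unchanged."""
    return vocab + term


base_result = resolve_against_base(BASE, TERM)
vocab_result = expand_against_vocab(VOCAB, TERM)

print(base_result)   # the ../ has been resolved away
print(vocab_result)  # the ../ is still present in the IRI
```

The second result keeps the literal `../` segment, which is why a vocabulary-relative prefix does not behave like a relative link.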

Then let me try to convince you that this is not really a problem :wink:

It likely would be a helpful option for those serving ontologies from test/local servers.

In Linked Data, IRIs are identifiers (1st principle) before being addresses (2nd principle). Being a shared vocabulary, an ontology (and all its terms) must have the same IRI everywhere it is used, even on a test server. So my advice is to decide on an absolute IRI where your ontology will be published, and use that absolute IRI in your context, even while testing your API on localhost. The fact that, during tests, these IRIs are not consistent with the location where the ontology is served (e.g. localhost/onto) is secondary – and not really a problem.

Think about it this way: the client consuming your JSON-LD will look for a property it knows. If it consumes the JSON directly (trusting the context), it will look for the people attribute. If it first expands it to an RDF graph, it will look for an arc with some absolute IRI that it knows in advance from the shared vocabulary. Otherwise, how could your client code know that http://localhost:1234/test/onto/people, http://localhost:5678/onto/people and http://example.org/api/onto/people have the same semantics? This would require tight coupling between the client and the service, which is something that Linked Data aims to avoid.
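As a rough illustration of that argument, the sketch below uses a context whose ppl prefix is a fixed absolute IRI and a toy prefix expander. The expander is a simplification written for this example (it is not the JSON-LD expansion algorithm), and http://example.org/api/onto/ is only assumed here as the published ontology location.

```python
# Hypothetical context: the ontology IRI stays fixed wherever the API is served.
CONTEXT = {
    "ppl": "http://example.org/api/onto/",
    "schema": "http://schema.org/",
}

# The absolute IRI a client would know in advance from the shared vocabulary.
KNOWN_PROPERTY = "http://example.org/api/onto/people"


def expand_compact_iri(value: str, context: dict) -> str:
    """Expand prefix:suffix using the context; leave anything else unchanged."""
    prefix, sep, suffix = value.partition(":")
    if sep and prefix in context and not suffix.startswith("//"):
        return context[prefix] + suffix
    return value


expanded = expand_compact_iri("ppl:people", CONTEXT)
print(expanded == KNOWN_PROPERTY)
```

Because the prefix maps to the same absolute IRI on every server, the client's comparison against the known property does not depend on the host or port the data came from.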

alexkreidler commented 3 years ago

Thanks a lot for the great explanation and examples. I'll close the issue.

The following are some details about my specific situation/a response to the second piece.

I totally recognize that, for people building ontologies for others to consume, appropriate identifiers are key and should be the same across environments, even in testing.

In fact, for my example ontology, since it relies on schema.org, which recently dropped built-in support for content negotiation, I'm overriding DNS using an /etc/hosts entry pointing schema.org to localhost, where I'm running a custom server serving the ontology. The server is a modified version of actix-web with content negotiation (https://github.com/actix/actix-web/pull/1719). This allows clients to still use and understand schema.org as if it were the same, but makes the client experience better. Hopefully at some point I can help schema.org figure out a way to bring back content negotiation to their actual site.

However, my ontologies are primarily for testing software I'm working on for frontend developers working with linked data. Additionally, the software itself is pretty generic, and will try to dereference and fetch some very basic things like rdfs:label, etc., so it doesn't depend on the details of the ontologies. Therefore it doesn't really make a difference whether it is using http://localhost:1234/test/onto/people or http://localhost:5678/onto/people. However, it is unlikely I'd need two servers for the same ontology. It's more likely I'd end up with something like http://localhost:1234/test/onto/people and http://localhost:5678/onto/events. Who knows, I might end up serving them all from the same server.

It's also just partly convenience: being able to, say, serve on localhost:4000 instead of localhost:9090 without having to replace the ID in multiple places. For example, I've been using json-server to serve the data: here, and it's more convenient to use relative IRIs because, like I mentioned, I can change which port I'm serving on.

I guess for my situation there are a few options:

  1. use relative IRIs using the solution you mentioned above, but accept their limitations
  2. be comfortable using a bunch of absolute IRIs that look like http://localhost:<someport>/something
  3. use fake IRIs like http://ontology.nosuchdomain or even semi real ones (e.g. a domain I could publish to) that I haven't published yet. Since I've rarely seen production IRIs with ports, I would add a proxy like Nginx to listen on port 80 and use the Host header to route to various localhost: locations. This is similar to what I'm doing with schema.org right now, but would add a bit more complexity if I need more ontology servers to manage the nginx config.

Maybe there'd even be another option in another version of JSON-LD, something like @replace, a map of absolute IRIs to other absolute IRIs, which expands the document and does the replacements. This would allow people to use something like http://unpublished.my.ontology.domain/ns/ for IRIs and replace it with localhost:9000 in different environments. This would have to only happen at the very end of other processing, and thus there could only be one @replace in however many contexts are linked. It could add a lot of potential complexity for compacting replaced documents, etc.
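To be clear, @replace is not part of JSON-LD; it is only the idea floated above. A minimal sketch of what such a post-processing step might look like in Python, applied to an already expanded document, is shown here. The function name, the sample document and the http://localhost:9000/ target are all hypothetical.

```python
from typing import Any, Dict


def replace_iri_prefixes(node: Any, replacements: Dict[str, str]) -> Any:
    """Return a copy of an expanded JSON-LD structure with IRI prefixes rewritten."""

    def rewrite(iri: str) -> str:
        for old, new in replacements.items():
            if iri.startswith(old):
                return new + iri[len(old):]
        return iri

    if isinstance(node, list):
        return [replace_iri_prefixes(item, replacements) for item in node]
    if isinstance(node, dict):
        result = {}
        for key, value in node.items():
            if key in ("@id", "@type"):
                # These keywords hold IRIs (a single string or a list of strings).
                if isinstance(value, list):
                    result[key] = [rewrite(v) if isinstance(v, str) else v for v in value]
                else:
                    result[key] = rewrite(value) if isinstance(value, str) else value
            else:
                # Property IRIs are rewritten; keywords such as @value are kept as keys.
                new_key = key if key.startswith("@") else rewrite(key)
                result[new_key] = replace_iri_prefixes(value, replacements)
        return result
    # Plain strings reached here are literals (e.g. under @value) and stay untouched.
    return node


# Hypothetical expanded document and replacement map.
EXPANDED = [{
    "@id": "http://example.org/home",
    "@type": ["http://unpublished.my.ontology.domain/ns/Person"],
    "http://unpublished.my.ontology.domain/ns/people": [
        {"@id": "http://unpublished.my.ontology.domain/ns/alice"}
    ],
    "http://schema.org/name": [{"@value": "http://unpublished.my.ontology.domain/ns/"}],
}]
REPLACEMENTS = {"http://unpublished.my.ontology.domain/ns/": "http://localhost:9000/"}

rewritten = replace_iri_prefixes(EXPANDED, REPLACEMENTS)
print(rewritten)
```

The sketch only touches node identifiers, types and property IRIs, and leaves literal values alone, which hints at the compaction complexity mentioned above: a compacted document would have to be expanded first for the replacement to be reliable.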

Maybe that's not even something for JSON-LD to deal with, but rather something to do using RDF libraries. I think this type of feature is slightly related to some discussions I've seen in the schema.org repositories about providing backup URLs for remote context documents: https://github.com/schemaorg/schemaorg/issues/2578#issuecomment-698757364. I think I've seen some related things here: https://github.com/w3c/json-ld-syntax/issues/108#issuecomment-447629312, https://github.com/w3c/json-ld-syntax/issues/9.

I think there is a need to deal with the reliability of remote contexts, but I'm not sure if anything above would work.

gkellogg commented 3 years ago

In fact, for my example ontology, since it relies on schema.org, which recently dropped built-in support for content negotiation, I'm overriding DNS using an /etc/hosts entry pointing schema.org to localhost, where I'm running a custom server serving the ontology. The server is a modified version of actix-web with content negotiation (actix/actix-web#1719). This allows clients to still use and understand schema.org as if it were the same, but makes the client experience better. Hopefully at some point I can help schema.org figure out a way to bring back content negotiation to their actual site.

schema.org replaced content negotiation with a rel=alternate Link header that points to the context. This is provided for in the spec, and conforming implementations will retrieve the document from that location. You might check whether the processor you're using conforms to JSON-LD 1.1.
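For readers unfamiliar with that mechanism, here is a simplified Python sketch of how a client might pick the alternate JSON-LD location out of a Link header value. It is not the full RFC 8288 grammar and not what any particular processor does internally; the function name and the sample header value are assumptions made for illustration.

```python
import re
from typing import Optional
from urllib.parse import urljoin


def find_alternate_context(link_header: str, document_url: str) -> Optional[str]:
    """Return the absolute URL of the first rel=alternate, type=application/ld+json link."""
    # Split on commas that begin a new <target>; a simplification of real Link parsing.
    for part in re.split(r",\s*(?=<)", link_header):
        match = re.match(r"\s*<([^>]*)>(.*)", part)
        if not match:
            continue
        target, params_text = match.groups()
        params = {
            name.lower(): value.strip('"')
            for name, value in re.findall(
                r';\s*([\w-]+)\s*=\s*("[^"]*"|[^;,\s]+)', params_text
            )
        }
        rels = params.get("rel", "").lower().split()
        if "alternate" in rels and params.get("type") == "application/ld+json":
            # The target is resolved relative to the URL that was requested.
            return urljoin(document_url, target)
    return None


# Illustrative header value; the real header sent by a given site may differ.
SAMPLE_HEADER = '</docs/jsonldcontext.jsonld>; rel="alternate"; type="application/ld+json"'
context_url = find_alternate_context(SAMPLE_HEADER, "https://schema.org/")
print(context_url)
```

A processor would then fetch the returned URL instead of the original one when the original response is not itself JSON.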

However, my ontologies are primarily for testing software I'm working on for frontend developers working with linked data. Additionally, the software itself is pretty generic, and will try to dereference and fetch some very basic things like rdfs:label, etc., so it doesn't depend on the details of the ontologies. Therefore it doesn't really make a difference whether it is using http://localhost:1234/test/onto/people or http://localhost:5678/onto/people. However, it is unlikely I'd need two servers for the same ontology. It's more likely I'd end up with something like http://localhost:1234/test/onto/people and http://localhost:5678/onto/events. Who knows, I might end up serving them all from the same server.

For testing purposes, some tests will set "@vocab": "http://example.org/", which at least allows for proper expansion of relative vocabulary terms. There's also a provision to make the vocabulary document relative (e.g., "@vocab": "#"). Generally, prefer to use a well-defined ontology, but it's not uncommon to use document-relative URIs for properties.

alexkreidler commented 3 years ago

@gkellogg Yeah the processors are compliant and they do get sent to the appropriate context location.

However the context only defines terms in the vocabulary, it doesn't include any actual triples about classes or properties (e.g. labels and descriptions, which I'm interested in).

There was a feature added to schema.org a while back that supported content negotiation to actually send representations of each class and property in various RDF serializations. E.g. for https://schema.org/Person, it would return all the triples of the form schema:Person ?p ?o, and ?pred ?p ?o where ?pred schema:domainIncludes schema:Person, and so on, basically providing all the information listed on the webpage in a machine-readable format.

I've built a simple version of that here, which doesn't support the subClassOf relation yet, or schema:rangeIncludes as I was getting some weird cyclic bug when I tried the latter one.

gkellogg commented 3 years ago

Okay, not familiar with that aspect of schema.org, but you might try checking on their forum.

I do know that they publish the vocabulary (RDFS equivalent) in a number of different formats, but their naming has changed in version 1.0. You can try https://schema.org/version/latest/schemaorg-all-http.jsonld for example.