phenopackets / phenopacket-format

JSON-LD context syntax, precedence #65

Closed balhoff closed 8 years ago

balhoff commented 8 years ago

Per issue #40, we want to use a JSON-LD context file to specify how CURIEs used in phenopackets should be expanded. I have mostly implemented handling of contexts for the reference implementation, but I have a few questions about how they should work.

  1. Do we want mappings in the local (file-embedded) context to take precedence over mappings provided by the global default context? This is what the wiki says, and I think that is probably the most expected behavior. But we could increase consistency across files if we say that the default context can only be extended, not overridden.
  2. The wiki suggests specifying a document base for relative IRIs in the context using the base key. In JSON-LD this should be @base. Is a phenopacket document meant to completely adhere to JSON-LD, or can we have some special handling to make the base key without the @ work the way we want?
  3. Following on the previous point, my experiments with jsonld-api and using the JSON-LD Playground suggest that @base is ignored in JSON-LD when there is a remote context (e.g. the phenopackets default context). If we want we can specially extract the value of the base key out of the local context and apply it in our identifier expansions, but that would be beyond the typical JSON-LD processing. Perhaps we should move the base key to the root of the phenopacket document and out of the JSON-LD context. I think we are already going a little beyond JSON-LD by using an "implicit" default context within the reference API (although this is a little bit like providing an implicit context via HTTP headers, which is legitimate in certain situations).
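
To make question 1 concrete, here is a minimal Python sketch of CURIE expansion in which the file-embedded context takes precedence over the default, which is the behavior the wiki describes. The function name `expand_curie`, the `DEFAULT_CONTEXT` dict, and the sample mappings are assumptions for illustration only; this is not the reference implementation.

```python
# Hypothetical stand-in for the global default context (illustrative values only).
DEFAULT_CONTEXT = {"HP": "http://purl.obolibrary.org/obo/HP_"}


def expand_curie(curie, local_context, default_context=DEFAULT_CONTEXT):
    """Expand 'PREFIX:local_id' to an IRI; local mappings win over defaults."""
    prefix, sep, local_id = curie.partition(":")
    if not sep:
        return curie  # no colon: not a CURIE, leave unchanged
    # The later dict wins, so the file-embedded context takes precedence.
    merged = {**default_context, **local_context}
    if prefix not in merged:
        return curie  # unknown prefix: leave unchanged
    return merged[prefix] + local_id
```

Under the "extend only" alternative, the merge step would instead reject any local prefix that already exists in the default context.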

cmungall commented 8 years ago

  1. I think there should be an option to provide a completely fresh context. Here the behavior is the same as supplying the context file yourself. But if you don't explicitly do this, then any mappings you provide should never override the defaults (with an error or warning if you try to).

Need to think about 2+3

jmcmurry commented 8 years ago

  1. ... "could increase consistency across files if we say that the default context can only be extended, not overridden"

Agreed, but if we do this, it is not entirely without nuance:

Ideally, overriding mappings should be forbidden in both directions (no permuted prefixes for existing URIs, no alternate URIs for existing prefixes). However, alternate prefixes that expand to the same URI are trivial to collapse; the inverse is not. There it is necessary to determine whether the URIs actually are "equivalent", and equivalence is often a) in the eye of the beholder and b) not something that can be automated with ease.

For now, though, let's just be rigorous, and we can deal with messy reality when we have other issues in hand. We will allow people to file-embed a brand new binding (e.g., for a new database) for which neither the prefix nor the URI collides with an existing one in the master context. The validator should be able to check for such collisions, including, ideally, warnings for at least a few of the most common URL permutations.

It would be great if the validator could create a record of any extensions so that these could be added to the canon somehow.
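
The two-way collision check described above could be sketched roughly as follows. The function name `find_collisions` and the tuple layout of its results are hypothetical. It compares IRIs as exact strings only, so it does not catch the URL permutations mentioned above (for example `http` versus `https`, or a trailing slash).

```python
def find_collisions(master, local):
    """Return (prefix_clashes, iri_clashes) between a local and master context.

    prefix_clashes: (prefix, master_iri, local_iri) for prefixes that are
        bound to a different IRI than in the master context.
    iri_clashes: (iri, master_prefix, local_prefix) for IRIs that the master
        context already binds under a different prefix.
    """
    master_by_iri = {iri: prefix for prefix, iri in master.items()}
    prefix_clashes, iri_clashes = [], []
    for prefix, iri in local.items():
        if prefix in master and master[prefix] != iri:
            prefix_clashes.append((prefix, master[prefix], iri))
        elif iri in master_by_iri and master_by_iri[iri] != prefix:
            iri_clashes.append((iri, master_by_iri[iri], prefix))
    return prefix_clashes, iri_clashes
```

A binding that appears in neither list is a genuine extension, which is the kind of entry a validator could log as a candidate for the master context.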

balhoff commented 8 years ago

I suppose another approach worth considering is to make all phenopackets real JSON-LD and always explicitly specify the context (so the behavior is determined by the JSON-LD spec). We can still provide the default context which folks can reference if they want (you can have a list for your context including both external URIs and embedded objects).

jmcmurry commented 8 years ago

Are you suggesting that, for safety, the validation step harvest and copy all of the relevant mappings from the master to the individual doc?

cmungall commented 8 years ago

Good suggestion.

Not sure how much Jackson tweaking would be required here. The context object is a freeform map object; I assume there is a way to map that appropriately.

balhoff commented 8 years ago

No, actually I was just raising the possibility of not providing an implicit context at all. We would simply say that the context is specified according to the JSON-LD spec, provide a standard context online, and recommend its use. In normal JSON-LD a file can reference an external context at the same time as it provides an embedded context, like so:

{
  "@context": [
    "http://phenopackets.org/context.jsonld",
    { "UBERON": "http://purl.obolibrary.org/obo/UBERON_" }
  ]
}

Here a UBERON prefix is added to the standard context, but the standard context is still explicitly referenced.
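
As a rough illustration of how a list-valued `@context` combines its entries, the sketch below merges them in order so that later entries override earlier ones. It is a simplification, not a JSON-LD processor: it handles only plain prefix-to-IRI strings, and `remote_contexts` is a hypothetical lookup table standing in for fetching the remote document. The `HP` mapping shown for the remote context is an assumed value.

```python
def flatten_context(context, remote_contexts):
    """Merge an @context value into one prefix map; later entries win.

    remote_contexts is a hypothetical {url: mapping} lookup that stands in
    for dereferencing a remote context document.
    """
    entries = context if isinstance(context, list) else [context]
    merged = {}
    for entry in entries:
        if isinstance(entry, str):
            entry = remote_contexts[entry]  # KeyError if the URL is unknown
        merged.update(entry)
    return merged


# Assumed contents of the standard context, for illustration only.
remote = {
    "http://phenopackets.org/context.jsonld": {
        "HP": "http://purl.obolibrary.org/obo/HP_",
    }
}
doc = {
    "@context": [
        "http://phenopackets.org/context.jsonld",
        {"UBERON": "http://purl.obolibrary.org/obo/UBERON_"},
    ]
}
prefixes = flatten_context(doc["@context"], remote)
```

Because the embedded object comes last in the list, any prefix it shares with the remote context would replace the remote definition.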

cmungall commented 8 years ago

On 11 May 2016, at 15:05, Julie McMurry wrote:

Are you suggesting that, for safety, the validation step harvest and copy all of the relevant mappings from the master to the individual doc?

The context object could just be a pointer to a context file on the web

jmcmurry commented 8 years ago

Great; just wanted to make sure we weren't bloating the files unnecessarily.

balhoff commented 8 years ago

Okay, it seems like we have concluded:

  1. Phenopacket files should be real JSON-LD files, so the context should just work as specified by JSON-LD. This means the key will need to be @context instead of context.
  2. Since the provided phenopackets "default context" will be processed via normal JSON-LD machinery, we won't stop embedded contexts from redefining "standard" prefixes. But our tools can emit warnings when this is detected.
  3. We need to revisit how unprefixed identifiers are described on the wiki to make sure it's in line with how JSON-LD @base works.
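
For point 3, the core of `@base` handling is ordinary relative-IRI resolution. The sketch below uses Python's standard `urllib.parse.urljoin` to show the idea; the function name `resolve_identifier` and the example IRIs are assumptions, and rules about which context may set `@base` are deliberately left out.

```python
from urllib.parse import urljoin


def resolve_identifier(identifier, base):
    """Resolve a possibly relative identifier against a base IRI.

    Absolute IRIs are returned unchanged; relative ones are joined to base.
    If no base is given, the identifier is returned as-is.
    """
    if base is None:
        return identifier
    return urljoin(base, identifier)
```

For example, with a base of `http://example.org/data/`, the unprefixed identifier `patient1` resolves to `http://example.org/data/patient1`.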

balhoff commented 8 years ago

@cmungall @jmcmurry I have resolved most of the concerns I had about how @base is applied. I was fairly confused because the spec is not very clear about how it works and, as I've since confirmed, two implementations had bugs. JSON-LD Playground incorrectly applies @base from external contexts, and jsonld-java had a bug (now fixed) in which @base from embedded (or external) contexts was not applied when an external context was also in use.

I'll update the description of context and base in the wiki.

jmcmurry commented 8 years ago

Wow, thanks! This must have been very painful to debug.

bug in JSON-LD Playground

Has this been reported? Is it truly limited to the playground or are there additional concerns with the specification itself?

balhoff commented 8 years ago

According to one of the JSON-LD developers, it appears to be a bug in the JavaScript implementation. I filed a bug: https://github.com/digitalbazaar/jsonld.js/issues/142

jmcmurry commented 8 years ago

Awesome; thanks for tracking this down, Jim