w3c / dxwg

Data Catalog Vocabulary (DCAT)
https://w3c.github.io/dxwg/dcat/

JSON-LD context missing as a canonical resource and duplicated in-line in examples #1428

Open rob-metalinkage opened 2 years ago

rob-metalinkage commented 2 years ago

Multiple JSON-LD examples are shown where context information is inline with examples. Surely the point of a standard vocabulary is to allow reuse of common implementation resources - such as a context document for the vocabulary that can be simply referenced?

The fact that canonical contexts are missing for other key vocabularies doesn't mean that new vocabularies should be equally awkward to use.

Hopefully a retrofitting exercise for other vocabularies is possible, but the publication of DCAT updates can start improving this situation.

There are two solid engineering justifications (beyond ease of use): 1) such contexts can be safely cached, whereas in-line contexts mean a lot of redundant information being passed and parsed. 2) users can tell that the context is used without modifications.
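
The caching argument can be illustrated with a minimal sketch. It assumes a hypothetical published context URL (DCAT does not publish a context at the placeholder address used here) and a stand-in fetch function in place of a real HTTP request. The point is only that a context referenced by URL is fetched and parsed once, however many documents refer to it.

```python
from typing import Callable, Dict


class ContextCache:
    """Caches context documents by URL so each one is fetched only once."""

    def __init__(self, fetch: Callable[[str], dict]) -> None:
        self._fetch = fetch  # hypothetical fetcher supplied by the caller
        self._cache: Dict[str, dict] = {}
        self.fetch_count = 0

    def get(self, url: str) -> dict:
        if url not in self._cache:
            self.fetch_count += 1
            self._cache[url] = self._fetch(url)
        return self._cache[url]


# Placeholder URL for illustration only (assumption, not a real DCAT resource).
DCAT_CONTEXT_URL = "https://example.org/dcat/context.jsonld"


def fake_fetch(url: str) -> dict:
    # Stand-in for an HTTP GET; returns a tiny illustrative context.
    return {"@context": {"dcat": "http://www.w3.org/ns/dcat#"}}


cache = ContextCache(fake_fetch)
documents = [
    {"@context": DCAT_CONTEXT_URL, "@type": "dcat:Dataset"},
    {"@context": DCAT_CONTEXT_URL, "@type": "dcat:Catalog"},
]
for doc in documents:
    cache.get(doc["@context"])

print(cache.fetch_count)
```

With in-line contexts there is no shared key to cache on, so every document carries and re-parses its own copy.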

dr-shorthair commented 2 years ago

Good plan. I watched Dan Brickley @danbri trying to 'import' standard contexts in a meeting about 4 years ago. It wasn't clear to me at that time if there was a standard notation to achieve it. Has practice settled down now?

danbri commented 2 years ago

It may have been this experiment?

https://docs.google.com/document/d/16c_STDu8Dzj-ioRNuGS2tlIFJamlx0-vRKBaPA5Wzfc/edit?usp=drivesdk

On Fri, 19 Nov 2021, 05:23 Simon Cox, @.***> wrote:

Good plan. I watched Dan Brickley @danbri https://github.com/danbri trying to 'import' standard contexts in a meeting about 4 years ago. It wasn't clear to me at that time if there was a standard notation to achieve it. Has practice settled down now?


rob-metalinkage commented 2 years ago

Thanks for the link Dan - I think it is addressing a more complex problem around the interaction of syntax with multiple alternatives - whereas I was just talking about the basic pattern of referencing a context by URL:

{ "@context": [ "http://my.org/context", "http://your.org/context" ] }

If these context files map default prefixes for vocabularies and support either "prefix:item" or "item" forms then AFAIK application profiles can simply use "item" where there is no clash of term names between namespaces and "prefix:item" when disambiguation is needed.

The weird thing is that I haven't really seen this written up anywhere, yet it seems so obvious - and even if there were a good reason it couldn't work, I would have expected to have seen some angst about it. Maybe we've all missed it, but it would be good to surface any reason this won't work... or just do it.
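
The "item" versus "prefix:item" pattern can be sketched as follows. This is a simplified illustration, not the JSON-LD context processing algorithm: the two dictionaries stand in for the documents behind the two context URLs above, their contents are assumptions made up for the example, and later contexts override earlier ones as they do in a JSON-LD context array.

```python
from typing import Dict, List


def merge_contexts(contexts: List[Dict[str, str]]) -> Dict[str, str]:
    """Merge contexts in order; a later definition overrides an earlier one."""
    merged: Dict[str, str] = {}
    for ctx in contexts:
        merged.update(ctx)
    return merged


def resolve(term: str, ctx: Dict[str, str]) -> str:
    """Resolve a bare term or a prefix:item compact IRI to a full IRI."""
    if term in ctx:
        return resolve(ctx[term], ctx)
    prefix, sep, local = term.partition(":")
    if sep and prefix in ctx:
        return resolve(ctx[prefix], ctx) + local
    return term  # already a full IRI, or unknown


# Illustrative contents only (assumption): neither is a published context.
FIRST_CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "Dataset": "dcat:Dataset",
    "title": "dct:title",
    "identifier": "dct:identifier",
}
SECOND_CONTEXT = {
    "adms": "http://www.w3.org/ns/adms#",
    "identifier": "adms:identifier",  # clashes with the bare term above
}

ctx = merge_contexts([FIRST_CONTEXT, SECOND_CONTEXT])

print(resolve("title", ctx))           # bare term, no clash
print(resolve("identifier", ctx))      # bare term, second context wins
print(resolve("dct:identifier", ctx))  # prefixed form disambiguates
```

The bare "identifier" silently takes the second context's meaning, which is the situation where the prefixed form is needed.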

kcoyle commented 2 years ago

Multiple JSON-LD examples are shown ...

Could you provide a link to the examples you are referring to, or at least the document? Thanks.

rob-metalinkage commented 2 years ago

just the DCAT examples at https://github.com/w3c/dxwg/tree/gh-pages/dcat/examples - is there somewhere else that might have other ones?

smrgeoinfo commented 2 years ago

@rob-metalinkage can you point at any example context documents that fulfil similar requirements for a JSON-LD profile?

rob-metalinkage commented 2 years ago

Having discussed with @danbri there is probably not a perfectly understood consensus on the perfect form for a JSON-LD context - there are a few options on expressiveness. So it does need some expert guidance I suspect to get it right.

The basic principles are outlined here: https://www.w3.org/TR/json-ld11/#example-5-referencing-a-json-ld-context

you can see a couple of worked examples here: https://json-ld.org/playground/ (Person, Activity)

There seems to be a lot of cut-and-paste of context details between different applications because vocabularies like DCAT, designed for re-use, do not have canonical contexts available. DCAT v2, being recent (post JSON-LD 1.1), is a good candidate to improve this practice - and the supporting spec and general practices around modularity are straightforward and ready to use.

Note that DCAT publishing a context won't stop people cut-and-pasting (or making up some version of) the DCAT context into their own aggregated contexts (with subsequent loss of confidence that it hasn't been changed) - but not publishing one forces this anti-pattern in implementations and reduces the potential for interoperability.

rob-metalinkage commented 2 years ago

Here is an example of a stand-alone context document published as part of a reusable standard - and directly relevant to DCAT as a dependency: https://github.com/opengeospatial/ogc-geosparql/blob/master/1.1/contexts/geo-context.json

kcoyle commented 2 years ago

I see "examples" as having different requirements from "real code." In "real code" a URL link to a file of prefixes and namespaces seems obviously efficient. The examples are likely to be looked at by humans, and in that case following a URL to discover the meaning of the namespace prefixes is extra work.

That doesn't mean that DCAT shouldn't provide a sample context file that can be grabbed by folks who can use one. The readme.md in the examples directory could provide a link to that file.

rob-metalinkage commented 2 years ago

+1 to providing the link in the README

Accept that human readability of examples requires careful consideration. Developers are human too and examples tend to be a quick way of seeing how things are done - so taking care not to promote sub-optimal implementation patterns is important.

For a complete file, probably better to have a complete example - including a reusable context link. For "snippet" examples obviously not meant to be runnable, only human readable, perhaps have the snippet in two very simple parts - a context snippet and a code snippet. By the time you refer someone to a complete example I think people will expect implementation patterns.

dr-shorthair commented 2 years ago

It is unfortunate that JSON doesn't allow comments.

makxdekkers commented 2 years ago

It is unfortunate that JSON doesn't allow comments.

Sounds like a glaring omission to me. JSON files will be very hard to maintain over time.

simsong commented 2 years ago

It is unfortunate that JSON doesn't allow comments.

Sounds like a glaring omission to me. JSON files will be very hard to maintain over time.

JSON is not designed to be a language for configuration files or long-term documents maintained by humans. However you could add a 'comment:' key anywhere you want, because most JSON consumers ignore keys that they are not looking for.
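
A minimal sketch of the 'comment' key idea, assuming a hypothetical consumer that only reads the keys it knows about. The key names and values are made up for illustration.

```python
import json

raw = """
{
  "comment": "Illustrative only: the title below is a placeholder.",
  "title": "Example dataset",
  "keyword": ["example", "demo"]
}
"""


def read_dataset(text: str) -> dict:
    """Keep only the keys this consumer understands; anything else is ignored."""
    known_keys = ("title", "keyword")
    data = json.loads(text)
    return {key: data[key] for key in known_keys if key in data}


dataset = read_dataset(raw)
print(dataset)
```

This relies on the consumer being tolerant of unknown keys; a consumer that validates against a strict schema would reject the extra key.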

danbri commented 2 years ago

I believe there are also patterns for doing this in JSON-LD where the comments don't get mapped into RDF triples - so the RDF view doesn't get bloated.
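
One such pattern, as far as I understand JSON-LD 1.1, is that a key with no term definition (and no @vocab in effect), or a term explicitly mapped to null in the context, is dropped during expansion. The sketch below is a deliberately simplified stand-in for expansion, not the real algorithm, and the document contents are assumptions for illustration.

```python
from typing import Any, Dict


def to_property_map(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Simplified stand-in for JSON-LD expansion (not the real algorithm).

    Keeps only keys whose term maps to an IRI in the context; keys mapped
    to None (JSON null) or not defined at all are dropped.
    """
    ctx = doc.get("@context", {})
    out: Dict[str, Any] = {}
    for key, value in doc.items():
        if key.startswith("@"):
            continue  # keywords are not properties
        iri = ctx.get(key)
        if iri is None:
            continue  # null-mapped or undefined term: no triple
        out[iri] = value
    return out


doc = {
    "@context": {
        "title": "http://purl.org/dc/terms/title",
        "comment": None,  # JSON null: explicitly ignore this key
    },
    "comment": "Explains the example; should not appear in the RDF view.",
    "title": "Example dataset",
}

print(to_property_map(doc))
```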

On Fri, 26 Nov 2021 at 14:22, Simson L. Garfinkel @.***> wrote:

It is unfortunate that JSON doesn't allow comments.

Sounds like a glaring omission to me. JSON files will be very hard to maintain over time.

JSON is not designed to be a language for configuration files or long-term documents maintained by humans. However you could add a 'comment:' key anywhere you want, because most JSON consumers ignore keys that they are not looking for.


dr-shorthair commented 2 years ago

JSON is not designed [... for ...] long-term documents maintained by humans.

Understood. The specific case here is example documents or fragments in a spec - these could be made much more effective if they could be fully commented.

rob-metalinkage commented 2 years ago

I think commenting examples is a separate concern from providing good examples in the first place - and examples that cut and paste contexts are not good examples: they may possibly modify or only partially implement the spec - it's impossible to know without fully parsing the examples and the spec vocabulary, and performing a comparison on possibly differently serialised but isomorphic graphs with different but equivalent blank nodes etc...

i.e. please just use a canonical context so it's unambiguous what the example implements.

danbri commented 2 years ago

On Sun, 28 Nov 2021 at 05:21, Rob Atkinson @.***> wrote:

I think commenting examples is a separate concern to providing good examples in the first place - and examples that cut and paste contexts are not good examples:

That feels inappropriately absolutist - you’re elevating personal opinion to a rule for others who may have different tradeoffs to make.

If you are going to be digitally signing some JSON-LD, for example, that gives you good reason not to write blank cheques by making the RDF triples view of the content depend on external, changeable (and often even man-in-the-middle-able) definitions.

Similarly - e.g. in IoT - if the deployment target environment is potentially unreliably connected to the public internet, or you don’t want to broadcast an association between the requesting IP address and the remote context host. Consider a futuristic health IoT home device fetching http://cancertests.medical.example.org/context.json
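
One way to reconcile the signing concern with a referenced context is to pin the context by digest and refuse anything that does not match. The sketch below is an assumption-laden illustration: the contexts are made up, and hashing sorted-key JSON is a simple stand-in, not a standard canonicalisation scheme.

```python
import hashlib
import json


def context_digest(context: dict) -> str:
    """SHA-256 over a deterministic serialisation (sorted keys, fixed separators)."""
    canonical = json.dumps(context, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_pinned(context: dict, expected_digest: str) -> dict:
    """Return the context only if it matches the digest recorded at review time."""
    if context_digest(context) != expected_digest:
        raise ValueError("context does not match the pinned digest")
    return context


# Illustrative context, reviewed once and pinned (assumption).
reviewed = {"@context": {"dcat": "http://www.w3.org/ns/dcat#", "Dataset": "dcat:Dataset"}}
PINNED = context_digest(reviewed)

# The same terms silently re-pointed at a different namespace.
tampered = {"@context": {"dcat": "http://evil.example/ns#", "Dataset": "dcat:Dataset"}}

check_pinned(reviewed, PINNED)
try:
    check_pinned(tampered, PINNED)
except ValueError as exc:
    print(exc)
```

This does nothing for the privacy concern about revealing which context host a device contacts; that needs the context to be bundled or pre-loaded.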



dr-shorthair commented 2 years ago

A common use for a comment is to add a label or annotation to explain a URI-reference. Even in 'canonical' documents.

rob-metalinkage commented 2 years ago

@danbri - you make a good point that there may be different underlying architectural assumptions about implementation constraints. I think the appropriate non-absolutist approach however is to address this directly and provide examples for both implementation patterns - not assume any given pattern is a personal whim.

Admittedly this is tricky in that there is no universally accepted reference architecture - I will argue that objectively the practices I suggested are well established for some relevant architectural styles.

I would also note that they do not preclude the usage you consider - systems determine what their behaviour should be regarding following links - and if you don't or can't follow them, then you simply pre-load the resources you need (varying from hardcoding to caching when networks are available). Named, modular contexts actually support that by making it obvious how to go safely offline and dark with minimal pre-loading. Ad-hoc cut-and-paste style means you need to load a lot more if you have multiple schemas - but may be more efficient if your system only needs to know about one message - which is probably common enough that we should call out examples. Just not sure it's a key use case for DCAT, however?
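
The pre-loading approach can be sketched as a loader that serves named contexts from a bundled table and never touches the network. The class name, URLs and context contents here are hypothetical, chosen only for illustration.

```python
from typing import Dict


class OfflineContextLoader:
    """Serves contexts from a pre-loaded table; never fetches over the network."""

    def __init__(self, preloaded: Dict[str, dict]) -> None:
        self._preloaded = dict(preloaded)

    def load(self, url: str) -> dict:
        try:
            return self._preloaded[url]
        except KeyError:
            raise LookupError(f"context not pre-loaded, refusing to fetch: {url}") from None


# Placeholder URLs and contents (assumptions, not published resources).
BUNDLED = {
    "https://example.org/dcat/context.jsonld": {
        "@context": {"dcat": "http://www.w3.org/ns/dcat#"}
    },
    "https://example.org/geo/context.jsonld": {
        "@context": {"geo": "http://www.opengis.net/ont/geosparql#"}
    },
}

loader = OfflineContextLoader(BUNDLED)
print(loader.load("https://example.org/dcat/context.jsonld"))

try:
    loader.load("https://example.org/unknown/context.jsonld")
except LookupError as exc:
    print(exc)
```

Because each context has a name, the list of what must be bundled before going offline is simply the list of URLs the system's documents reference.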

bertvannuffelen commented 2 years ago

It is unfortunate that JSON doesn't allow comments.

Sounds like a glaring omission to me. JSON files will be very hard to maintain over time.

JSON is not designed to be a language for configuration files or long-term documents maintained by humans. However you could add a 'comment:' key anywhere you want, because most JSON consumers ignore keys that they are not looking for.

I agree. For me, JSON-LD is an aid for developers, bridging the data semantics world and the software development world. If the URIs used are all dereferenceable, they will point the user to the intended semantics. Furthermore, a JSON-LD context is bound to a data exchange, not to a W3C spec like DCAT. I can perfectly well write JSON in German and map it to DCAT, resulting in a fully conformant, semantically unambiguous DCAT-based exchange.

In my experience, JSON-LD contexts only need explanation when the distance between the JSON representation and the semantic representation is too great. In those cases comments are helpful to explain why this or that mapping has been used. But that is at implementation level. Here, at the level of the W3C DCAT specification, the JSON-LD context should be trivial, not over-engineered. For me, only one quality level should be guaranteed: namely, that it is 100% backed by firm URIs. (I have seen communities that just do JSON-LD like XML, using non-resolvable domains, which for me defeats the purpose of using JSON-LD.)
If complex JSON-LD mappings are required, then this is maybe a use case for that community rather than for this one.

I think that we, as the DCAT specification, should not try to design any representation support to be used directly in an operational system. This is better done by contributing to systems like CKAN or to the source code of your local portal. When users want to test their output, they can be pointed to SHACL validators such as https://www.itb.ec.europa.eu/shacl/any/upload. The JSON-LD output of a system should be verifiable against the SHACL associated with the spec. (Although I expect that one will only find technical RDF errors.)

So back to the original request: using a reference to a published context versus an expanded version. That depends on the objective of the example. Using a published context makes it possible to condense the example and lets the reader focus on the more important things; however, it loses the intuitive reading that the word "dataset" in the example is actually the URI dcat:Dataset. (I mapped to the class on purpose.) In this case the reader must first open the referenced context in order to find the example's interpretation. Alternatively, we could use a QName approach like the Turtle representation, in which case the referenced context is probably equivalent to the prefix definitions. But the resulting JSON will violate any naming convention practice in the JSON community.

The current Turtle examples are simple: there is no engineering or naming convention issue because they use the URIs. I propose that we do not introduce these challenges into the DCAT specification.

rob-metalinkage commented 2 years ago

@bertvannuffelen I agree with all the logic until I come to the function of the examples - the examples should exemplify potential real-world usage, which should cover the two use cases identified: 1) a canonical context published once, identifiable by its reference and cacheable, and 2) "I must be self-contained at all costs".

Currently the examples promote ad-hoc context descriptions, which may have varying degrees of "over-engineering", and leave no way to identify whether a JSON data object actually conforms to DCAT - who knows what deliberate or accidental changes are introduced in local implementations of the DCAT context?
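
The "who knows what changed" problem can be made concrete with a small comparison between a canonical context and an in-line copy. This is only a sketch: both contexts are invented for illustration, and the comparison is a literal one over term definitions, so two definitions that are written differently but mean the same thing would still be reported as changed.

```python
from typing import Dict, List


def context_differences(canonical: Dict[str, str], inline: Dict[str, str]) -> Dict[str, List[str]]:
    """Report terms that an in-line context drops, adds, or redefines."""
    report: Dict[str, List[str]] = {"missing": [], "added": [], "changed": []}
    for term in sorted(set(canonical) | set(inline)):
        if term not in inline:
            report["missing"].append(term)
        elif term not in canonical:
            report["added"].append(term)
        elif canonical[term] != inline[term]:
            report["changed"].append(term)
    return report


# Hypothetical canonical context (assumption: DCAT publishes no such document).
canonical = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "Dataset": "dcat:Dataset",
    "keyword": "dcat:keyword",
    "theme": "dcat:theme",
}

# A cut-and-paste copy with local edits.
inline = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "Dataset": "dcat:Dataset",
    "keyword": "http://example.org/local#keyword",
    "tag": "http://example.org/local#tag",
}

print(context_differences(canonical, inline))
```

With a context referenced by URL, no such comparison is needed: the reference itself states which definitions apply.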

riccardoAlbertoni commented 1 year ago

Marked as future work, as we might want to reconsider this from a new perspective in a future round of standardization.