json-ld / json-ld.org

JSON for Linked Data's documentation and playground site
https://json-ld.org/

Semantic Free JSON-LD #443

Closed RickMoynihan closed 6 years ago

RickMoynihan commented 7 years ago

Hi all,

I've recently been trying to design a JSON format to act as a bridge between the RDF world and the typical JSON developer. However, due to complexities in the underlying RDF ontology, it is proving almost impossible with the current JSON-LD features to create a JSON format that is both easy to use and an isomorphic representation of the semantic graph from which the data is derived.

My primary goal is basically to create a useful JSON format, and I don't really care about the semantics expressed via the RDF interpretation of the JSON at all. My feeling is that if our users want the RDF interpretation they can Accept: text/turtle or another appropriate serialisation.

I really want to leverage only the JSON-side semantics of JSON-LD, as a means to give JSON users access to linked data URIs (which are syntactically identified as such via @id), @language tags, RDF data-types (e.g. xsd:dateTime), and prefixed URIs. Basically I want to use JSON-LD as a means to augment RDF with JSON, and perhaps even give JSON developers hooks into documentation on certain keys, e.g. answering the question "what does the dimension key mean?" by letting them dereference it.
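To make that concrete, here's a rough sketch of the kind of JSON I have in mind (the keys, prefixes and URIs are purely illustrative, not our actual format):

{
  "@context": {
    "dct": "http://purl.org/dc/terms/",
    "title": "dct:title"
  },
  "@id": "http://example.org/data/births",
  "title": {"@value": "Births", "@language": "en"},
  "issued": {"@value": "2016-02-09T12:17:48Z", "@type": "http://www.w3.org/2001/XMLSchema#dateTime"},
  "dimension": "http://example.org/def/dimension/refArea"
}

Here only "title" is defined in the context, so a processor extracts a single triple; "issued" and "dimension" are deliberately left out of the context and are ignored by RDF extraction, but JSON consumers still see the datatype, language tag and URI information.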

For me this is perhaps the most exciting application of JSON-LD, and I think it'd be nice if there was a way to formally tell processors and users, to not process it as RDF.

Basically I can get the syntax I want at the expense of RDF semantics, and I'd like to inform users "not to interpret this as RDF".

I realise I can achieve this by taking advantage of the fact that processors don't pursue keys that they don't know about in the context. However, it might be nice to flag this to users, in case they think it's going to yield some meaningful RDF.

I'm sure you've thought about cases like these, so I'm curious what you think the best way to tackle it is.

Also please pass on my thanks to everyone in the Working Group for coming up with what looks to be a great new standard.

gkellogg commented 7 years ago

Rick, do you have a specific proposal? It may be feasible to add some syntactic bits to JSON-LD that are not directly reflected in the RDF transformation (for example, "@container": "@index" does this). However, as stated in the JSON-LD spec, the data model for JSON-LD is RDF, and going too far away from this can upset the theoretical underpinnings of the format.
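As a quick sketch of what I mean by "@container": "@index" (the vocabulary here is purely illustrative):

{
  "@context": {
    "schema": "http://schema.org/",
    "post": {"@id": "schema:blogPost", "@container": "@index"}
  },
  "@id": "http://example.org/blog",
  "post": {
    "en": {"@id": "http://example.org/posts/1", "schema:name": "Hello"},
    "de": {"@id": "http://example.org/posts/2", "schema:name": "Hallo"}
  }
}

The index keys "en" and "de" are preserved through expansion and compaction, but never show up in the RDF transformation; only the schema:blogPost and schema:name triples do.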

The best way to do this is to create individual issues with your spelled-out feature requests so they can be discussed on their merits.

RickMoynihan commented 7 years ago

@gkellogg Firstly thanks for such a prompt response :-)

Secondly, depending on what you think, you might consider this to be one; it really depends on what you think...

as stated in the JSON-LD spec, the data model for JSON-LD /is/ RDF,

Yes, but this is only true if you choose to read it as RDF. There is also another data model and reading which JSON-LD is targeting: the JSON one, where the data has bespoke application semantics too, e.g. "this object is a lookup table from X to Y".

Let me provide some context to the discussion. We already publish large amounts of Linked Data and offer downloads in almost every RDF serialisation format, so for us the ability to interpret JSON as RDF is nice but unnecessary. Incidentally, I fully support JSON-LD's other goal of letting arbitrary JSON APIs acquire RDF semantics (very cool, and something we might find useful one day).

As I alluded to earlier, I think JSON-LD can serve an important role in allowing non-RDF applications to use features of RDF a la carte, rather than adopting RDF wholesale. For example, the use of dereferenceable URIs as identifiers, language tags, etc.: "linked data" with a small L and D, if you like; maybe JSON-ld :-).

In this world view, I think it would be useful to use JSON-LD as a means to sprinkle occasional links to Linked Data resources, without necessarily implying a semantic interpretation of the JSON. Though the resources at the other end may have an RDF meaning, you're providing them not for machine-level KR, but because there's utility in the URIs as identifiers, or perhaps also for documentation or services.

For me (as a large existing user of RDF) this is the main use-case for adopting JSON-LD. I'd consider it a pragmatic view of inter-operating at the appropriate level for your application/audience, rather than a continuation of the SemWeb's historical "one true data model" world view. For example, even when developing front-end Linked Data javascript applications, we often don't want to rely on RDF/SPARQL as an interchange format; one thing JSON is quite good at is providing applications with serialised data structures that are easily hydrated into domain objects with minimal fuss.

Historically we've written quite a few service backends which have had to adopt their own conventions for representing bits of RDF; in particular URIs as distinct from strings, but also the rest of the RDF primitives. What we typically don't need to transmit to the front end of the app are RDF-level KR semantics; usually knowledge of the bespoke format and the RDF data-types is enough.

In one sense what I'm proposing here is just a different use-case that doesn't really require any changes to JSON-LD, as all I'm talking about is using JSON-LD for its pure JSON interpretation rather than its RDF one... and interpretation is always entirely up to the interpreter. Also, it's an open standard, and you can't really stop me from (ab)using it! ;-)

However, I do also care about correctness, so given that an RDF interpretation of my data exists, I'd prefer it to map perfectly to a subset of the real triples. For example, I don't want to introduce blank nodes that shouldn't really be there. I realise I could definitely use JSON-LD to convey the true RDF semantics too, but only at the expense of the JSON interpretation, which is for me an unacceptable compromise.

My current approach has basically been to be mindful that the RDF interpretation exists and to therefore ensure we return a valid subset of the real triples, but not a particularly useful interpretation. As a concrete example you can see this document here:

https://github.com/OpenGovIntelligence/json-qb/blob/master/spec/table-format/table-format-1.json

along with some background notes (work in progress) on the design of this format:

https://github.com/OpenGovIntelligence/json-qb/blob/master/spec/table-format.md#core-table

If you paste that JSON-LD into the playground, you'll see it yields just a few triples, which correspond to some of the DCAT metadata for a qb:Dataset. This therefore seems to be technically correct (as would returning an empty set), but I worry that some users might expect JSON-LD to also have a useful RDF interpretation, and that they might send me bug reports saying things like "data is missing" etc... In particular, in the document above, the structure key, which loosely corresponds to a qb:structure, is purposefully not defined in the context, even though that subtree contains JSON-LD terms.

This is where I think I'd benefit from some suggestions or thoughts: firstly on whether you consider this kind of usage a valid use-case, and secondly, if you do, on whether it might be worth defining a special flag in the context, e.g. something like "@interpretation": "json-only", where the possible values might be json+rdf or json-only.
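Concretely, I'm imagining something along these lines (entirely hypothetical syntax, just to illustrate the idea):

{
  "@context": {
    "@interpretation": "json-only",
    "dct": "http://purl.org/dc/terms/",
    "qb": "http://purl.org/linked-data/cube#"
  }
}

i.e. a single flag in the context that says "the prefixes and JSON-LD syntax here are for JSON consumers; don't expect a useful set of triples".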

Anyway, thanks again for your help and work on JSON-LD, and apologies if this isn't the right venue for discussing this topic.

akuckartz commented 7 years ago

@RickMoynihan Your last comment only consists of a '`'.

gkellogg commented 7 years ago

@RickMoynihan Note that if you expand https://github.com/OpenGovIntelligence/json-qb/blob/master/spec/table-format/table-format-1.json, you get a short document:

[
  {
    "@id": "http://statistics.gov.scot/data/births",
    "http://purl.org/dc/terms/issued": [
      {
        "@type": "http://www.w3.org/2001/XMLSchema#dateTime",
        "@value": "2014-07-29T02:00:00+02:00"
      }
    ],
    "http://purl.org/dc/terms/modified": [
      {
        "@type": "http://www.w3.org/2001/XMLSchema#dateTime",
        "@value": "2016-02-09T12:17:48Z"
      }
    ],
    "http://purl.org/dc/terms/title": [
      {
        "@value": "Births"
      }
    ]
  }
]

This indicates that much of your document is being ignored by the JSON-LD processor, likely because "table" is not defined. If you add an @vocab definition, for example "@vocab": "http://example.org/", you get the whole document expanded, along with quite a number of RDF triples.
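For example, a cut-down sketch (the example.org base and the "table" shape here are just placeholders):

{
  "@context": {
    "@vocab": "http://example.org/",
    "dct": "http://purl.org/dc/terms/"
  },
  "@id": "http://statistics.gov.scot/data/births",
  "dct:title": "Births",
  "table": {
    "columns": ["area", "year", "count"]
  }
}

With @vocab in place, "table" expands to http://example.org/table and "columns" to http://example.org/columns, so the nested object is no longer dropped; it becomes a blank node with its own triples.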

It is certainly intended that people work with JSON-LD without needing to be aware of the underlying RDF nature of the data model, but we need to be careful when adding features that they remain semantically valid (whatever that may mean).

Another valid use-case is to have a JSON document, which is only partially interpreted as JSON-LD. That's what you have now, and it is certainly a valid document, but if you can't do a reasonable round-trip through expansion/compaction, it's not part of the data model.

There is an issue to allow for a JSON datatype, so that the value of a property can be recorded as a JSON value and not interpreted as JSON-LD; this might be what you were thinking of with your "@interpretation": "json-only" suggestion (see #333).
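A rough sketch of what such a JSON literal might look like (hypothetical syntax; the details are still being discussed in #333):

{
  "@context": {
    "ex": "http://example.org/",
    "ex:table": {"@type": "@json"}
  },
  "ex:table": {
    "columns": ["area", "year", "count"],
    "rows": [["S12000033", 2014, 36]]
  }
}

The value of ex:table would then be carried through expansion, compaction and RDF conversion as an opaque JSON value rather than being interpreted as JSON-LD.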

Personally, if you want to consider a document JSON-LD, IMHO you should be able to round-trip it at least through expansion/compaction and be able to flatten/frame it. If you can do this, you can probably also round-trip through RDF. Not everything needs to be considered, but the core semantic meaning of your document should be representable in the data model.

This is certainly a good venue, but for general discussions, not specific feature requests, you might get more feedback from an email to public-linked-json@w3.org.

RickMoynihan commented 7 years ago

Hi @gkellogg again thanks for the detailed reply.

This indicates that much of your document is being ignored by the JSON-LD processor, likely because "table" is not defined. If you add an @vocab definition, for example "@vocab": "http://example.org/", you get the whole document expanded, along with quite a number of RDF triples.

Yes, I'm purposefully trying to hide the bulk of the document, including the structure part, from the RDF pre-processing. If I provide @context definitions for structure and that part of the tree, and try to satisfy JSON users whilst retaining the true RDF mapping, the JSON becomes too unwieldy for our use case. Despite not being processed by the RDF extraction, I still deliberately make use of JSON-LD features in this subtree, but I worry that JSON-LD users won't understand the intention here.

I also 100% appreciate your point about being careful when adding features not to break RDF semantics, I certainly wouldn't want that to happen! Ironically it's caring about the RDF semantics that's making me want to explicitly say "don't consider the RDF semantics in this document".

That's what you have now, and it is certainly a valid document, but if you can't do a reasonable round-trip through expansion/compaction, it's not part of the data model.

By "data model" I'm assuming you mean the portion of the JSON that can losslessly round trip, i.e. the materialisable RDF statements.

There is an issue to allow for a JSON datatype, so that the value of a property can be recorded as a JSON value and not interpreted as JSON-LD; this might be what you were thinking of with your "@interpretation": "json-only" suggestion (see #333).

This appears to be asking for a little more than my "@interpretation": "@json" suggestion as it seems to want to survive the round-tripping. For my use case what's discussed in #333 would probably work - though I'd want to annotate the outermost JSON object, but I don't really require it to round-trip as I don't intend to extract an RDF data-model from the document at all.

gkellogg commented 7 years ago

By "data model" I'm assuming you mean the portion of the JSON that can losslessly round trip, i.e. the materialisable RDF statements.

I mean round-tripable through the JSON-LD transformation algorithms.

You are, of course, free to create a JSON document in which only some elements apply to JSON-LD. JSON does have its own data model; it's just that JSON-LD is consistent with the RDF data model.

It was always intended that JSON-LD be able to include things which are not interpreted directly by the algorithms, so you're clear there. Other than that, it becomes a matter of philosophy.

IMO, if you want to call it JSON-LD, it needs to be handled by the core API algorithms. In this way, it's similar to the relationship between HTML and RDFa: some of it may be interpreted as RDFa, and the rest is simply ignored. RDFa allows HTML literals, so JSON-LD should probably allow JSON literals, thus #333. Both can include (or be contained in) other markup that is not intended to be part of the model.

akuckartz commented 7 years ago

What is the use case and the target audience for such incomplete RDF representations?

RickMoynihan commented 7 years ago

@akuckartz: In one sense there is no use-case for the incomplete RDF representations. My use case is almost entirely born out of the fact that we have RDF already (so if you want RDF, you can just ask for Turtle or N-Triples or any of the other formats we support). My use case for JSON-LD is therefore born out of a need to create nice JSON APIs that leverage JSON-LD syntax to provide information on RDF primitives, e.g. URIs, data-types, lang strings.

Basically I don't want applications to interpret my JSON-LD format as RDF at all. I expect them to interpret it only as JSON. Our backend database represents data as RDF data cubes, and we're basically looking at JSON-LD as a way to communicate RDF types to front end and API clients. These clients aren't interested in RDF triples/semantics; but are interested in knowing that things are URIs/identifiers, prefixes or language strings.

Hence if we didn't use JSON-LD we'd have to recreate 90% of JSON-LD's syntax, and then document it. Better for us to just build our format on top of the great work you've done.

Could we make the JSON-LD also generate the triples we have stored? Certainly, but only at the expense of being nice JSON. Trying to square this circle results in lots of extra levels of JSON nesting just to please the RDF semantics, which our application and users don't care about.
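To give a flavour of the problem, a contrived sketch (not our actual format or vocabulary): the JSON we would like to serve looks something like

{
  "observations": {
    "S12000033": {"count": 36},
    "S12000034": {"count": 41}
  }
}

whereas emitting the true cube triples pushes us towards something more like this (context with prefix definitions omitted):

{
  "qb:observation": [
    {
      "sdmx-dim:refArea": {"@id": "http://example.org/geography/S12000033"},
      "ex:count": {"@value": 36, "@type": "xsd:integer"}
    },
    {
      "sdmx-dim:refArea": {"@id": "http://example.org/geography/S12000034"},
      "ex:count": {"@value": 41, "@type": "xsd:integer"}
    }
  ]
}

i.e. an extra level of nesting and value objects everywhere, purely to keep the RDF mapping honest.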

So there are basically a few options available to me:

  1. We use JSON-LD to get the nice JSON syntax we want and silently ignore the fact that, if someone were to choose to interpret it as RDF, they'd end up with different triples to what we actually store. I feel this is undesirable, because it's not only surprising but the data you end up with is wrong.

  2. We wire up the JSON-LD in such a way that we emit either no triples or an impoverished but correct subset of the RDF. e.g. as we are now we emit just a few triples of dcat metadata.

  3. We make the format emit the true triples, and make ourselves and our JSON users jump through hoops. I strongly want to avoid this option. We want to provide a JSON representation in the first place to make things easier. Also, it's non-trivial to design and maintain these formats whilst imposing on ourselves an artificial semantic-correctness property that we don't actually rely on, or expect our target users to use.

  4. JSON-LD 1.1 acquires additional features such as @reverseindex that make it less cumbersome to emit both correct RDF and nice JSON. I think features like this would be great for other use-cases, where you want both full RDF & JSON semantics together.

Option 2 feels like the best option available to me; but speaking to you both, it appears there is an expectation that JSON-LD should always emit RDF. Whereas I think JSON-LD is incredibly useful even if it doesn't; and if it doesn't, it might be nice to indicate that it's expected not to produce any triples.

I hope this makes sense.

dlongley commented 7 years ago

I think I understand the use case. I don't think there's anything wrong with a JSON-LD document that actually produces/contains no triples. It seems like an OK way to proceed at this time given the constraints. I'm +1 to adding a feature to JSON-LD 1.1 to help address this use case (and those like it) more robustly in the future.

One reason for creating JSON-LD is so that people can use it just like regular JSON without having to use/know/think about the -LD bits. Adding more features to help facilitate this as new versions of JSON-LD come out has always been a goal (if not the primary one), IMO.

gkellogg commented 7 years ago

@RickMoynihan If you look at my previous reply, my concern was not really about getting triples out of all JSON-LD, but about ensuring that it works as expected through expansion/compaction. Even so, some data may be lost, which is fine, but then these aren't really part of the JSON-LD model.

Still, I await a specific feature request. I don't really see how the @interpretation keyword might work.

If it's JSON-LD, it's composed of objects which are related via properties. In some cases, those properties may not result in any triples, but they should be managed, somehow, through expansion, and ideally survive re-compaction. Anytime you have an object, you have a subject identifier. If a property expands to an IRI, you have a predicate. The issue remains that if the value can be interpreted as a Resource or Literal, you get an RDF object. If it can't (e.g., @index), then it is ignored. Any new feature for non-RDF JSON-LD needs to consider where it plays in this system.

lanthaler commented 7 years ago

I've read the whole thread twice now, but I'm still not sure I fully understand the use case.

Basically I don't want applications to interpret my JSON-LD format as RDF at all. I expect them to interpret it only as JSON. [...] These clients aren't interested in RDF triples/semantics; but are interested in knowing that things are URIs/identifiers, prefixes or language strings. Hence if we didn't use JSON-LD we'd have to recreate 90% of JSON-LDs syntax, and then document it

What are these clients? Are they clients specifically created for your API? If so, you can do whatever you want. I'd suggest simply using JSON-LD without specifying a context at all. The document would be meaningless to a JSON-LD processor and wouldn't produce any triples, but you can build your API on top of JSON-LD's syntax.
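As a quick sketch (keys purely illustrative), a document like

{
  "@id": "http://statistics.gov.scot/data/births",
  "title": "Births",
  "dimensions": ["http://example.org/def/dimension/refArea"]
}

has no context, so "title" and "dimensions" never map to IRIs: a JSON-LD processor simply drops them and produces no triples, while your bespoke clients can still read the JSON directly.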

If you intend to use conformant JSON-LD processors in those clients, you need to respect JSON-LD's underlying data model and its processing algorithms, as Gregg outlined. There's no way around it.

So there are basically a few options available to me:

  1. We use JSON-LD to get the nice JSON syntax we want and silently ignore the fact that, if someone were to choose to interpret it as RDF, they'd end up with different triples to what we actually store. I feel this is undesirable, because it's not only surprising but the data you end up with is wrong.

+1, that would likely result in issues down the road

  2. We wire up the JSON-LD in such a way that we emit either no triples or an impoverished but correct subset of the RDF. e.g. as we are now we emit just a few triples of dcat metadata.

What's the problem with that? That's a completely valid document and doesn't require any changes to the JSON-LD spec.

Btw. I would love to have JSON literals as they would be a nice way to keep such JSON blobs around. They would be simply passed through the algorithms without being further processed.

RickMoynihan commented 7 years ago

What are these clients? Are they clients specifically created for your API? If so, you can do whatever you want.

Yes they're bespoke clients, not JSON-LD interpreters.

  2. We wire up the JSON-LD in such a way that we emit either no triples or an impoverished but correct subset of the RDF. e.g. as we are now we emit just a few triples of dcat metadata.

What's the problem with that? That's a completely valid document and doesn't require any changes to the JSON-LD spec.

Agreed, and I've tried to say as much. There are two issues though:

  1. The lesser issue is user expectations. If I serve a document as JSON-LD, users might expect it to represent more than an empty graph. Communicating this to them in a machine readable manner might therefore be a nice (though admittedly non-essential) extra.

  2. I'd like to use the @context for specifying prefixes... but defining a @context also implicitly means "process this object as RDF". Basically, I'd like to specify a @context to communicate to JSON API users that they can expand prefixes throughout, but avoid implying that RDF interpreters should convert the document into triples/quads.

gkellogg commented 7 years ago

This is not dissimilar to having an HTML version of a vocabulary document which contains RDFa that is automatically added by ReSpec, but does not necessarily include the triples for the vocabulary itself. If it does, then it's appropriate for the RDFa version to specify a superset of the triples contained in, say, the Turtle mapping. It might be nice to be able to say that the RDFa triples do not represent those from another representation.

Certainly, one way to do this would be to put the triples in a different named graph, so that the default graph did not contain any of the triples from other representations. For example:

{
  "@context": {"rdfs": "http://www.w3.org/2000/01/rdf-schema#"},
  "rdfs:label": "this causes all content within @graph to be placed in an anonymous named graph",
  "@graph": {
    "@id": "Document",
    "rdfs:label": "This will create a triple in an anonymous named graph in the form `_:anon {[rdfs:label \"...\"]}`"
  }
}

If you make sure that the triple added to the default graph is consistent with your other representations, it would seem to satisfy your needs.

gkellogg commented 7 years ago

See https://github.com/shexSpec/shexspec.github.io/pull/1 and https://github.com/shexSpec/spec/pull/5.

lanthaler commented 7 years ago

I'd like to use the @context for specifying prefixes... but defining a @context also implicitly means "process this object as RDF". Basically, I'd like to specify a @context to communicate to JSON API users that they can expand prefixes throughout, but avoid implying that RDF interpreters should convert the document into triples/quads.

I'm not sure I follow. Why would an API client expand prefixes if it is only interested in the JSON and not in RDF triples? If the RDF vocabulary you are using is causing those issues, why don't you create a simpler, better suited vocabulary?

RickMoynihan commented 7 years ago

Why would an API client expand prefixes if it is only interested in the JSON and not in RDF triples?

@lanthaler because URIs are identifiers.

If the RDF vocabulary you are using is causing those issues, why don't you create a simpler, better suited vocabulary?

Because the vocabulary we are using is a W3C standard, and it's well suited to its task. Creating a new vocabulary just to get a human friendly JSON-LD syntax feels a bit like putting the cart before the horse.

RickMoynihan commented 7 years ago

@gkellogg I'm not sure what you're suggesting regarding putting the triples in a different graph. Graphs have no real "semantic meaning" in RDF, in that they don't make any claims about the world. Putting the JSON-LD triples in a random graph might help users separate them and special-case them; but it still feels like a hack.

gkellogg commented 6 years ago

This is covered via #333.