json-ld / json-ld.org

JSON for Linked Data's documentation and playground site
https://json-ld.org/

Post LC-2 Editorial Comments by Robin Berjon #274

Closed msporny closed 10 years ago

msporny commented 11 years ago

Robin Berjon has submitted a number of comments on the JSON-LD spec through his blog. We should try to address the ones we can, because he raises several valid points about the document:

http://berjon.com/blog/2013/06/linked-data.html

> The third example, however, kills me. "@type": "@id"? I can't look at any of this and guess what it's doing. The text tells me it's used to map terms to IRIs. Why should I care? Isn't that stuff just for RDF people? I want data with, you know, links.
msporny commented 11 years ago

Hey @darobin, sorry for the delay, got hit by an avalanche of work. I'm digging my way out through the JSON-LD stuff and wanted to respond to you in detail to make sure I'm not missing anything before we take JSON-LD to Candidate Rec. I'm treating your blog post as comments on the spec and will try to address the things I think we can address.

> thanks for taking the time to respond in detail. I don't think we talked *entirely* past one another :)

Maybe not entirely, but I think you're focusing on the RDF aspect of JSON-LD too much. It's plumbing. It's behind the walls for the most part. I know there's an argument there for a few pipes that stick out at ugly places, but that's standards-making for you. The hope is that there aren't so many rough corners as to prevent JSON-LD from being useful to lots of people.

> If I read through the spec, the intro is fine, as are the design goals and the terminology. Using my overwhelming willpower not to frown at mentions of IRIs — let alone compact IRIs — it seems to promise pretty much what I'd want.

We tried to change it to URL and were summarily denied by the more pedantic folks in the group. It would help if someone would get the URL spec to REC status. I nominate AnneVK, I hear he's bored to tears these days. :P

> The second example is weird — what's with all those URLs!?! — but then the text does say it's overly verbose and difficult to work with. It makes me wonder why it's even possible, but hey, the spec says it's bad anyway.

As for going from compact form to expanded form and back in JSON-LD: folks wanted to be able to store documents in expanded form (fully dereferenced terms, etc.) to make it easier to process those documents internally. That's why it's there... I know of no other way of conveying what's actually going on than using that example. Got a suggestion of what else we could do there?
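To illustrate (a hypothetical sketch using an invented snippet; schema.org is used here only as a familiar vocabulary), the same data in compact form looks like:

```json
{
  "@context": { "name": "http://schema.org/name" },
  "name": "Jane Doe"
}
```

Running expansion over this yields the fully dereferenced form `[{"http://schema.org/name": [{"@value": "Jane Doe"}]}]` — every term replaced by its IRI, every value wrapped in a value object. That regularity is what makes expanded form convenient for internal processing, at the cost of verbosity.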

> The third example, however, kills me. "@type": "@id"? I can't look at any of this and guess what it's doing. The text tells me it's used to map terms to IRIs. Why should I care? Isn't that stuff just for RDF people? I want data with, you know, links.

Fair point, the issue is being tracked here:

https://github.com/json-ld/json-ld.org/issues/274

I'll see what I can do.
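For what it's worth, the construct in question is how a context says "values of this term are links, not strings". A small sketch (the term name and document are invented for illustration):

```json
{
  "@context": {
    "homepage": {
      "@id": "http://xmlns.com/foaf/0.1/homepage",
      "@type": "@id"
    }
  },
  "homepage": "http://berjon.com/"
}
```

Without the `"@type": "@id"` entry, a processor would treat "http://berjon.com/" as an opaque string; with it, the value is interpreted as an IRI — i.e., an actual link.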

As I read on, it gets worse. I have the likes of "foo": 42 in my data, but apparently this can now also appear as "foo": { "@type": "http://www.w3.org/2001/XMLSche...", "@value": 42 }. If I write a processor for a data source, I need obj.foo to always return 42. If all of a sudden it becomes legit for the emitter to return some weird object instead, or if I start needing a special extra toolkit, then I'm hosed. That would violate the promise.

If the data source starts publishing data like that, you probably won't use that data source. Just because you can do all sorts of neat layout tricks using TABLE in HTML doesn't mean you should. It's considered a bad practice. Doing what you mention above would also be considered a bad practice.

However, even if you come across that bad practice in JSON-LD, you could just use the API to compact the document back down to { "foo": 42 }... that's what compaction does, it allows your application to format the incoming data in a way that is regular and easy for it to use.
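Concretely (using an invented vocabulary IRI for the sketch), the "weird object" form of the value looks like:

```json
{
  "http://example.com/vocab#foo": {
    "@value": 42,
    "@type": "http://www.w3.org/2001/XMLSchema#integer"
  }
}
```

Compacting that against a context whose term definition maps "foo" to that IRI with a matching type, e.g. { "foo": { "@id": "http://example.com/vocab#foo", "@type": "http://www.w3.org/2001/XMLSchema#integer" } }, folds the value object back down so obj.foo is just 42 again.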

> And that's probably the heart of the problem. Linked Data shouldn't be tied to the RDF data model.

It's not. At least, I don't think it is. There are others that do. I think they're wrong.

We actually started out with a simpler data model than RDF (a tree-based one), but quickly converged on something that looked so close to RDF that we might as well merge with the RDF data model.

If we're not going to use the RDF data model, then what data model should we use? JSON doesn't go far enough, so what's the replacement?

> Data wants to be dumb. Removing ambiguity is a job for interpretation, not for data modelling or encoding. The data model involved under JSON-LD is, in fact, a layer violation. It complicates the data by trying to embed its interpretation.

I don't understand this line of reasoning. If removing ambiguity is a job for interpretation, then isn't it a good idea to remove as much ambiguity from the serialization as possible? Wouldn't the data be better as a result? There is a reason that JSON has numbers. It could have just been all strings, but at some point someone thought that making the distinction would ease developers' lives. The same applies to JSON-LD. When you're dealing with data that must be interpreted in the same way across the Web to be useful, it's helpful to make sure that there is less room for misinterpretation.

Maybe that's not what you're getting at, but the whole "layer violation" comment above is too vague for me to really understand your point.

> When I see people much smarter than I (such as most of the RDF community) flock to such things I always wonder. But I really don't see the appeal in conflating data and interpretation. That's just not how I see information working.

I think the bottom line is that the data actually does have a specific meaning. It's meant to be interpreted in a particular way. For example: "This string is a URL, you should interpret it as such" is a helpful piece of information when you're attempting to interpret a JSON-LD document. The same goes for "This string is a date formatted in ISO-8601 format": if you know that, you know what piece of logic to apply when interpreting the date. JSON doesn't have this, and you often have to rely on out-of-band mechanisms to achieve the same sort of interoperability.
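For example, here's how a context carries the "this string is a date" hint in-band (a sketch; schema.org/birthDate is used only as a familiar property):

```json
{
  "@context": {
    "birthDate": {
      "@id": "http://schema.org/birthDate",
      "@type": "http://www.w3.org/2001/XMLSchema#date"
    }
  },
  "birthDate": "1990-07-04"
}
```

A consumer that reads the context knows to hand "1990-07-04" to its date parser, without any out-of-band agreement with the publisher.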

> I like the idea behind @context (so long as it's constrained to appearing only at the root of a JSON document). I reckon @base can also be useful in case you have a document that's detached from its source. But all the rest is just RDF overhead. I don't mind people using RDF — but for data to remain useful, it should only risk having RDF artefacts for RDF users.

RDF artifacts like what? I have a feeling that you're reading bits of the JSON-LD spec, finding things you don't like, and saying you don't like them. In most cases, those things you don't like will rarely be seen in the wild. In other cases, those things you don't like are necessary evils on the path to supporting some important use cases. You'll have to be more specific.

> What I'd go with instead of this would be an @schema key, restricted to the root, that could point to the default interpretation for this document. The schema it points to needs to be a regular tree-based schema that can match in the tree at any depth (and of course import other schemata if it needs to). Since the schema's interpretation walks the tree, it's entirely possible to decorate any branch with the information required to produce an RDF interpretation (and of course a third-party can also provide that).

We tried that in the early days of JSON-LD and it was a miserable failure. The problem has to do with arbitrarily nested data (person A knows person B knows person C). In this case, your "@schema" data would change for data that only changes in the way it is nested. That is, your rules for interpreting the data are tightly coupled to the structure of the data that you're publishing and that's a very bad thing.
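To make the nesting problem concrete (invented terms, schema.org used only as a familiar vocabulary): the same context covers the data no matter how deeply it nests, whereas path-anchored schema rules would have to be rewritten whenever the shape changes.

```json
{
  "@context": {
    "name": "http://schema.org/name",
    "knows": "http://schema.org/knows"
  },
  "name": "Person A",
  "knows": {
    "name": "Person B",
    "knows": { "name": "Person C" }
  }
}
```

Every occurrence of "name" and "knows", at any depth, is interpreted by the same two context entries; add a fourth level of nesting and nothing about the interpretation rules needs to change.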

> We've seen before that RDF doesn't play well with tree data formats. What made RDF/XML madness to process was precisely that it was impossible to process the data using XML tools. It made the XML part useless: you just had to use an RDF toolchain or be screwed (and that's not really an option).

It's true that RDF/XML was horrible for a variety of reasons, one of them being the problem with structuring data. It's one of the reasons that JSON-LD has the Framing feature (to re-layout the data in a format that is native to your application).

I expect that many people that publish JSON-LD will do so in a strict format so that they don't incur this "graph" tax on their developers. In fact, the ones that don't stick to this will probably end up with APIs and data that are not very popular.

> It's certainly possible for a publisher to promise to its consumers that it will only use the sane parts of JSON-LD and that the content is therefore guaranteed to always be processable using just JSON; in the same way that a number of RDF/XML publishers committed to producing data that always had the same shape so that people could actually use it. (And it's not true that this is just for "Web Developers". Processing RDF is only perceived as usable by RDF people.) But at that point, why use JSON-LD and not just JSON?

Because JSON-LD gives you the following features that JSON doesn't:

> Why introduce the variability in syntax? I think that the API spec alone makes it clear just how unpleasant the compaction/expansion steps can be.

... because it's useful. :)

Expansion and compaction can be complex (for implementers). JSON-LD graph normalization is horribly complex (for implementers)... but developers don't care about that complexity because it's hidden behind an API call. The same goes for security when you connect to an HTTPS website - fantastically complex process, hidden behind API calls that make it accessible.

Compaction and expansion are helpful when you want to take data from another application domain and apply it to your application domain. Compact form exists because folks want to access their data by doing something like 'data.workAddress.street'. Expanded form exists because it's a useful structure if you just want to process data in its raw state (and there are a number of useful use cases where this is desired, like data snapshotting).

We have six implementers now that have gotten the implementations right, so it's not so complex that folks don't know how to implement it. Also, developers don't care about this complexity, they just call jsonld.compact(data, myContext) and then work with the data in their application.

It's true that authors still have to understand that there is a compact form and an expanded form, but the alternative would be to not have the features (or hide them). We had expanded form hidden before and got a lot of comments asking us to unhide it, because it was useful to developers.

> Just ditch the RDF then, and come have fun at the bad kids' table :)

I'd love to ditch the RDF if it meant that we could stop having discussions about how RDF is too complex. So, let's ditch RDF... what's the data model that we're going to use instead? Keep in mind that it has to address at least 50% of the major use cases that we have for JSON-LD to be viable.

> I agree, but this begs the question: why use the RDF data model at all? Don't you think that its most useful properties could be better layered atop dumb data in a much more orthogonal way, that would also not require all that trouble with the triples?

In JSON-LD, you don't have to deal with triples unless you want to (technically, JSON-LD uses Quads). We went to great lengths to not require the use of triples/quads... you just use the JSON data structure to access and store your information.

We tried to ditch the RDF data model, we ended up re-inventing the RDF data model. If there is a better way to layer it on top of dumb data and address all of the use cases we have, I don't know what that technology looks like. I really want that technology to exist, but somebody smarter than me (and the rest of the folks working on JSON-LD) will have to create it.

> I wouldn't really put it that way. It assumes that conversion *between RDF (or RDF-like) vocabularies* is common. That's a restrictive subset of the usage scenario.

What is the "usage scenario"? I bet our usage scenarios are different. :)

> If your data layout changes in ways that aren't backwards compatible, then you're producing data that's using a new type, quite likely breaking whatever contract you had with your consumers. With a few exceptions, it ought to be perfectly fine to describe JSON data by pointing into it *because* the assumption is that that's *exactly* how people will be processing it. If you break the addressability, you're breaking the processing (at which point it doesn't matter if it's open world or not — it's just broken data). The alternative is to use a weird data model that is not the one that's natural to the data format. Again, that's the zombie return of RDF/XML.

Too abstract, I don't understand how the point you're making here could be applied to JSON-LD other than "only allow one way to lay out data". Unfortunately, that assumes that the data you're getting is only coming from one source (JSON). JSON-LD doesn't assume that. At a minimum, it assumes that you might have data coming in from RDFa, and other data coming in from JSON-LD. In these cases, you can't enforce a particular layout. You just have a blob of data, and you need to be able to cope when that blob is structured differently but contains the same information.

To make this more concrete: the Web Payments work reads items for sale (assets) from web pages. The data is expressed in RDFa (because most authors have the access rights to publish RDFa, even if they don't have access to the web server). We take that data and convert it to JSON-LD. In other cases, the person retrieving the data may have the same information already in JSON-LD. Once we convert both to JSON-LD, we need to compare them to see whether the same statements are being made (in order to do a digital signature on the data). So, the first set of data might say { "a": 1, "b": 2 } and the second set might say { "b": 2, "a": 1 }. Same data, stated in a different way.

Now you could argue that you should just be able to sort the keys and everything would be fine, except that a naive approach like that doesn't work when you have data that doesn't have a unique identifier, like an array of objects, or a graph with cycles in it.
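A small stdlib-only sketch of why naive key sorting falls short: it canonicalizes the order of object keys, but not the order of elements in arrays of unlabelled objects, so two documents making the same set of statements can still serialize differently.

```python
import json

# Key sorting handles reordered object keys just fine...
doc1 = {"a": 1, "b": 2}
doc2 = {"b": 2, "a": 1}
assert json.dumps(doc1, sort_keys=True) == json.dumps(doc2, sort_keys=True)

# ...but arrays keep their element order, so two documents containing
# the same objects without unique identifiers, listed in a different
# order, still produce different canonical strings.
set1 = {"knows": [{"name": "B"}, {"name": "C"}]}
set2 = {"knows": [{"name": "C"}, {"name": "B"}]}
assert json.dumps(set1, sort_keys=True) != json.dumps(set2, sort_keys=True)
```

This is the gap that graph normalization has to close: it must produce the same canonical form regardless of element order, even for nodes with no unique identifier.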

The counter-argument, of course, is: don't use complex data structures like graphs! But if we do that, then we can't model some pretty basic Web-y stuff. The Web is a graph data structure. We need a data model that is capable of expressing what the Web is. Keep in mind that not all applications need this complex data structure, which is why JSON-LD works just fine for tree-based data structures as well. Sometimes your data only has one root, and that's perfectly fine as far as JSON-LD is concerned.

However, sometimes your data isn't so simple... the real world creeps in, and you need a data structure that is more capable. I do admit that many people may not have this requirement, but the Web Payments work does, and that's why JSON-LD was created. Standard JSON and tree-based data models weren't cutting it.

If standard JSON works for you, then use that. Better to keep your system simple than bring in some of the complexity of JSON-LD. However, JSON-LD was designed for applications where standard JSON isn't ideal. If it seems like JSON-LD is too complex for your use case, it probably is, and you should always go with the simpler solution.

> I've been considering just adding a "link" type to Web Schema. That way you know it's a link and you don't need the JSON Ref indirection. What's more, you can properly constrain where links can appear in your data, which is a huge bonus. And you can also describe and constrain links usefully.

Yep, also the reason why hyperlinks are natively supported in JSON-LD.

> I only work with trees (that may have links between them, but as special vertices).

So, in other words... a graph. :P

> I'm not interested in converting to and from graphs, or between them. I am interested in tree-to-tree conversions though (and if there are graph-using people out there, that ought to be enough to produce a tree representation that they can interpret using their graph stuff).

A tree is a special type of graph. You're arguing semantics at this point. That said, JSON-LD handles tree-to-tree conversions via expand/compact and via JSON-LD Framing. Granted, you may want a subset of the functionality, but the problem is already more-or-less solved in JSON-LD.

That said, having a competing spec would be a good thing. RDFa benefited by having Microdata around. The same would be true for JSON-LD.

Currently, there's not much here that I can apply to the JSON-LD spec because we're fairly late in the process and what you seem to be asking for is a pretty fundamental set of changes to the spec. Additionally, the fundamental changes to the spec are more-or-less where we started with JSON-LD, so it's not like we hadn't considered most of your proposals.

The real danger here is that we go through the same awful process that RDFa and Microdata went through (and are still going through). The JSON-LD group has tried to cut features very aggressively (I know it may not seem like it to someone just picking up the spec, but there is a long list of features we rejected along the way).

We could create a JSON-LD Lite, but it would basically look like what RDFa Lite looks like now. It wouldn't change processor implementations, it would just change people's perception of the simplicity of the technology. RDFa Lite is more-or-less a marketing document... it doesn't change implementations, it just presents RDFa in a simpler light. We might do the same for JSON-LD if there continues to be the "complexity" perception with the document.

In the end, I tend to look at specs as 10 year experiments. I'm not too concerned with putting something out there that isn't perfect. Technology is a constant stream of iterative design. If we get it wrong with JSON-LD (which I don't think we have), then someone else will hopefully learn from our mistakes and be able to create something better from our deployment experiences.

msporny commented 11 years ago

@darobin does the above commit help? You can view the changes here (note the little arrows that explain what each line does now): http://json-ld.org/spec/latest/json-ld/#the-context

It makes the document a bit more spammy, but hopefully the benefit is that it'll be clearer what each line is doing now.

lanthaler commented 10 years ago

Closing this issue as the specs have been published as RECs and there hasn't been any activity on this issue for more than half a year. Feel free to reopen.