w3c / json-ld-syntax

JSON-LD 1.1 Specification
https://w3c.github.io/json-ld-syntax/

Make it easier to keep semantics of object-oriented models #376

Open · about-code opened this issue 3 years ago

about-code commented 3 years ago

In object-oriented programs, the meaning of a property name of a class Employee is bound to that class. An Employee's name is not the same as a Company's name (a name property bound to a class Company). In this respect, every OOP type is closer to a separate context (and may even have its own vocabulary).

In contrast, JSON-LD by default uses a purely term-based mapping of properties onto IRIs (or IRIs onto properties), where a term is assumed to mean the same everywhere. To illustrate this, consider the following example:

{
  "@context": {
    "@vocab": "https://data.my.org/vocab/#"
  },
  "@id": "123-123",
  "@type": "Company",
  "name": "Tesla",
  "people": [
    {
      "@id": "123-124",
      "@type": "Employee",
      "name": "Nikola"
    }
  ]
}

The internal view of a JSON-LD parser is:

{
  "@id": "123-123",
  "@type": "https://data.my.org/vocab/#Company",
  "https://data.my.org/vocab/#name": "Tesla",
  "https://data.my.org/vocab/#people": {
    "@id": "123-124",
    "@type": "https://data.my.org/vocab/#Employee",
    "https://data.my.org/vocab/#name": "Nikola"
  }
}

where the IRIs of both name properties indicate semantic equivalence. On the one hand, this makes it easy to, e.g., assign or declare a certain meaning (IRI) for all occurrences of a term by mapping it onto a different URI, like in this example:

 "@context": {
    "@vocab": "https://data.my.org/vocab/#",
    "name": "https://schema.org/name"
  },

But as an object-oriented developer I often do not want to assign meaning like this, but rather keep meaning when mapping JSON structures onto a graph (e.g. in order to load it into a triple store). However, keeping the semantics of an object-oriented model encoded in JSON requires that properties are mapped onto different IRIs, depending on the (OOP) type they belong to. For example, I am looking for something like this:

{
  "@id": "123-123",
  "@type": "https://data.my.org/vocab/company/Company",
  "https://data.my.org/vocab/company/name": "Tesla",
  "https://data.my.org/vocab/company/people": {
    "@id": "123-124",
    "@type": "https://data.my.org/vocab/employee/Employee",
    "https://data.my.org/vocab/employee/name": "Nikola"
  }
}

Indeed this is possible with JSON-LD. An idiom I use for this is the following (see JSON-LD Playground):

{
  "@context": {
    "comp": "https://data.my.org/vocab/company/",
    "empl": "https://data.my.org/vocab/employee/",
    "Company": {
      "@id": "comp:Company",
      "@context": {
        "@vocab": "comp",
        "@propagate": true
      }
    },
    "Employee": {
      "@id": "empl:Employee",
      "@context": {
        "@vocab": "empl",
        "@propagate": true
      }
    }
  },
  "@id": "123-123",
  "@type": "Company",
  "name": "Tesla",
  "people": [
    {
      "@id": "123-124",
      "@type": "Employee",
      "name": "Nikola"
    }
  ]
}
In short, the idiom is to:
  1. for every @type, declare an IRI prefix ("prefix": IRI), i.e. construct an IRI for the type-specific vocabulary
  2. create an expanded term definition for the type name
  3. optional: use "@id": "prefix:Term" to make the type name a term of the type-specific vocabulary rather than the default vocabulary
  4. declare a "type-scoped context"
  5. use the type-specific prefix IRI as the default vocabulary (@vocab) within the type-scoped context
  6. optional: propagate the type-scoped context to any embedded object without its own type declaration ("@propagate": true)

But, well, that doesn't look compelling and soon becomes verbose if a data model is a bit more involved. There should be some option and algorithm to automatically create a type-scoped context for each type and apply type-dependent IRI expansion rules based on the default vocabulary.

Edit: replaced preserve semantics with keep semantics

pchampin commented 3 years ago

@about-code The mechanism of type-scoped contexts is indeed intended to solve your use case.

But, well, that doesn't look compelling and soon becomes verbose if a data model is a bit more involved. There should be some option and algorithm to automatically create a type-scoped context for each type and apply type-dependent IRI expansion rules based on the default vocabulary.

I am not sure what you mean by "apply type-dependent IRI expansion rules". Do you mean some kind of IRI templating?

Is your OO data-model described in some machine-readable form? I guess you could use this to generate the verbose JSON-LD context.

about-code commented 3 years ago

I didn't want to mix the problem statement with a premature solution proposal. But indeed, I have thought of some kind of IRI templating, too, as well as of approaches that work purely by appending.

Type-dependent IRI expansion

By IRI expansion in general I refer to what JSON-LD parsers do when mapping "shortcut terms" like type names and JSON attributes to some @vocab IRI. They basically expand the term into an IRI by appending it to the vocabulary IRI.
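As a minimal illustration (reusing the vocabulary IRI from the first example above, and showing the result in the same simplified style as the "internal view" above rather than the exact output of the Expansion algorithm):

{
  "@context": { "@vocab": "https://data.my.org/vocab/#" },
  "name": "Tesla"
}

expands the term name by appending it to the vocabulary IRI:

{
  "https://data.my.org/vocab/#name": "Tesla"
}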

By type-dependent IRI expansion I refer to some yet-to-be-defined algorithm which takes @type into account when mapping property terms onto IRIs, to keep semantic integrity when data is mapped onto a data graph without a lot of explicit mappings being declared.

Top-down vs. bottom-up?

Let me try to explain the use case with a top-down vs. bottom-up analogy:

In a top-down approach I would probably choose some (likely public) shared vocabulary such as schema.org. I would accept that any property semantics exist independently of my particular OOP object model, and that every term in the public vocab has got its IRI already. I would have to figure out how the terms of the public vocabulary fit with the terms in my object model.

In a bottom-up approach, I would like to go the other way around. I have an object model and want to construct IRIs for the terms of that model. I may want to map the object model onto a data graph in order to ingest it into a triple store. To do so in a semantically consistent way, I need to take into account how the OOP type affects property semantics. I must ensure that a property name of a type Company is not going to be mapped onto the same IRI as the name of a type Employee, to keep semantic integrity.

Unfortunately, with JSON-LD a bottom-up approach seems to be quite difficult today. Or, put the other way around: it seems too easy to me to end up loading rubbish into a triple store because JSON-LD parsers are not really type-aware. Having said this, I am not asking to change any of the existing processing models.

Rather, I'd like to have a discussion about whether there's some common ground and a possibility to have additional rules/options for IRI expansion without breaking any existing conceptual boundaries.

Requirements

It doesn't have to work magically. It should just

Solution ideas (rough sketches)

  1. IRI templating might be part of the solution
  2. Modified rules for appending terms might be sufficient
    • there could be a flag that, when true, causes any property term to be appended to the IRI of the closest sibling or parent @type, falling back to the @vocab vocabulary if there is none. So there might be something like an "active type", similar to the "active context" (see the sketch below).
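For illustration only - the flag name @typeScopedExpansion below is hypothetical and not part of JSON-LD 1.1; it merely sketches what such an option could look like in a context:

{
  "@context": {
    "@vocab": "https://data.my.org/vocab/",
    "@typeScopedExpansion": true
  },
  "@id": "123-123",
  "@type": "Company",
  "name": "Tesla"
}

With the flag set, name would be appended to the IRI of the closest @type in scope (yielding, say, https://data.my.org/vocab/Company/name, assuming a / separator), and would fall back to plain @vocab expansion where no type is in scope.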
about-code commented 2 years ago

As an addendum: I recently found a technical report from the Open Geospatial Consortium as part of their TestBed-14 which describes pretty well a use case and workflow I had in mind when referring to a bottom-up approach:

6.2.3.2. Purpose built intermediate ontology

When JSON data shall be converted to RDF data using a JSON-LD @context and the JSON-LD to RDF serialization, then structural differences can be avoided by mapping to an ontology that fits the structure of the JSON data. Such an ontology may need to be created first. Turning the resulting RDF data into NEO compliant data would then require a mapping between the two ontologies. However, that work would only require RDF(S)/OWL tools;

Disclosure: I am not related to OGC and writing this independently.

TallTed commented 2 years ago

@about-code -- Please codefence the @context in the quoted text in your comment by wrapping it in backticks, as shown here — `@context` — so that the user with that GitHub handle doesn't get pinged about this issue, about which they probably don't care.

about-code commented 2 years ago

One thing I had not thought through well enough initially is the following:

Given that @type can be a multi-valued array, e.g. "@type": ["Person", "Employee"] (which might well be the case), I see no other way than using the algorithms and means that already exist in JSON-LD 1.1 as of today, namely type-scoped contexts and/or prefixes (see the example below).

Regarding my initial post:

There should be some option and algorithm to automatically create a type-scoped context for each type and apply type-dependent IRI expansion rules

a first conclusion I would draw for myself is that the kind of algorithm I imagined could only be applied to JSON documents / JSON-LD graphs which adhere to a particular shape, namely a shape which permits only a single @type value.

Applying particular processing rules to particular graph shapes seems to me like introducing an additional profile (?) to JSON-LD. At least it would require some concept of declaring one or more shapes against which the LD document could be validated, the profile's shape being one of them.


Addendum 1: I could not tell for myself whether this would be a good or bad conceptual addition (keeping "amount of work" out of it for a moment). In general there are quite a few other complexities rooted in RDF (open-world) vs. OOP (closed-world) semantics. So, if the outcome were kind of a "simplified profile", maybe it would even be worth the effort.

Addendum 2: Nothing new: prefixes are required in case @type is an array

{
  "@context": {
    "comp": "https://data.my.org/vocab/company/",
    "empl": "https://data.my.org/vocab/employee/",
    "pers": "https://data.my.org/vocab/person/",
    "Company": {
      "@id": "comp:Company",
      "@context": {
        "@vocab": "comp",
        "@propagate": true
      }
    },
    "Employee": {
      "@id": "empl:Employee",
      "@context": {
        "@vocab": "empl",
        "@propagate": true
      }
    },
    "Person": {
      "@id": "pers:Person",
      "@context": {
        "@vocab": "person",
        "@propagate": true
      }
    }
  },
  "@id": "123-123",
  "@type": "Company",
  "name": "Tesla",
  "people": [
    {
      "@id": "123-124",
      "@type": ["Person", "Employee"],
      "empl:entryDate": "2008-02-01",
      "pers:name": "Nikola",
      "pers:surname": "Alset"
    }
  ]
}
pchampin commented 2 years ago

a first conclusion I would draw for myself is that the kind of algorithm I imagined, could only be applied to JSON documents / JSON-LD graphs which adhere to a particular shape, namely a shape which permits only a single @type value.

Why so? You can make this work with multiple types by being a little more targeted in your type-scoped contexts. I.e. avoid the catch-all @vocab and define only the properties defined for each class:

  "@context": {
    "comp": "https://data.my.org/vocab/company/",
    "empl": "https://data.my.org/vocab/employee/",
    "pers": "https://data.my.org/vocab/person/",
    "Company": {
      "@id": "comp:Company",
      "@context": {
        "name": "comp:name",
        "people": "empl:people"

      }
    },
    "Employee": {
      "@id": "empl:Employee",
      "@context": {
        "entryDate": "empl:entryDate"
      }
    },
    "Person": {
      "@id": "pers:Person",
      "@context": {
        "name": "pers:name",
        "surname": "pers:surname"
      }
    }
  },
  "@id": "123-123",
  "@type": "Company",
  "name": "Tesla",
  "people": [
    {
      "@id": "123-124",
      "@type": ["Person", "Employee"],
      "entryDate": "2008-02-01",
      "name": "Nikola",
      "surname": "Alset"
    }
  ]
}

(test it in the playground)

Note that a type-scoped context for a subclass could also include properties of the super class, so that data does not have to contain the redundant information that each employee is also a person (assuming that the application "knows" that the former is a subclass of the latter).

  "@context": {
    "comp": "https://data.my.org/vocab/company/",
    "empl": "https://data.my.org/vocab/employee/",
    "pers": "https://data.my.org/vocab/person/",
    "Company": {
      "@id": "comp:Company",
      "@context": {
        "name": "comp:name",
        "people": "empl:people"

      }
    },
    "Employee": {
      "@id": "empl:Employee",
      "@context": {
        "entryDate": "empl:entryDate",
        "name": "pers:name",
        "surname": "pers:surname"
      }
    },
    "Person": {
      "@id": "pers:Person",
      "@context": {
        "name": "pers:name",
        "surname": "pers:surname"
      }
    }
  },
  "@id": "123-123",
  "@type": "Company",
  "name": "Tesla",
  "people": [
    {
      "@id": "123-124",
      "@type": "Employee",
      "entryDate": "2008-02-01",
      "name": "Nikola",
      "surname": "Alset"
    }
  ]
}

(try it in the playground)

pchampin commented 2 years ago

Applying particular processing rules for particular graph shapes seems to me like introducing an additional profile (?) to JSON-LD. At least it required some concept of declaring one or more shapes which the LD document could be validated against, the profile's shape being one of it.

Why not reuse something like JSON-schema for that?

about-code commented 2 years ago

(1)

Why so? You can make this work with multiple types by being a little more targeted in your type-scoped contexts.

No doubt, the concept of type-scoped contexts is flexible enough here, as you proved (I already agreed on that in the OP above). Just note that my thoughts are not about crafting type-scoped contexts by hand, but about a JSON-LD parser being able to algorithmically infer type-scoped contexts from the type information in the data (and some less verbose @context metadata). My goal is to relieve an average developer from crafting type-scoped contexts, yet ensure property URIs are inferred in a way which respects the implicit semantic binding between an OOP type and its properties. Also recall the bottom-up analogy above.

So my conclusion was drawn from a perspective opposite to yours, namely that in case of a multi-valued type, an algorithm cannot decide which of the (prefix-less) properties belongs to a particular type scope. In such a case, a type-scoped context must still be crafted the way you did (or the properties require a prefix giving a hint on the right type scope, like I did).
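To make the ambiguity concrete: given a prefix-less property on a node with two types, nothing in the data tells an algorithm whether name belongs to the Person vocabulary or the Employee vocabulary:

{
  "@id": "123-124",
  "@type": ["Person", "Employee"],
  "name": "Nikola"
}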

(2)

Applying particular processing rules for particular graph shapes seems to me like introducing an additional profile (?) to JSON-LD. At least it required some concept of declaring one or more shapes which the LD document could be validated against, the profile's shape being one of it.

Why not reuse something like JSON-schema for that?

Great question. So far, I imagined some JSON-LD parser (enhanced for this purpose) being in charge of

Conceptually and superficially, I consider JSON-Schema and SHACL to be different dialects for the same kind of restrictions/closing of an open world ;-). However, SHACL requires parsing a JSON document into an RDF graph first, before validating the graph shape. This doesn't seem ideal, particularly if constructing the graph structure and assigning/resolving URIs to graph nodes are one and the same thing. So your question puts me very much in favor of JSON-Schema, actually... hmm...
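As a rough sketch of that direction (the draft version and the constraint are illustrative, not a worked-out profile): a JSON Schema could declare the shape "only a single @type value", against which documents are validated before any type-dependent expansion is applied:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "@type": { "type": "string" }
  },
  "required": ["@type"]
}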

pchampin commented 2 years ago

About JSON-Schema and SHACL, I totally agree: they play basically the same role, but on different levels / data models.

pchampin commented 2 years ago

My goal is to relieve an average developer from crafting type-scoped contexts, yet ensure property URIs are inferred in a way which respects the implicit semantic binding between an OOP type and its properties.

JSON-LD processors themselves are not designed to do that kind of inference. However, average developers do not need to craft these type-scoped contexts manually either. One could imagine a tool that would generate the context from a specification of the OO model. For example, I have worked recently on such a tool, taking a UML class diagram in XMI format, and generating a JSON-LD context.
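To sketch the idea (this is not the actual tool, and the class and attribute names are simply taken from the examples above): given, say, a class Company with attributes name and people, and a class Employee with attribute entryDate, such a generator could emit one type-scoped context entry per class, along the lines of the hand-written contexts above:

{
  "@context": {
    "Company": {
      "@id": "https://data.my.org/vocab/company/Company",
      "@context": {
        "name": "https://data.my.org/vocab/company/name",
        "people": "https://data.my.org/vocab/company/people"
      }
    },
    "Employee": {
      "@id": "https://data.my.org/vocab/employee/Employee",
      "@context": {
        "entryDate": "https://data.my.org/vocab/employee/entryDate"
      }
    }
  }
}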

an algorithm cannot decide which of the (prefix-less) properties belongs to a particular type scope

If an OO model was designed in such a way that it could not be determined algorithmically which property relates to which class, then this OO model would be seriously flawed, wouldn't it?

I am not sure that this is true. If that was true, then how would a compiler

about-code commented 2 years ago

an algorithm cannot decide which of the (prefix-less) properties belongs to a particular type scope [...] If that was true, then how would a compiler

It's not about a compiler but a JSON-LD processor. How could the playground's JSON-LD processor decide which property belongs to which type without the metadata you provided? It couldn't.

My goal is to relieve an average developer from crafting type-scoped contexts, yet ensure property URIs are inferred in a way which respects the implicit semantic binding between an OOP type and its properties.

JSON-LD processors themselves are not designed to do that kind of inference.

Maybe "inference" was a bit misleading on my side, because the issue is not really about infering new facts. It is also not just about crafting things, because even when generated by tools, type-scoped contexts add a lot metadata. But the semantic relationship between an attribute and a class, which I outlined initially, is common to every object-oriented programming language.

The point I am trying to make:

{
  "@context": {
    "@expandProfile": "https://not-yet-w3.org/json-ld/expansion/profile/objects-to-rdf
    "@vocab": "https://data.my.org/vocab/"
  },
  "@id": "123-123",
  "@type": "Company",
  "name": "Tesla",
  "people": [
    {
      "@id": "123-124",
      "@type": "Employee",
      "name": "Nikola"
    }
  ]
}

should be enough to result in something like this (result shown after compaction with an empty context {}, for readability):

{
  "@id": "123-123",
  "@type": "https://data.my.org/vocab/Company",
  "https://data.my.org/vocab/company/name": "Tesla",
  "https://data.my.org/vocab/company/people": {
    "@id": "123-124",
    "@type": "https://data.my.org/vocab/Employee",
    "https://data.my.org/vocab/employee/name": "Nikola"
  }
}

where property IRIs differ by the type they are used with. @expandProfile, as an example, informs a JSON-LD processor about the alternate URI expansion algorithm to apply. For the algorithm some behavior could be defined as follows:

This may be implemented by the JSON-LD processor setting up a type-scoped context within the currently active @context for each @type field it encounters (my initial examples may sketch a possible template for the type-scoped context here).
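For illustration, such a processor might internally synthesize a context equivalent to the idiom from the opening post; the lowercased path segments company/ and employee/ are an assumption about how the type-specific vocabulary IRIs would be derived from @vocab and @type:

{
  "@context": {
    "@vocab": "https://data.my.org/vocab/",
    "Company": {
      "@id": "https://data.my.org/vocab/Company",
      "@context": {
        "@vocab": "https://data.my.org/vocab/company/",
        "@propagate": true
      }
    },
    "Employee": {
      "@id": "https://data.my.org/vocab/Employee",
      "@context": {
        "@vocab": "https://data.my.org/vocab/employee/",
        "@propagate": true
      }
    }
  }
}

Applied to the input above, this synthesized context produces the type and property IRIs shown in the expected result, without the developer having to write the context by hand.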

pchampin commented 2 years ago

@about-code sorry my previous message was sent in an incomplete state (obviously), and sorry it took me so long to respond.

Looking back at this, I think we agree that what you want to achieve is already possible in JSON-LD, but is tedious and could benefit from some automation. We diverge on where we want to put this automation:

about-code commented 2 years ago

Don't mind your late response; I won't be able to respond in a timely manner for the next two weeks either. It's been a good summary of - well, I would choose to say architectural options (rather than preferences) - that we bring in. I see pros & cons for both.

Having said this, I would be more than happy if the outcome of this issue could be used for, or serve as, an Architecture Decision Record which helps in understanding the consequences of one option over the other (for JSON-LD processors, but also for upstream users/consumers of JSON-LD, and with respect to, let's say, ISO 25010 qualities of systems exchanging JSON-LD data).