w3c / json-ld-syntax

JSON-LD 1.1 Specification
https://w3c.github.io/json-ld-syntax/

Node Types in @context #76

Closed AtesComp closed 5 years ago

AtesComp commented 5 years ago

So, reading the specification, I see the line: "Specifically, @type cannot be used in a context to define a node's type." where I ask, WTF? Why not? Why so short-sighted? Why do node types need to be red-headed stepchildren?

Given conversions from JSON to JSON-LD, it would be extremely useful to apply type (coercion) to IRIs and blank nodes via @context. Example:

Given

{
  "type": "indicator",
  "id": "indicator--e2e1a340-4415-4ba8-9671-f7343fbf0836",
  "external_references": [
    {
      "source_name": "veris",
      "external_id": "0001AA7F-C601-424A-B2B8-BE6C9F5164E7",
      "url": "https://github.com/vz-risk/VCDB/blob/master/data/json/0001AA7F-C601-424A-B2B8-BE6C9F5164E7.json"
    }
  ]
}

I can header the data with a context:

  "@context": {
    "@version": 1.1,
    "@base": "http://purl.org/cyber/stix/identifier/",
    "@vocab": "http://purl.org/cyber/stix/vocab/",

    "stix": "http://purl.org/cyber/stix#",
    "id": "@id",
    "type": "@type",
    "indicator": "stix:Indicator",
    "external_references": {
      "@type": "@id",
      "@context": {
        "url": { "@type": "@id" }
      }
    }
  },

which converts the data somewhat nicely. Pop it into the JSON-LD 1.1 Playground and view the table. I've severely limited the example for clarity.

Generally the properties "external_references", "url", and others would have ontology definitions to cast the un-typed nodes. However, this is not always the case. As an example, let's assume "external_references" does not cast to a single definitive node type. The "external_references" structure will produce a blank node without type. We have applications where a property key name may be used in different contexts (thanks for scoped @context !!!) and therefore may indicate different node types. An ontology can define multiple types for a given property, but it's not definitive. The data may specify the definitive node type by context. Without a method to type nodes in context, the specification severely limits the usefulness of the conversion.

SUGGESTION:

Since "@type" allows for all manner of data typing, why not allow "@type" to also provide node typing? It seems a simple addition to the specification, such as:

  "@type": { "@id": "http://my.friggin.org/has/class" }

where we specify the context is a node, as before, with the addition of the node type(s) [sure, more than one node type using "@id": { "nt1", "nt2" } or maybe multiple "@type" statements, why not?]. I'm not saying we NEED multiple types for the specification. Allowing for just one node type would be monumental. In fact,

  "@type": { "http://my.friggin.org/has/class" }

might be enough or a short form as it might imply "@id". Or maybe add a specification for value types as well (just to be thoughtful about values):

  "@type": { "@value": "http://my.friggin.org/has/valueType" }

Ahh, uniformity! So, here are the different context types:

  "@type": valuetype
  "@type": { "@value": valuetype }
  "@type": "@id"
  "@type": { "@id": nodetype }

NOTE: There was some discussion about allowing the value to be both a node type and a data type. There is something very wrong about that kind of data pattern.

The above context would be modified as follows:

  "@context": {
    "@version": 1.1,
    "@base": "http://purl.org/cyber/stix/identifier/",
    "@vocab": "http://purl.org/cyber/stix/vocab/",

    "stix": "http://purl.org/cyber/stix#",
    "id": "@id",
    "type": "@type",
    "indicator": "stix:Indicator",
    "external_references": {
      "@type": { "@id": "stix:ExternalReference" },
      "@context": {
        "url": { "@type": {"@id": "stix:URL"} }
      }
    }
  },

Now, I could query for all things "stix:ExternalReference" or "stix:URL" without resorting to a bastardized conversion therapy.
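As a rough illustration of the consumer-side alternative to the proposal above, the missing node types can be added in a preprocessing pass before the data is handed to a JSON-LD processor. This is only a sketch: `add_node_types` and the `TYPE_BY_KEY` table are hypothetical names, the `stix:ExternalReference` type is taken from the example context above, and nothing here is part of JSON-LD itself.

```python
import copy
import json

# Hypothetical mapping: property key -> node type given to its object values.
TYPE_BY_KEY = {"external_references": "stix:ExternalReference"}


def add_node_types(value, type_by_key=TYPE_BY_KEY, type_key="type"):
    """Return a deep copy of `value` with a type added to untyped child objects."""
    value = copy.deepcopy(value)

    def visit(node):
        if isinstance(node, list):
            for item in node:
                visit(item)
        elif isinstance(node, dict):
            for key, child in node.items():
                node_type = type_by_key.get(key)
                children = child if isinstance(child, list) else [child]
                for item in children:
                    if node_type and isinstance(item, dict):
                        # Only add a type; never overwrite one the producer supplied.
                        item.setdefault(type_key, node_type)
                    visit(item)

    visit(value)
    return value


doc = {
    "type": "indicator",
    "id": "indicator--e2e1a340-4415-4ba8-9671-f7343fbf0836",
    "external_references": [
        {
            "source_name": "veris",
            "external_id": "0001AA7F-C601-424A-B2B8-BE6C9F5164E7",
        }
    ],
}

typed = add_node_types(doc)
print(json.dumps(typed, indent=2))
```

Because the example context aliases "type" to "@type", the added key would be read as a node type once the context is applied. The cost is the extra pass outside JSON-LD that the proposal is trying to avoid.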

gkellogg commented 5 years ago

There was a proposal to do something like this in json-ld/json-ld.org#426, which went unsupported. The wording for @type for nodes is there because of common confusion, but the ability to add things to the graph based on information in the context was unsupported. This was considered again, in a different light, in #19 (see discussion here). Basically, the feeling is that the context should not be used to add information to the graph. If the context were to change, then the content of the associated JSON-LD document could have a radically different meaning.

For approaches to change the content of a linked data graph, framing is appropriate. Here, a frame could be created that created a default for @type to do what you suggest.

azaroth42 commented 5 years ago

Agree with Gregg that data transformation (in this case adding an rdf:type assertion) is not the job of the context. For precedent, consider #7 and #15.

Propose close wontfix.

ajs6f commented 5 years ago

👍 to wontfix.

AtesComp commented 5 years ago

The arguments against seem pedantic. Type is type is type...whether it is applied to simple values or nodes. The arguments against seem to want to throw out everything we've learned about computer science and substitute some other modality for type: "@type" in JSON-LD should represent something other than "type" in every other well-formed language. Typing something as "integer" is no different than typing something as "widget". If adding information to the graph is the issue, then the argument is entirely hypocritical as I present below. Adding type information to data values IS adding information. Information is added by simply appending @context. In all instances I've seen @context used, it is at least transformative which implies some application of knowledge not necessarily present in the raw data, i.e., added information...by the consumer.

Confusing the issue with RDF (or any other graph specification) and the fact that adding information for type on a node (versus a value) creates an extra triple should have never been considered as many discussions specify JSON-LD and RDF are separate issues. In fact, JSON-LD will add RDF triples in the form of blank node allocations via "@container": "@list" and other constructs--that didn't seem to bother anybody. A list implies the blank nodes are typed as rdf:List and adds rdf:first, rdf:rest, and rdf:nil related triples. The fact that consuming a JSON-LD serialization of data should somehow limit the typing of that data seems entirely contrary to developing JSON-LD in the first place. Type gives meaning to data in all aspects of Computer Science. Why should JSON-LD be any different?

So, if this false flag argument is indeed the issue, then let us consider how @type also affects data typing in the aspect of "information added". When I type a value given as "Snoopy" via @context with "xsd:string", the consuming RDF system may simply transform the value to "Snoopy"^^xsd:string. The "xsd:string" could have been any type, such as "myComplexType". The point is that any data type uses a class ontology in RDF to represent type--the traditional notion of Class, Object, and Type. Let us add a language element "@en" in a @context via "@language": "en". Now "Snoopy" is "Snoopy"@en. In either case, have we not explicitly added information?

In fact, depending on the RDF consumer, the ingestion may indeed add triples to represent data types and language. While RDF 1.1 specifies the use of data type and language tags, there is no imperative to force literals to hold data type and language:

(node1) -- hasDog --> "Snoopy"@en

can be represented as

(node1) -- hasDog --> (_:bn) -- rdf:type --> (DogClass)
                         | -- value --> "Snoopy"
                         |--> language --> "en"
                         `--> datatype --> (rdf:langString)

or

(node1) -- hasDog --> "Snoopy"
(_:s) -- rdf:type --> (rdf:Statement)
  |--> rdf:subject --> (node1)
  |--> rdf:predicate --> (hasDog)
  |--> rdf:object --> "Snoopy"
  |--> language --> "en"
  `--> datatype --> (rdf:langString)

Adding information (as triples, reification, or otherwise) is not and never should have been an issue. Depending on the storage mechanism, the consumer is free to store data how they see fit. Then, @context either helps a consumer with that objective or not. If not, then JSON-LD is lacking--I might as well use JSON and construct independent transformations.
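The literal and blank-node shapes being compared can be written out as plain tuples. This is a minimal sketch, assuming hypothetical `ex:` names for `hasDog`, `value`, `language`, and `datatype`, which the comment leaves unprefixed. `rdf:langString` is the datatype RDF 1.1 assigns to language-tagged strings. The `rdf:type` link to a dog class is omitted for brevity.

```python
from typing import NamedTuple, Optional


class Literal(NamedTuple):
    lexical: str
    language: Optional[str] = None
    datatype: Optional[str] = None

    def n3(self) -> str:
        # Simplified rendering; does not escape quotes inside the lexical form.
        if self.language:
            return f'"{self.lexical}"@{self.language}'
        if self.datatype:
            return f'"{self.lexical}"^^{self.datatype}'
        return f'"{self.lexical}"'


def as_literal_triple(subject, predicate, literal):
    """Compact form: language or datatype travels inside the literal."""
    return [(subject, predicate, literal.n3())]


def as_blank_node_triples(subject, predicate, literal, bnode="_:bn"):
    """Expanded form: the same facts spread over a blank node."""
    triples = [
        (subject, predicate, bnode),
        (bnode, "ex:value", f'"{literal.lexical}"'),
    ]
    if literal.language:
        triples.append((bnode, "ex:language", f'"{literal.language}"'))
        triples.append((bnode, "ex:datatype", "rdf:langString"))
    elif literal.datatype:
        triples.append((bnode, "ex:datatype", literal.datatype))
    return triples


snoopy = Literal("Snoopy", language="en")
compact = as_literal_triple("ex:node1", "ex:hasDog", snoopy)
expanded = as_blank_node_triples("ex:node1", "ex:hasDog", snoopy)

for triple in compact + expanded:
    print(triple)
```

Both lists describe the same value. They differ only in how many statements the store keeps, which is the storage choice the comment attributes to the consumer.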

You stated:

If the context were to change, then the content of the associated JSON-LD document could have a radically different meaning.

Yes. And? Since the consumer is free to add and manipulate the context, it should be up to the consumer to use the data however they define the context. If one consumer uses "xsd:string" for values and another uses "widget", then the associated JSON-LD document could have a radically different meaning. JSON-LD already allows that and it's up to the consumer to decide. In the face of a producer lacking type on provided data, what is the consumer to do? Answer: impose type as they see fit. If adding type to a node requires a triple in RDF, so what? In another graph specification, it may not. With RDF data, I can transform the document to add missing node types, so why does JSON-LD resist?

My issue is that we are transforming JSON to JSON-LD via @context. Since the data is raw JSON, it is wholly expected that adding information will be required to "make sense" out of the data to support Linked Data. I cannot rely on the producer to conform to a Frame definition. In fact, the Frame specification seems to require @type pre-specified in the input in order to match the Frame definitions for output. I see no example where @type is added due to Frame processing.

FYI: At https://www.w3.org/2018/jsonld-cg-reports/json-ld-framing/#framing Example 8 looks like it needs review and update.

So, even after reading the provided info on @type and Frames, I'm still at a loss to understand the contrary viewpoint. It seems to me that Frames are specifying a context that produces a transform of typed nodes and their relations--an important issue, but misses the point on the "type" issue. If Frame processing does allow me to assign @type to a node, then allowing @context to add @type to a node should not be an issue either.

For Frames, how do we generally add type for untyped nodes when it relies on @type to map input to output?

azaroth42 commented 5 years ago

Example 8 looks correct to me. Could you be more explicit about the issue that you're seeing please?

For the question on how to add data via frames, please have a look at the use of @default.

Adding arbitrary data has been considered problematic due to the automatic inclusion of referenced contexts over the network, which opens the door for attacks via data injection. Imagine if your fridge resolved a context that had been hijacked and used it to inject new malicious services or display spam as warnings.

The data types and languages are to achieve parity with other RDF syntaxes -- JSON doesn't support @en or ^^xsd:string and hence needs accommodation. Rather than force every langString to be explicit, the convenience added in JSON-LD 1.0 is to put it in the context. That does not open the door for arbitrary triples, nor does the syntax parity for rdf:List. Note, please, that other ordering syntaxes were rejected in #15 as out of scope.

BigBlueHat commented 5 years ago

The security concerns @azaroth42 notes are the key constraint on using @context (since those can be remotely referenced) to expand meaning rather than simply describe it.

Does Framing solve for your use case adequately? If not, that may be the better spec to target for feature/fix requests: https://github.com/w3c/json-ld-framing/issues

Thanks for raising these questions regardless!

iherman commented 5 years ago

This issue was discussed in a meeting.

AtesComp commented 5 years ago

As a final response, I must point out that the issue for adding a triple for describing an object's type is not "arbitrarily adding triples" (sub, pred, obj) as they are restricted to a given subject and a controlled predicate (type). The only arbitrary point is the object which must be interpreted as a class. This in no way exposes a system to adding arbitrary ontological statements or any other statements for that matter. Type's purpose is to describe meaning, not expand it. Untyped data is the definition of meaningless.

As for the security concerns and opening attacks via data injection, I'll point out that those types of attacks would be most likely reserved to the "data", i.e., the literals, since a system would rely on the literals for command and control. In other words, the actor would be providing (injecting) the JSON data directly. However, manipulating type would be exceptionally harder and problematic in a hacking context as an actor usually relies on established type to inject manipulative data in a manner that suits the objective. Changing type to something the actor prefers would suggest that an actor 1) has other intimate knowledge of the system and 2) would also be able to inject ontology modification or additions to employ the ability to set type (you are likely already PWNED). This implies an actor would be able to import ontology somehow. If the actor can do that, way more damaging things can be done. Nothing about the issue proposed suggests that this would be the case.

As such, @context allows for a bad actor to manipulate a literal's datatype class...this door is open. So without a deeper discussion on what constitutes a security concern for adding type to objects and a specific context for any attack on the model, the argument amounts to unjustified FUD. Provide a discussion in another issue that demonstrates an attack pattern using this model. I would certainly like to read and discuss actual attack vectors via this process to secure my own implementations. Anyone want to pick up a Blackhat presentation on hacking knowledge graphs via JSON-LD?

To address parity, I must then ask why rdf:List was used (a collection) as opposed to an rdf:Bag (a container)? I can see pros and cons for both, but in the absence of a definitive JSON construct, it seems that an rdf:Bag (unordered array) would be closer to the represented data than an rdf:List. Since blank nodes are created for several of the JSON-LD constructs, typing them (via @set or @list, maybe?) would be very useful (rdf:Seq, rdf:Bag, rdf:Alt, or other defined class). Maybe there could be a setting allowing the consumer to choose between such constructs.

I do accept that Framing may fit the need. If JSON-LD requires a producer to explicitly supply type for sharing, then they must also have intimate domain knowledge for how the consumer uses the data. Otherwise, a two stage ingestion is required by the consumer. If Framing is that process, so be it. It would be nice to see examples on how @type might be added via Framing in the Framing documentation. I understand the use of @default, but this also requires other matching criteria to apply @type. Using Framing, how would @type be applied to generated blank nodes? These are the basic questions inquiring minds want to know.

Thanks for your time.

lukasheinrich commented 5 years ago

just for completeness, this seems like a similar point as #31 iiuc

azaroth42 commented 5 years ago

Closing, won't fix. Transformation is out of scope for the WG. The issue was pushed to the Community Group last year (though without any movement).

AtesComp commented 5 years ago

The argument against applying object type relies on the premise that "changing the meaning" of some resource should not be allowed. The argument for is uninterested in "changing meaning", but "adding meaning". When type is not given, the consumer has no recourse other than implied type via range specifications in an ontology. By defining an ontology rule for a property, type may be inferred through domain and range. Inferencing is a step-wise workaround for lack of a conventional typing mechanism. Direct type definition is preferred. When type is given, adding additional type does not detract from original type.

Adding type should not replace existing type, but simply add additional type to a value. However, the specification AS IS does nothing to prevent a context from changing type. Example:

{
  "@context": {
    "@version": 1.1,

    "http://original.com/type": "http://my.com/type"

  },

  "@id": "http://original.com/resource",
  "@type": "http://original.com/type"
}

This results in the expanded form:

[
  {
    "@id": "http://original.com/resource",
    "@type": [
      "http://my.com/type"
    ]
  }
]

All arguments against are moot. This proves that the argument against "a consumer can then openly share the data with different meaning" is without merit. They can anyway and at will whether they use a JSON-LD context or not! JSON-LD has no way of enforcing what a consumer does with the data after applying context--local processing is beyond the scope of JSON-LD. Additionally, a producer may be interested in providing a context for consumers. Adding resource type is a viable and reasonable solution for a producer that gives the consumer an option to either use the raw JSON or apply JSON-LD context. This in no way forces a consumer to use that context, but simply allows the producer to provide intended type without resorting to deeper internal modification.
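The effect described in the example above amounts to a term lookup on the value of @type. The following is a toy sketch of only that lookup, assuming a flat context of string-to-string mappings. `expand_types` is a hypothetical helper, not the JSON-LD expansion algorithm, and a conforming JSON-LD 1.1 processor may treat such a context differently or reject it.

```python
def expand_types(node, context):
    """Toy lookup: replace each @type value that is a context term."""
    # Keep only plain term -> IRI entries; skip keywords such as @version.
    terms = {
        key: value
        for key, value in context.items()
        if not key.startswith("@") and isinstance(value, str)
    }
    types = node.get("@type", [])
    if isinstance(types, str):
        types = [types]

    expanded = {k: v for k, v in node.items() if k != "@context"}
    if types:
        # Values that are not terms pass through unchanged.
        expanded["@type"] = [terms.get(t, t) for t in types]
    return [expanded]


doc = {
    "@context": {
        "@version": 1.1,
        "http://original.com/type": "http://my.com/type",
    },
    "@id": "http://original.com/resource",
    "@type": "http://original.com/type",
}

result = expand_types(doc, doc["@context"])
print(result)
```

The sketch shows only the substitution step the comment relies on: whoever controls the context controls what the type value resolves to.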

Framing fails in all instances. Nodes without type fail to match a framing specification. See 4.2.1 Framing Requirements:

Values of members in a frame object that are not keyword MAY also include a default object. Values of @default MAY include the value @null, or an array containing only @null, in addition to other values allowed in the grammar for values of member keys expanding to absolute IRIs. Processors MUST preserve this value when expanding. All other members of a default object MUST be ignored.

Unfortunately, the "are not keyword" part eliminates the use of framing for applying @type.

In conclusion, the JSON-LD specification limits both producers and consumers of JSON data from properly applying type directly. Asking a producer of JSON to change an established specification is untenable. Specifications already often document type via key definitions in many cases. Producers cannot be expected to construct specific JSON-LD solutions to satisfy a consumer. Allowing @context to add type to a resource is the only viable solution short of designing specialized use case solutions or forking JSON-LD.

This issue will continue to bite at the heels of JSON-LD.

Also, it would be nice to convert whitespace to underscore when applying "@type":"@id" or "@type":"@vocab" to the value (like OpenRefine).

gkellogg commented 5 years ago

I could see us providing for @default on @type when framing. It seems like a reasonable expectation. However, JSON-LD is not intended to be a general-purpose query engine, and some things may always require SPARQL CONSTRUCT or some similar GraphQL solution.

AtesComp commented 5 years ago

UPDATED (I shouldn't comment when not feeling well).

I'll accept that a frame solution may be the answer. The current playground breaks. I've read the working framing document and it's a vast improvement. It's starting to make sense.

lsimichael commented 4 years ago

Hi, I think I'm having a similar issue.

I'm mostly familiar with framing but haven't been able to accomplish the following:

For each owl:Class in my ontology, I have a property array of strings, which reverse maps to rdfs:domain. I want each of these property nodes to have the type owl:ObjectProperty by default.

{
  "@id": "User",
  "@type": "owl:Class",
  "property": [
    "group",
    "role"
  ]
}

via @gkellogg :

Early in the thread I see:

For approaches to change the content of a linked data graph, framing is appropriate. Here, a frame could be created that created a default for @type to do what you suggest.

This seems like the appropriate solution for my problem but I haven't been able to figure out the syntax.

Later in the thread I see:

I could see us providing for @default on @type when framing. It seems like a reasonable expectation.

So, is this not currently possible with framing? I do think it would be highly useful.

gkellogg commented 4 years ago

If you had the following input:

{
  "@context": {
    "@vocab": "http://example.org/",
    "owl": "http://www.w3.org/2002/07/owl#",
    "property": {"@id": "owl:property", "@type": "@vocab"}
  },
  "@id": "User",
  "@type": "owl:Class",
  "property": [
    "group",
    "role"
  ]
}

You could use the following frame:

{
  "@context": {
    "@vocab": "http://example.org/",
    "owl": "http://www.w3.org/2002/07/owl#",
    "property": {"@id": "owl:property", "@type": "@vocab"}
  },
  "@type": "owl:Class",
  "property": {
    "@type": {"@default": "owl:Property"}
  }
}

which would add the default @type to get the following result:

{
  "@context": {
    "@vocab": "http://example.org/",
    "owl": "http://www.w3.org/2002/07/owl#",
    "property": {
      "@id": "owl:property",
      "@type": "@vocab"
    }
  },
  "@id": "User",
  "@type": "owl:Class",
  "property": [{
    "@id": "http://example.org/group",
    "@type": "owl:Property"
  }, {
    "@id": "http://example.org/role",
    "@type": "owl:Property"
  }]
}

It won't work on the playground yet, because that support hasn't yet been added, but you can try it on the Ruby Distiller.
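For readers without access to a processor that supports this, the effect of the frame can be imitated by hand. This is a sketch only: `apply_default_type` is a hypothetical helper that hard-codes the `@vocab` expansion and the `owl:Property` default from the example above. It is not a framing implementation and does no matching on `@type`.

```python
import json

# Assumed vocabulary IRI, copied from the example context.
VOCAB = "http://example.org/"


def apply_default_type(node, prop="property", default_type="owl:Property",
                       vocab=VOCAB):
    """Turn the values of `prop` into node objects carrying a default @type."""
    result = dict(node)
    values = node.get(prop, [])
    if not isinstance(values, list):
        values = [values]

    typed = []
    for value in values:
        if isinstance(value, str):
            # Mimics "@type": "@vocab" coercion for a bare, non-term string.
            value = {"@id": vocab + value}
        else:
            value = dict(value)
        # A default only fills a gap; an existing @type is left alone.
        value.setdefault("@type", default_type)
        typed.append(value)

    result[prop] = typed
    return result


user = {
    "@id": "User",
    "@type": "owl:Class",
    "property": ["group", "role"],
}

framed = apply_default_type(user)
print(json.dumps(framed, indent=2))
```

The printed "property" array has the same shape as the framed result shown above. An entry that already carries an @type keeps it.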

lsimichael commented 4 years ago

OK great! I was using playground. I see it working with the alternate link. Thank you @gkellogg