Ensure that blank node identifiers for anonymous graphs are reused

gkellogg commented 6 years ago

From https://github.com/w3c/json-ld-syntax/issues/30#issuecomment-409994489, @ericprud notes the problem with using ShEx, or anything else, to match the content of a named graph with only blank node subjects. Consider the following JSON-LD (from expansion test 0079):

{
  "@context": {
    "@version": 1.1,
    "input": {"@id": "foo:input", "@container": "@graph"},
    "value": "foo:value"
  },
  "input": {
    "value": "x"
  }
}

Currently, this will generate TriG similar to the following:

@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

 [ <foo:input> _:b1] .

_:b1 {
   [ <foo:value> "x"] .
}

and expanded JSON-LD:

[{
  "foo:input": [{
    "@graph": [{
      "foo:value": [{"@value": "x"}]
    }]
  }]
}]

Following the link from _:b1 as an object to the graph with that name is feasible, but finding the unnamed subject within that graph can't reliably be done for any reasonably complex named graph.

This proposal would cause the expansion algorithm to re-use the blank-node identifier naming the graph for the implicitly named subject contained within the graph, generating the following TriG:

@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

 [ <foo:input> _:b1] .

_:b1 {
   _:b1 <foo:value> "x" .
}

This makes it possible to follow the chain from the object identifying the graph to the primary subject of that graph. Provisions must be made for forms in which there are multiple unnamed subjects within the named graph.
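To make the difference concrete, here is a minimal Python sketch of the rule, assuming a graph-container value that is a single flat node with no "@id" and only string-valued properties. The helper name graph_container_quads and its reuse_graph_name flag are hypothetical, not part of any JSON-LD processor API, and the sketch does not handle the case of several unnamed subjects in one graph.

```python
import itertools


def graph_container_quads(subject, prop, node, reuse_graph_name):
    """Hypothetical helper: turn one "@container": "@graph" value into quads.

    `node` is a flat dict of predicate -> string literal with no "@id".
    Returns (subject, predicate, object, graph) tuples; graph None means
    the default graph. Blank node labels are allocated from _:b1 upward.
    """
    labels = (f"_:b{i}" for i in itertools.count(1))
    graph_name = next(labels)
    # Status quo: the unnamed node inside the graph gets its own fresh bnode.
    # Proposal: the unnamed node reuses the bnode that names the graph.
    inner = graph_name if reuse_graph_name else next(labels)
    quads = [(subject, prop, graph_name, None)]
    for pred, literal in node.items():
        quads.append((inner, pred, f'"{literal}"', graph_name))
    return quads


status_quo = graph_container_quads("_:b0", "foo:input", {"foo:value": "x"}, False)
proposal = graph_container_quads("_:b0", "foo:input", {"foo:value": "x"}, True)
# status_quo: [("_:b0", "foo:input", "_:b1", None), ("_:b2", "foo:value", '"x"', "_:b1")]
# proposal:   [("_:b0", "foo:input", "_:b1", None), ("_:b1", "foo:value", '"x"', "_:b1")]
```

The only change between the two calls is which label the inner subject receives; the graph name and the triple in the default graph are identical.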

iherman commented 6 years ago

I do not really follow this. The two TriG snippets are different through the reuse of a bnode.

The whole area of RDF Datasets semantics is a bit murky, and there was never a full consensus on the details. But that section clearly says, for example, that:

The graphs in a single dataset may share blank nodes.

i.e., a single graph does not constitute a separate "namespace" for blank nodes (for lack of a better word).

gkellogg commented 6 years ago

This isn't really about dataset semantics, but a practical solution to a narrow use case. Through framing, JSON-LD encourages graph names to be the object of some triple in another (usually the default) graph. That is not RDF semantics, but a JSON-LD bias; Verifiable Claims makes use of such a representation. When looking for a shape that matches the content of a named graph where the subject is a blank node, it helps if it is the same blank node used to name the graph. This is really something coming from @ericprud, and maybe he can comment further.

iherman commented 6 years ago

I do not really understand the issue, maybe indeed @ericprud can tell further but, regardless, I do not think it is a good idea if the expansion algorithm generates a semantically different RDF dataset than what the original syntax defines. My feeling is that this is exactly what would happen here.

ericprud commented 6 years ago

I think we're defining the syntax now so we can't really deviate from "what the original syntax defines." I am, however, keenly interested in making sure the RDF graph is as closely aligned with the JSON tree as possible. Consider JS accessing the data in (a superset of) Gregg's example:

V = {
  "input": {
    "bar": "a",
    "value": "x",
    "baz": { "value": "y" }
  }
}

The JSON tree provides direct access to "x" via the path V.input.value. Finding the same info in the RDF graph implied by the @graph requires both the ability to search the graph and prescient knowledge of which bnodes were generated for the "input":{} object:

_:b0 foo:input _:b1 .

_:b1 {
   _:b2 foo:bar "a" .
   _:b2 foo:value "x" .
   _:b2 foo:baz _:b3 .
   _:b3 foo:value "y" .
}

There's no way to know that input's value is "x" and not "y" without presuming that JSON-LD only does forward arcs (now and in perpetuity) and doing an exhaustive search for the triple with a foo:value predicate and no incoming arcs. Doing so would not only make the RDF graph much less attractive to work with, it would paint future versions of JSON-LD into a corner. If, however, the graph node were used as the subject:

_:b0 foo:input _:b1 .

_:b1 {
   _:b1 foo:bar "a" .
   _:b1 foo:value "x" .
   _:b1 foo:baz _:b2 .
   _:b2 foo:value "y" .
}

the connection from foo:input to "x" would be easy to navigate in e.g. SPARQL:

SELECT ?value WHERE {
  [] foo:input ?g
  GRAPH ?g { ?g foo:value ?value }
}
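A rough Python analogue of that query, run over plain (subject, predicate, object, graph) tuples for the two datasets above, shows that the status-quo quads give no answer for the same path while the proposal's quads do. The tuple encoding and the input_values name are illustrative assumptions; a real application would use a SPARQL engine.

```python
def input_values(quads):
    """Python analogue of:
    SELECT ?value WHERE { [] foo:input ?g  GRAPH ?g { ?g foo:value ?value } }
    """
    # Graph names reachable from the default graph through foo:input.
    graphs = {o for s, p, o, g in quads if g is None and p == "foo:input"}
    # Inside each such graph, only triples whose subject IS the graph name match.
    return sorted(o for s, p, o, g in quads
                  if g in graphs and s == g and p == "foo:value")


status_quo = [
    ("_:b0", "foo:input", "_:b1", None),
    ("_:b2", "foo:bar", '"a"', "_:b1"),
    ("_:b2", "foo:value", '"x"', "_:b1"),
    ("_:b2", "foo:baz", "_:b3", "_:b1"),
    ("_:b3", "foo:value", '"y"', "_:b1"),
]
proposal = [
    ("_:b0", "foo:input", "_:b1", None),
    ("_:b1", "foo:bar", '"a"', "_:b1"),
    ("_:b1", "foo:value", '"x"', "_:b1"),
    ("_:b1", "foo:baz", "_:b2", "_:b1"),
    ("_:b2", "foo:value", '"y"', "_:b1"),
]
print(input_values(status_quo))  # []
print(input_values(proposal))    # ['"x"']
```

Note that "y" is never returned for the proposal: it hangs off _:b2, not off the graph name, just as V.input.baz.value is distinct from V.input.value in the JSON tree.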

I'm not particularly in love with the idea of _:b1 having dual use as a node and a graph name but it seems less controversial than the alternatives, all of which involve creating extra triples:

root connector rdf:rootNode in referring graph:

_:b0 foo:input _:b1 .
_:b0 rdf:rootNode _:b2 . # <-- root connector

_:b1 {
   _:b2 foo:bar "a" .
   _:b2 foo:value "x" .
   _:b2 foo:baz _:b3 .
   _:b3 foo:value "y" .
}

root annotation rdf:nodeRole in referenced graph:

_:b0 foo:input _:b1 .

_:b1 {
   _:b2 rdf:nodeRole rdf:TreeRoot . # <-- root annotation
   _:b2 foo:bar "a" .
   _:b2 foo:value "x" .
   _:b2 foo:baz _:b3 .
   _:b3 foo:value "y" .
}
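Both alternatives can be navigated mechanically. The sketch below works in Python over (subject, predicate, object, graph) tuples with the example data abbreviated, and assumes the mock-up terms rdf:rootNode, rdf:nodeRole and rdf:TreeRoot exactly as written above; they are illustrative names, not terms defined in the RDF vocabulary.

```python
def root_via_connector(quads):
    """Root connector: rdf:rootNode is a sibling of foo:input in the referring graph."""
    pairs = []
    for s, p, o, g in quads:
        if g is None and p == "foo:input":
            # Every rdf:rootNode of the same subject is a candidate root.
            roots = [o2 for s2, p2, o2, g2 in quads
                     if g2 is None and s2 == s and p2 == "rdf:rootNode"]
            pairs.append((o, roots))
    return pairs


def root_via_annotation(quads, graph):
    """Root annotation: the root carries rdf:nodeRole rdf:TreeRoot inside the graph."""
    return [s for s, p, o, g in quads
            if g == graph and p == "rdf:nodeRole" and o == "rdf:TreeRoot"]


connector = [
    ("_:b0", "foo:input", "_:b1", None),
    ("_:b0", "rdf:rootNode", "_:b2", None),
    ("_:b2", "foo:bar", '"a"', "_:b1"),
    ("_:b2", "foo:value", '"x"', "_:b1"),
]
annotation = [
    ("_:b0", "foo:input", "_:b1", None),
    ("_:b2", "rdf:nodeRole", "rdf:TreeRoot", "_:b1"),
    ("_:b2", "foo:bar", '"a"', "_:b1"),
    ("_:b2", "foo:value", '"x"', "_:b1"),
]
print(root_via_connector(connector))            # [('_:b1', ['_:b2'])]
print(root_via_annotation(annotation, "_:b1"))  # ['_:b2']
```

One limitation the sketch makes visible: with the root connector, if _:b0 had two foo:input values, each graph would be paired with every rdf:rootNode of _:b0, because the sibling triples do not say which root belongs to which graph.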

gkellogg commented 6 years ago

When we discuss this, I think it would be important to have @dlongley (and/or @msporny) and @ericprud on the call, as they are most informed about the use case.

iherman commented 6 years ago

I do not think that @ericprud's comment answered my concern:

I do not think it is a good idea if the expansion algorithm generates a semantically different RDF dataset than what the original syntax defines. My feeling is that this is exactly what would happen here.

ericprud commented 6 years ago

I think you're arguing for the proposal then. Let's examine this without the "@container": "@graph":

{
  "@context": {
    "@version": 1.1,
    "input": {"@id": "foo:input"},
    "value": "foo:value"
  },
  "input": {
    "value": "x"
  }
}

yields

_:b0 foo:input _:b1 .
_:b1 foo:value "x" .

I.e. the object of foo:input is the subject of foo:value. Without this proposal, they are completely disconnected in the graph implied by "@container": "@graph":

_:b0 foo:input _:b1 .
_:b2 foo:value "x" _:b1 .

To me, that seems like "semantically different RDF" -- what was once navigable by a path now requires a heuristic search (something involving finding a node with no incoming arcs, unless you have inverse properties in the @context, in which case, good luck). With this proposal, they are once again connected, even though the 2nd triple is in another graph:

_:b0 foo:input _:b1 .
_:b1 foo:value "x" _:b1 .

gkellogg commented 6 years ago

@iherman There is no conflict with 1.0, as there was no notion of implicitly defined named graphs. The "@container": "@graph" creates this possibility in 1.1, to make connected JSON work and have meaning as RDF.

The fundamental issue with shapes is that, even if you can identify the graph name because it is the object of a triple in the default graph, that says nothing about any shape contained within that graph.

By re-purposing the graph name as the default subject of that named graph we run the risk of conflating the meaning of that identifier: does it name a graph or does it name a subject within the graph. But, for practical purposes, it makes sense to do this to be able to follow a chain from the default graph, through an identified graph name, and to a subject within that graph.

The alternative for ShEx would be to use some other properties of that graph to hook up the shape, for example, find the subject within that graph which is not also an object in that graph, but this could be convoluted and expensive for large graphs, and does not take into consideration the possibility of reverse properties used within the JSON-LD serialization.
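The "no incoming arcs" heuristic is easy to state but fragile. In this Python sketch (tuple encoding and function name are illustrative), it finds the right root when only forward properties are used, but picks the wrong node once a reverse property makes the root the object of a triple. The second dataset is an assumed rendering of a JSON-LD input that uses "@reverse"; the blank node labels are arbitrary.

```python
def root_candidates(quads, graph):
    """Heuristic root finder: subjects in `graph` that never occur as an object there."""
    subjects = {s for s, p, o, g in quads if g == graph}
    objects = {o for s, p, o, g in quads if g == graph}
    return sorted(subjects - objects)


# Forward properties only (the status-quo quads from the earlier example).
forward = [
    ("_:b0", "foo:input", "_:b1", None),
    ("_:b2", "foo:bar", '"a"', "_:b1"),
    ("_:b2", "foo:value", '"x"', "_:b1"),
    ("_:b2", "foo:baz", "_:b3", "_:b1"),
    ("_:b3", "foo:value", '"y"', "_:b1"),
]
# Assumed quads for {"input": {"value": "x", "@reverse": {"foo:child": {"value": "z"}}}}:
# the reverse property makes the nested node the subject and the root (_:b2) an object.
reversed_ = [
    ("_:b0", "foo:input", "_:b1", None),
    ("_:b2", "foo:value", '"x"', "_:b1"),
    ("_:b3", "foo:child", "_:b2", "_:b1"),
    ("_:b3", "foo:value", '"z"', "_:b1"),
]
print(root_candidates(forward, "_:b1"))    # ['_:b2']  (correct root)
print(root_candidates(reversed_, "_:b1"))  # ['_:b3']  (wrong: the root is _:b2)
```

Besides being wrong in the reverse case, the heuristic must scan every triple in the graph, which is the cost concern raised for large graphs.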

In summary, the proposal solves a problem that exists in the real world at the expense of some blank node identifier semantics.

iherman commented 6 years ago

I am sorry, maybe I am rusty with my RDF. In my reading,

{
  "@context": {
    "@version": 1.1,
    "input": {"@id": "foo:input", "@container": "@graph"},
    "value": "foo:value"
  },
  "input": {
    "value": "x"
  }
}

Must yield:

@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

[ <foo:input> _:b1] .

_:b1 {
   [ <foo:value> "x"] .
}

or, as N-Quads:

_:b0 <foo:input> _:b1 .
_:b2 <foo:value> "x" _:b1 .

This is what happens today. The object of foo:input is a graph (that is what "@container": "@graph" means after all, right?) and this does not mean it is (also) the subject of a triple within that graph. These are two different things.

In other words,

_:b0 <foo:input> _:b1 .
_:b1 <foo:value> "x" _:b1 .

is, imho, plainly wrong.

We seem to be at an impasse; maybe we should try to ask the opinion of another RDF expert...

gkellogg commented 6 years ago

You're correct that that's what the spec says now; the proposal is to change this meaning. As this is entirely new behavior, there is no real compatibility issue. It is most useful if the default subject of the graph is the same as the graph name, at least for the purposes of shape matching, but also more generally for traversing between graphs.

iherman commented 6 years ago

If I change this meaning I create a backward incompatible version of JSON-LD. Didn't we say that is a big no-no?

We can introduce a new type of @container (I am not sure I would like this) which has additional semantics. I am not sure how one would define that formally, and I think it would add to the general confusion...

gkellogg commented 6 years ago

JSON-LD 1.0 had no graph containers, so there is no backwards compatibility issue. In 1.0, all named graphs were explicit. The "@container": "@graph" is something introduced in the CG work, basically to handle the Verifiable Claims issue.

iherman commented 6 years ago

Ah, my bad, sorry about that.

I still feel a bit uncomfortable, but less so. This is somehow more than a “simple” container: it is not just that the property refers to a graph, but that it refers to a graph with special additional behavior. This is different from, say, a list container, which is just that: it refers to a list, without any further strings attached. These mini, baroque additional thingies may bite us later because they will contribute to the overall impression of a very complex language.

ericprud commented 6 years ago

This proposal is specifically intended to address the increased complexity that arises when creating disconnected nodes in a graph (see the first TriG example in my write-up above).

My guess is that some folks may want to motivate more controls in the @context; what we're working on here is the default behavior in the absence of extra directives for synthesizing URLs when there's no @id in the nested graph.

In Linked Data, there's a sharp divide between the purists who want to separate graph names from the nodes for which those graphs were created and pragmatists who don't see the need.

Purist: <http://…MyFoafPage> has an RDF node <http://…MyFoafPage#me> which stands for me;
Pragmatist: The Uniprot page <P04637> has a Turtle representation which states that <P04637> rdf:type up:Protein.

In the pragmatist approach, <P04637> identifies both the page and the protein. In the purist approach, a trick of HTTP (that the fragment identifier is not passed in an HTTP request) allows us to map from the node <http://…MyFoafPage#me> to the page <http://…MyFoafPage> (though not the other way around).

Because there's no analogous trick for bnodes (implied by objects with no @id), we are stuck with adding extra triples. Above, I described two approaches which synthesize extra URLs, one in the referring graph (root connector) and one in the referenced graph (root annotation). The root connector approach could have lots of forms. I mocked one up where the referring predicate got a sibling triple pointing to the root node:

_:b0 foo:input _:b1 .
_:b0 rdf:rootNode _:b2 . # <-- root connector
_:b1 { _:b2 foo:bar "a" … }

A perhaps more forward-thinking approach would be to create a structure for pairing roots with graphs:

_:b0 foo:input _:b1 .
_:b1 rdf:graphReference _:b2 . # <-- identify the created graph
_:b1 rdf:rootNode _:b3 . # <-- identify the node implied by the nested JSON object
_:b2 { _:b3 foo:bar "a" … }
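A sketch of resolving that pairing structure, again in Python over (subject, predicate, object, graph) tuples. rdf:graphReference and rdf:rootNode are the hypothetical terms from the mock-up above, and the second foo:input value is added only to show that each graph stays paired with its own root.

```python
def resolve_pairs(quads):
    """Follow foo:input to a pairing node, then read rdf:graphReference and rdf:rootNode."""

    def value(subject, pred):
        # First object of (subject, pred) in the default graph, or None if absent.
        return next((o for s, p, o, g in quads
                     if g is None and s == subject and p == pred), None)

    return sorted((value(pair, "rdf:graphReference"), value(pair, "rdf:rootNode"))
                  for s, p, pair, g in quads if g is None and p == "foo:input")


quads = [
    ("_:b0", "foo:input", "_:b1", None),
    ("_:b1", "rdf:graphReference", "_:b2", None),
    ("_:b1", "rdf:rootNode", "_:b3", None),
    ("_:b3", "foo:bar", '"a"', "_:b2"),
    ("_:b0", "foo:input", "_:b4", None),
    ("_:b4", "rdf:graphReference", "_:b5", None),
    ("_:b4", "rdf:rootNode", "_:b6", None),
    ("_:b6", "foo:bar", '"b"', "_:b5"),
]
print(resolve_pairs(quads))  # [('_:b2', '_:b3'), ('_:b5', '_:b6')]
```

The intermediate node is what keeps the pairs unambiguous: unlike a sibling rdf:rootNode on _:b0, each root is attached to exactly one graph reference.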

A pairing of a graph name and a root or focus node would be helpful in other contexts e.g. identifying the pair of created web resource and RDF node in the return from a POST to an ldp:Container. Addressing this would also stanch the "I can't believe you guys haven't already solved this" comments I hear in HL7 when trying to use RDF to represent clinical resources.

azaroth42 commented 6 years ago

@ericprud Will you be at TPAC on the Thursday/Friday? We could dive into the details in person?

ericprud commented 6 years ago

I believe I can be there Thu. And yeah, I guess this could benefit from some whiteboard time.

simonstey commented 6 years ago

EXAMPLE 85: Implicitly named graph:

{
  "@context": {
    "@version": 1.1,
    "generatedAt": {
      "@id": "http://www.w3.org/ns/prov#generatedAtTime",
      "@type": "http://www.w3.org/2001/XMLSchema#date"
    },
    "Person": "http://xmlns.com/foaf/0.1/Person",
    "name": "http://xmlns.com/foaf/0.1/name",
    "knows": {"@id": "http://xmlns.com/foaf/0.1/knows", "@type": "@id"},
    "claim": {
      "@id": "https://w3id.org/credentials#claim",
      "@container": "@graph"
    }
  },
  "@id": "http://example.org/foaf-graph",
  "generatedAt": "2012-04-09",
  "claim": [
    {
      "@id": "http://manu.sporny.org/about#manu",
      "@type": "Person",
      "name": "Manu Sporny",
      "knows": "http://greggkellogg.net/foaf#me"
    }, {
      "@id": "http://greggkellogg.net/foaf#me",
      "@type": "Person",
      "name": "Gregg Kellogg",
      "knows": "http://manu.sporny.org/about#manu"
    }
  ]
}

provides the following expanded version of itself:

[{
  "@id": "http://example.org/foaf-graph",
  "http://www.w3.org/ns/prov#generatedAtTime": [{
    "@value": "2012-04-09",
    "@type": "http://www.w3.org/2001/XMLSchema#date"
  }],
  "https://w3id.org/credentials#claim": [{
    "@graph": [{
      "@id": "http://manu.sporny.org/about#manu",
      "@type": ["http://xmlns.com/foaf/0.1/Person"],
      "http://xmlns.com/foaf/0.1/name": [{"@value": "Manu Sporny"}],
      "http://xmlns.com/foaf/0.1/knows": [
        {"@id": "http://greggkellogg.net/foaf#me"}
      ]
    }]
  }, {
    "@graph": [{
      "@id": "http://greggkellogg.net/foaf#me",
      "@type": ["http://xmlns.com/foaf/0.1/Person"],
      "http://xmlns.com/foaf/0.1/name": [{"@value": "Gregg Kellogg"}],
      "http://xmlns.com/foaf/0.1/knows": [
        {"@id": "http://manu.sporny.org/about#manu"}
      ]
    }]
  }]
}]

and consequently shows two implicitly named graphs, _:b0 and _:b1:

Graph	Subject	Property	Value	Value Type
	http://example.org/foaf-graph	prov:generatedAtTime	2012-04-09	xsd:date
	http://example.org/foaf-graph	https://w3id.org/credentials#claim	_:b0
	http://example.org/foaf-graph	https://w3id.org/credentials#claim	_:b1
_:b0	http://manu.sporny.org/about#manu	rdf:type	foaf:Person
_:b0	http://manu.sporny.org/about#manu	foaf:name	Manu Sporny
_:b0	http://manu.sporny.org/about#manu	foaf:knows	http://greggkellogg.net/foaf#me
_:b1	http://greggkellogg.net/foaf#me	rdf:type	foaf:Person
_:b1	http://greggkellogg.net/foaf#me	foaf:name	Gregg Kellogg
_:b1	http://greggkellogg.net/foaf#me	foaf:knows	http://manu.sporny.org/about#manu

However, in the playground, the expanded version of example 85 is given as:

[
  {
    "@id": "http://example.org/foaf-graph",
    "https://w3id.org/credentials#claim": [
      {
        "@graph": [
          {
            "@id": "http://manu.sporny.org/about#manu",
            "@type": [
              "http://xmlns.com/foaf/0.1/Person"
            ],
            "http://xmlns.com/foaf/0.1/knows": [
              {
                "@id": "http://greggkellogg.net/foaf#me"
              }
            ],
            "http://xmlns.com/foaf/0.1/name": [
              {
                "@value": "Manu Sporny"
              }
            ]
          },
          {
            "@id": "http://greggkellogg.net/foaf#me",
            "@type": [
              "http://xmlns.com/foaf/0.1/Person"
            ],
            "http://xmlns.com/foaf/0.1/knows": [
              {
                "@id": "http://manu.sporny.org/about#manu"
              }
            ],
            "http://xmlns.com/foaf/0.1/name": [
              {
                "@value": "Gregg Kellogg"
              }
            ]
          }
        ]
      }
    ],
    "http://www.w3.org/ns/prov#generatedAtTime": [
      {
        "@type": "http://www.w3.org/2001/XMLSchema#date",
        "@value": "2012-04-09"
      }
    ]
  }
]

and, subsequently, the following N-Quads with only one implicitly named graph, _:b0:

<http://example.org/foaf-graph> <http://www.w3.org/ns/prov#generatedAtTime> "2012-04-09"^^<http://www.w3.org/2001/XMLSchema#date> .
<http://example.org/foaf-graph> <https://w3id.org/credentials#claim> _:b0 .
<http://greggkellogg.net/foaf#me> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> _:b0 .
<http://greggkellogg.net/foaf#me> <http://xmlns.com/foaf/0.1/knows> <http://manu.sporny.org/about#manu> _:b0 .
<http://greggkellogg.net/foaf#me> <http://xmlns.com/foaf/0.1/name> "Gregg Kellogg" _:b0 .
<http://manu.sporny.org/about#manu> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> _:b0 .
<http://manu.sporny.org/about#manu> <http://xmlns.com/foaf/0.1/knows> <http://greggkellogg.net/foaf#me> _:b0 .
<http://manu.sporny.org/about#manu> <http://xmlns.com/foaf/0.1/name> "Manu Sporny" _:b0 .

Why is the playground not producing the same result as given in the spec? Or am I missing something here, and they are actually equivalent?
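One quick check on the equivalence question: renaming blank nodes cannot change how many quads a dataset has in its default graph and in each named graph, so matching counts are a necessary condition for isomorphism. The Python sketch below encodes both outputs with shortened names (ex:foaf-graph, cred:claim and the other prefixed names are abbreviations for the full IRIs above). It only shows whether the two results are the same dataset, not which one the specification intends.

```python
from collections import Counter


def dataset_profile(quads):
    """(default-graph quad count, sorted quad counts of the named graphs)."""
    per_graph = Counter(q[3] for q in quads)
    default = per_graph.pop(None, 0)
    return default, sorted(per_graph.values())


def person(iri, name, knows, graph):
    return [(iri, "rdf:type", "foaf:Person", graph),
            (iri, "foaf:name", f'"{name}"', graph),
            (iri, "foaf:knows", knows, graph)]


MANU = "http://manu.sporny.org/about#manu"
GREGG = "http://greggkellogg.net/foaf#me"
DATE = ("ex:foaf-graph", "prov:generatedAtTime", '"2012-04-09"^^xsd:date', None)

spec = ([DATE,
         ("ex:foaf-graph", "cred:claim", "_:b0", None),
         ("ex:foaf-graph", "cred:claim", "_:b1", None)]
        + person(MANU, "Manu Sporny", GREGG, "_:b0")
        + person(GREGG, "Gregg Kellogg", MANU, "_:b1"))
playground = ([DATE, ("ex:foaf-graph", "cred:claim", "_:b0", None)]
              + person(MANU, "Manu Sporny", GREGG, "_:b0")
              + person(GREGG, "Gregg Kellogg", MANU, "_:b0"))

print(dataset_profile(spec))        # (3, [3, 3])
print(dataset_profile(playground))  # (2, [6])
```

Since the profiles differ (9 quads versus 8), no blank node relabeling can make the two outputs the same dataset.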

iherman commented 6 years ago

This issue was discussed in a meeting.

RESOLVED: add a feature at risk that the implicitly identified graphs will share the bnode with the unidentified member of the graph, on the grounds that the user community most in need of this would expect it, and the community that would be horrified by it better understands the solution of explicit naming
Transcript:
Benjamin Young: white board from this morning: white board content
Eric Prud’hommeaux: the issue to talk about api#26 …
Eric Prud’hommeaux: … Gregg wrote it up. The issue is for @container: @graph — this property creates a blank node that ends up being a graph name for the embedded triples.
… so if we look at something without @graph, and we have a tree like […], we could access it like var.input.value
… In triples it turns into a graph b0 foo:input b1 ; b1 foo:value "x"
… We could get that with a graph path similar to the code
… Which is the same as the sparql ?a foo:input ?b ; ?b foo:value ?v
… In my proposal, If you create a graph it still ends up traversable
… but currently we end up with b0 input b1 ; b2:value "x" — it’s disconnected
… The things we normally use to get around don’t work
Adam Soroka: We need to make the bnodes the same?
Eric Prud’hommeaux: Yes. We would otherwise need higher level logic to merge them
… but we could just re-use the bnode
… this was used for verifiable claims. The bit that was in the input was only a tiny bit of stuff. So walking around in a large graph looking for a triple was easy as the wrapper was very small, but that isn’t always the case
… you would also otherwise need application level logic, or you can’t query
… to traverse the data, you need to know it’s valid
Adam Soroka: And hence ShEX issues
Eric Prud’hommeaux: Yes, or SPARQL queries.
Eric Prud’hommeaux: The downside is that the blank node is the name of the graph, and a node in the outer graph
… this could be the default behavior. There might be other ways to connect them in the future
… need to get the same level of access in the RDF as in the JSON
… one more thing, the way to connect the nodes with more configuration and more of a pain, one construct is to say that b1 has a focus of b2
… that has a lot of reuse, but would mean writing into the rdf namespace
… e.g. in clinical data. Here’s a pairing of a graph name and a focus node
Eric Prud’hommeaux: Objection is typically that if there’s a property, is it about the graph or about the node
Gregg Kellogg: implies the name of the graph has meaning
Eric Prud’hommeaux: Yes, but it’s a blank node
… but what happens if it has @id? I think the answer there is both a node name and a graph name
Rob Sanderson: What’s the range of foo:input?
Eric Prud’hommeaux: union of named graph and the graph node class
… which doesn’t bother logicians, but does bug engineers
Gregg Kellogg: Not the use of every graph container
… where the graph appears with the value of a statement and doesn’t have their own declarative statement
… should we just use the graph name as the subject in that case
… which would allow for the follow your nose
Eric Prud’hommeaux: everything inside the input, all of those triples have the same subject
Adam Soroka: Trying to think of situations where I would want either way
Gregg Kellogg: You could just declare the subject
Benjamin Young: That maps cleanly to the expectation when used — once typed out I expect it to work like that
… to end up detached would be bad.
… it breaks round tripping.
… there’s times when you use @graph like a packaging format
… here’s a bundle of stuff
Adam Soroka: as Gregg says in that case put in an explicit subject
Benjamin Young: In @container: @graph the implication is that they’re connected
… once you lose that, you’d just stay in JSON
Rob Sanderson: so the proposal is that for @container: @graph, when the subject is not explicitly set, then the default is to reuse the blank node
Ivan Herman: I have a more general uncomfortable feeling. We introduce another micro-rule. They all make sense by themselves, but when you pile them up you get a language that’s difficult to understand
… we don’t take a simpler approach
Eric Prud’hommeaux: It seems the argument is more persuasive in the other direction
Ivan Herman: I don’t go into the particular issue, but that we just got as a proposal — if this and that and that, then …
… this is the proposal, it’s not a straightforward thing
… this is what I don’t like
… we pile up lots of these things and end up very complex.
… We should talk about URI resolution. e.g. with vocab and this and that. 90% of the people on the call didn’t follow what Gregg was explaining
… not anything wrong with what Gregg says, the reasoning steps are all okay by themselves, it’s the overall thing that becomes complicated
… the containers in 1.0 were used only for one thing, now we add a lot more
… we continue to do that. Don’t want to get to the technical details for this issue, just at the overall pattern
Eric Prud’hommeaux: I think the argument here is that the status quo is harder to explain, more surprising
Benjamin Young: Current situation is accidental.
… this proposal seems more natural
Adam Soroka: there’s some tension between avoiding surprise and keeping things easy to learn. In this case there is complexity, but it’s less surprising
Ivan Herman: As a zero level question, why do we need the container graph?
… Some community needs something, so we add a new quirk
Benjamin Young: @graph gets everyone’s hackles up. So we got @container: @graph
… from a JSON developer’s perspective, they need tools to get from the tree to the graph structure
… need to not annoy both groups
… this one to me resolves an issue from the RDF side
Benjamin Young: https://w3c.github.io/json-ld-syntax/#ex-85-implicitly-named-graph
Simon Steyskal: Going through the spec, in example 85, for graph containers. I wasn’t sure about the original sample
… it shows two graph objects, but if you look at the statements in the playground, it’s not what the original version has. They don’t match the expanded version in the spec
… the playground already does this reuse
Adam Soroka: People might be reliant on the feature?
Rob Sanderson: what does the spec say now?
Gregg Kellogg: it makes you create a new blank node
Ivan Herman: I repeat what I said in the issue comments :-( From a point of view of consistent view of how JSON-LD behaves, what is done today is the right thing to do
… the various things that a container contains is pieces of graphs. Inside is different nodes. So reusing the same bnode internally and for the graph, I understand it’s handy, but it does not fit the model for the JSON-LD world
… we could hack it around with a micro-rule, but from a JSON-LD consistency PoV it’s not right
… not a formal objection but I disagree
Adam Soroka: There’s a lot of opportunities — there’s other ways to do it
Eric Prud’hommeaux: If you could parameterize the behavior and let the user decide whether they get this behavior or the other
Ivan Herman: We have the syntax in the example. We can name the bnode explicitly
Gregg Kellogg: what does container: graph mean? You’re putting a box around some of the information so that it’s part of a separate graph.
… but it means that input has a value that is a named graph.
Ivan Herman: That’s what it means
Adam Soroka: You’re right, but eric is not asking to uniformly conflate them
Ivan Herman: then a separate syntax?
… we would have two types of containers, one graph container behaves as it should, and another that does something extra
… it pulls in the name into the internals of the graph
Adam Soroka: It’s a very common idiom
Ivan Herman: I understand. The problem is that we always follow perfectly valid rules, but we need to look at the overall result
… for many people JSON-LD is very scary because it’s so complicated
Eric Prud’hommeaux: But has better adoption
Ivan Herman: we do something very strange — and maybe we need to acknowledge it — we work with people from all corners of JSON usage and try to push them into the linked data world
… so you might lose the LD people as JSON-LD becomes an incredible mess
… you have people working with patterns of usage, but if I come from the LD world and just want to use JSON-LD as a serialization, and I know what I’m doing, then for me the usage is very complicated
… I don’t think in the fixed patterns, I just want to put a graph and get unexpected results
Eric Prud’hommeaux: In this case you have a disconnected graph, and you can do that with the expanded form
Rob Sanderson: [… more similar discussion …]
Rob Sanderson: So the alternate would be to have an explicit link. Would that be automatic, or put into the data?
Gregg Kellogg: I don’t think you can have the named graph in the source and the graph?
… within the named graph you have a triple whose subject is the graph
… can create a statement with a blank node, and the meaning lies with the predicate
… would not want to automatically introduce it
Eric Prud’hommeaux: When you have a graph that has two of those, what does it mean?
… it’s unattractive, and I’ve done it but was an interim measure
Gregg Kellogg: if we keep the status quo, and the name of the graph is not visible, the implications for writing a shape for trying to match things
Eric Prud’hommeaux: There’s a step where you collect things. But when you get to the internal graph you’ve already collected it. You’d need to do cycles of gathering and validating
… without some predictable connection, there isn’t a way to do it without a procedural language
Adam Soroka: You have to assume validity in order to validate it
Eric Prud’hommeaux: Can do various things
Adam Soroka: But it’s application level knowledge
Eric Prud’hommeaux: Yes, you’d have to customize a lot of stuff
… here you do an unbound sparql query
… which is considerably more expensive
Adam Soroka: you still might need to apply application knowledge
Eric Prud’hommeaux: Could find nodes that don’t have inbound links, but you can’t assume that’s always the case with inverse properties etc
Rob Sanderson: (restates problem)
Ivan Herman: It comes from @container: @graph. As a value, I have an object whose keys refer to something specific
… a language refers to the language of the string
… but the graphs are very different beasts. They must have an identifier, or we generate them one
… it’s different from language, so maybe the container model is not fit for that purpose
… we must talk about identifiers and how they’re used elsewhere
… when I use the container for a language, it’s simple
… it’s by the natural language
… container is a way to categorize certain things, and they become keys
… I have a bunch of strings with a category, the language
… I create an object that uses the category as a term
Adam Soroka: Containers as maps
Gregg Kellogg: Not all though
Ivan Herman: Yes, @container: @list is a very different animal
Eric Prud’hommeaux: Not that it’s a graph, just the container
… if you want to build a named graph then have a different construct
Gregg Kellogg: Can have a map of graphs with @id
… an array of keys
Ivan Herman: should take a step back to look at containers and mapping
… is it possible to have a clearer model and separate the two things
… and then come back to it if there’s a more natural way to model it
… if it’s a blank node, then I can assign and reuse
… user has the choice to reuse.
Gregg Kellogg: If you use a graph id map, then they have to name them explicitly
… VC and WoT are in a similar situation, I think
Ivan Herman: Yes, but do we now add another special quirk??
Adam Soroka: Depends how many people are interested in it
Ivan Herman: Then we need a template language
… Propose to leave this alone for a little and look at containers in general
Adam Soroka: And “path” is in here too now (see discussions with WoT), for things people are asking for
Ivan Herman: Could use different term, they’re not containers like list or set
Gregg Kellogg: Could introduce @map
Ivan Herman: and then add in obsolete terms for indexing
Adam Soroka: Seeing patterns, and then clarifying how to get them into the syntax
… hence microrules
Gregg Kellogg: Also about the interpretation of the value space
Rob Sanderson: Exactly equivalent to https://github.com/w3c/json-ld-syntax/issues/77
Gregg Kellogg: raising warnings :( and makes algorithms harder
Rob Sanderson: priority of constituencies puts algorithms very close to the bottom
Ivan Herman: Can have a raise warning or not flag in the API
… algorithms will be slightly more complicated, but only affect 5 or 6 people
Adam Soroka: And we probably know most of them
Eric Prud’hommeaux: Regardless of how you construct the syntax, need to deal with nesting in JSON
Gregg Kellogg: There’s the expectation of connectivity
Eric Prud’hommeaux: Relatively simplistic user, but that’s typical. If it’s more nuanced, I want the default to not produce pathological graphs
Gregg Kellogg: If we created a new @map thing and put graph / id maps in there, so would have a reduced use case for @container, and we’re back to the same issue
… container is a graph, and you’re in an implicitly named graph. Now where are the rest of the things?
Adam Soroka: syntactic mechanism. If containers were minimized, could be nicer. If we could add metadata to containers, we could maybe add the information. But would need very strong notion of containers
Eric Prud’hommeaux: That’s a step in the right direction
… trying to deal with existing sem web … two camps. People who abuse the node to be the graph name. And then there’s people who keep them separate.
… but there’s a mechanism to connect them
… trick is normally HTTP fragments. Use the # and then HTTP connects them
… those two camps are not going to come together
… at least half the people are going to be miffed
… so putting in controls will help
Ivan Herman: more inclined to look at something more complicated, but long term more powerful, and accept that we need a transformation / template language
Ivan Herman: We see a user community that uses a template as that’s how they think. We try to come up with syntactic quirks so the templates fit in the model
… that’s where we get in trouble. If we had some transformation language, it could help.
… not sure it’s realistic, and not familiar with framing details
… can that be added to framing model? Not a rec, so don’t have backward compatibility restriction
… if we do something there, that would mean a cleaner separation
… if this is taken up by a frame and uses the same bnode. Can express it in JSON-LD. It’s all doable already.
Eric Prud’hommeaux: have about 100 hours thinking on this in ShEx. Both dealing with a case where there’s an algorithmic mapping between a graph node and a node in the graph
… need to get from one to the other
… expressivity we discovered we needed was at a minimum to chop off or add a hash based identifier
… for the range 14 folks
… ability to say it’s the same
… and then as you work down into the people who have pipeline techniques, you end up with regexps
… that lets you use node identifiers that are relative to the base
… two nodes that are different but related
… regexes look at the graph labels
… to deal with existing data
… question is how much you want RDF data to drive this.
Adam Soroka: And the other extreme is JSON devs who are told they have to do something. Some things don’t make any sense at one or the other end of the spectrum
Eric’s examples: EricP's examples
Proposed resolution: add a feature at risk that the implicitly identified graphs will share the bnode with the unidentified member of the graph, on the grounds that the user community most in need of this would expect it, and the community that would be horrified by it better understands the solution of explicit naming (Rob Sanderson)
Rob Sanderson: +1
Ivan Herman: +0.0000001
Simon Steyskal: +1
Gregg Kellogg: +1
Harold Solbrig: +1
Resolution #1: add a feature at risk that the implicitly identified graphs will share the bnode with the unidentified member of the graph, on the grounds that the user community most in need of this would expect it, and the community that would be horrified by it better understands the solution of explicit naming
Rob Sanderson: Assuming +1s from Adam and Benjamin
Adam Soroka: +1

dlongley commented 6 years ago

I'm a little lost on the conclusion here...

Currently, this (http://tinyurl.com/y7zoyjk2):

{
  "@context": {
    "@version": 1.1,
    "claim": {"@id": "ex:claim", "@container": "@graph"},
    "name": "ex:name"
  },
  "claim": {
    "@id": "ex:subject",
    "name": "A subject"
  }
}

Yields these quads:

<ex:subject> <ex:name> "A subject" _:b1 .
_:b0 <ex:claim> _:b1 .

Would this change with this proposal? If so, how?

I'm concerned that there may be a serious issue that breaks the encapsulation properties we need for Verifiable Credentials.

gkellogg commented 6 years ago

No, it wouldn't change the quads in this case. If the claim didn't have an @id, it would reuse that of the graph. The reasoning is that this will make it easier to follow through the graph to the default subject for shape matching purposes. If the document were the following:

{
  "@context": {
    "@version": 1.1,
    "claim": {"@id": "ex:claim", "@container": "@graph"},
    "name": "ex:name"
  },
  "claim": {
    "name": "A subject"
  }
}

Then you'd see something like:

_:b0 <ex:claim> _:b1 .
_:b1 <ex:name> "A subject" _:b1 .
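To make the proposal concrete, here is a toy Python sketch of how a graph-container value turns into N-Quads with and without reusing the graph's blank node. It is not the JSON-LD expansion or toRdf algorithm and not any library's API: the function name `graph_container_to_quads`, its `reuse_graph_bnode` flag, and the restriction to one level of string-valued properties are assumptions for illustration only (the proposal itself was later closed as wontfix in this thread).

```python
from itertools import count


def graph_container_to_quads(prop_iri, value, term_map, reuse_graph_bnode=True):
    """Toy conversion of {prop: value}, where prop uses "@container": "@graph".

    Handles a single level of string-valued properties only; literals are
    not escaped. Hypothetical helper, not part of any JSON-LD processor.
    """
    ids = count()

    def fresh():
        return f"_:b{next(ids)}"

    outer = fresh()   # the top-level node, which has no @id
    graph = fresh()   # blank node naming the implicit graph
    if "@id" in value:
        subject = f"<{value['@id']}>"   # explicit @id: nothing changes
    elif reuse_graph_bnode:
        subject = graph                 # proposal: reuse the graph's bnode
    else:
        subject = fresh()               # current behavior: an unconnected bnode

    quads = [f"{outer} <{prop_iri}> {graph} ."]
    for key, val in value.items():
        if key != "@id":
            quads.append(f'{subject} <{term_map[key]}> "{val}" {graph} .')
    return quads


print("\n".join(graph_container_to_quads("ex:claim", {"name": "A subject"}, {"name": "ex:name"})))
# _:b0 <ex:claim> _:b1 .
# _:b1 <ex:name> "A subject" _:b1 .
```

With `reuse_graph_bnode=False` the inner subject becomes a fresh `_:b2`, the disconnected shape that motivated the issue; with an explicit `"@id": "ex:subject"` the sketch yields the same two quads as @dlongley's first example.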

dlongley commented 6 years ago

@gkellogg,

Oh! That's much less scary than I thought. I think that's ok, but would love for others who have any experience with the VC work to give their opinions. Once we've modeled more of the ZKP style approach to VCs (where the main "subject" of a VC may not have an @id) we may have more input.

gkellogg commented 5 years ago

The discussion on Framing blank node unnamed graphs was actually about w3c/json-ld-syntax#26. w3c/json-ld-framing#27 is really about framing anonymous named graphs, which we didn’t discuss.

Since this was the body of the discussion, I’d just suggest changing the title for 5.11 to “Ensure that blank node identifiers for anonymous graphs are reused”, and reference w3c/json-ld-syntax#26 instead, but we probably need to agree to this on next Friday’s call.

gkellogg commented 5 years ago

This issue was discussed in a meeting.

RESOLVED: close syntax#27 wontfix, as there’s no justification for the required RDF layer requirement that the blank node identity of the named graph is the default subject of the triples in the graph
Framing blank node unnamed graphs
Rob Sanderson: ref: https://github.com/w3c/json-ld-framing/issues/27
Gregg Kellogg: how can ShEx validate verifiable claims?
… there was no reasonable way for ShEx to figure out where to start in that graph to begin validation. Why not just reuse the blank node as the default subject of the graph?
Ivan Herman: I remember, and I am opposed to this.
Rob Sanderson: if it was not a blank node, does the problem go away?
Gregg Kellogg: if it had an identity, it wouldn’t get to this point.
Gregg Kellogg: if you use a graph container, should we use the blank node as the default subject for the graph?
Ivan Herman: that’s semantically wrong.
Rob Sanderson: is this an RDF problem?
Ivan Herman: no. A blank node for the graph and a blank node within the graph are two different things.
Gregg Kellogg: JSON people have a tree-based view, and graphs are not required to have a root.
… so it’s not unreasonable to add a property to indicate the root.
Gregg Kellogg: this is used in framing, where the top node has an id
Harold Solbrig: I object to bnode, because if there’s not a stake in the ground, having magic to b-nodes…
Gregg Kellogg: fragment identifiers would be a better solution.
Proposed resolution: close syntax#27 wontfix, as there’s no justification for the required RDF layer requirement that the blank node identity of the named graph is the default subject of the triples in the graph (Rob Sanderson)
Gregg Kellogg: +1
Rob Sanderson: +1
David Newbury: +1
Ivan Herman: +1
Harold Solbrig: +1!
David I. Lehn: +1
Jeff Mixter: +1
Adam Soroka: +0
Resolution #15: close syntax#27 wontfix, as there’s no justification for the required RDF layer requirement that the blank node identity of the named graph is the default subject of the triples in the graph

ericprud commented 5 years ago

"no justification"‽ Do I have to go back over the arguments that convinced everyone in the room except Ivan during the F2F?

gkellogg commented 5 years ago

Really, it just came down to the stink test. Overloading the use of a blank node name as the graph name and the default subject was generally regarded as being semantically incorrect, even if useful. Authors will need to find another way, such as a well-known property value, or use fragment identifiers.
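The fragment-identifier alternative corresponds to the "add or chop off a hash based identifier" mapping Eric described at the F2F. A minimal Python sketch, assuming the named graph has an IRI rather than a blank node name and that the root node uses a fixed fragment; the fragment name `subject`, the helper names, and the example IRIs are hypothetical.

```python
from urllib.parse import urldefrag

# Hypothetical convention: the graph's root node is <graph IRI>#subject.
ROOT_FRAGMENT = "subject"


def node_for_graph(graph_iri: str) -> str:
    """Add a hash-based identifier to get from the graph name to its root node."""
    base = urldefrag(graph_iri).url   # drop any existing fragment first
    return f"{base}#{ROOT_FRAGMENT}"


def graph_for_node(node_iri: str) -> str:
    """Chop off the fragment to get from the root node back to the graph name."""
    return urldefrag(node_iri).url


root = node_for_graph("https://example.org/claims/1")
print(root)                   # https://example.org/claims/1#subject
print(graph_for_node(root))   # https://example.org/claims/1
```

Because the mapping is purely syntactic, a validator can compute the starting node from the graph name without any extra triples, which is consistent with the observation that the issue goes away once a URI is used.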

azaroth42 commented 5 years ago

Hi @ericprud,

To clarify the resolution, it was not that the use case was considered to be invalid, and was even generally agreed to be useful as @gkellogg says! However it was considered that the JSON-LD group did not have the justification to make a significant assertion about the use of named graphs and blank nodes, such that it became a de facto semantic model requirement that isn't in RDF 1.1. Given our charter (that says we will kick RDF problems up to a larger group), we couldn't justify making some normative requirement in this space, especially as the issue goes away if a URI is used, or if a property is added for an application to find the top node of the named graph.

ericprud commented 5 years ago

I spoke to the director about my concerns regarding the disconnectedness of the graph. He urged us to pursue a solution along the lines of having an extra triple which indicates the node in the unnamed graph which corresponds to the nested JSON tree, i.e. root connector proposal above. He said we could use a JSON-LD namespace for the connector property and consider moving it to the RDF namespace once the technical issues were resolved.
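A minimal Python sketch of the root-connector idea: one extra triple links the graph name to the node inside the graph that corresponds to the nested JSON tree, so a validator can start there. It assumes the connector triple lives in the default graph; the `ROOT` property IRI is a placeholder (the comment only says a JSON-LD namespace term could be used), the helper names are hypothetical, and literals are kept as plain strings for brevity.

```python
from typing import Optional

# Placeholder IRI; the thread only says a JSON-LD namespace term could be used.
ROOT = "http://example.org/hypothetical#root"

# (subject, predicate, object, graph); graph None means the default graph.
Quad = tuple[str, str, str, Optional[str]]


def graph_container_quads(outer: str, prop: str, graph: str, root: str,
                          props: dict[str, str]) -> list[Quad]:
    quads: list[Quad] = [
        (outer, prop, graph, None),  # outer node points at the graph name
        (graph, ROOT, root, None),   # extra connector triple (assumed in default graph)
    ]
    # Triples inside the named graph keep their own blank node subject.
    quads += [(root, p, v, graph) for p, v in props.items()]
    return quads


def find_root(quads: list[Quad], graph: str) -> Optional[str]:
    """Follow the connector triple from a graph name to its root node."""
    for s, p, o, g in quads:
        if g is None and s == graph and p == ROOT:
            return o
    return None


quads = graph_container_quads("_:b0", "ex:claim", "_:b1", "_:b2", {"ex:name": "A subject"})
print(find_root(quads, "_:b1"))  # _:b2
```

Unlike blank node reuse, this keeps the graph name and the root node distinct, which addresses the objection that a blank node for the graph and a blank node within the graph are two different things.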

swickr commented 5 years ago

@ericprud, presuming your comment refers to the conversation you and I had last week then clarification is needed. You raised a concern with @plehegar on how the Working Group had handled this issue. PLH asked me to take a look and see if, in our delegation from TimBL to handle Transition Requests, we might find a path that would save TimBL's time later.

You and I spoke about the evolution of this thread following the Lyon f2f; including whether it was clear to you what actions could result in the removal of the "At Risk" qualification.

“I spoke to the director about my concerns regarding the disconnectedness of the graph. He urged us to pursue a solution …”

As you acknowledged that you had provided little (I heard you say "no") activity on addressing the At Risk concerns since Lyon, I suggested (ok; "urged") that you consider a solution that applications in your use case(s) could employ now without changing the spec, or that the WG might be more comfortable accepting as an interim for the next Recommendation. I noted that the Working Group has a schedule that it is expected to meet and it has the responsibility to triage its issues accordingly.

“… along the lines of having an extra triple which indicates the node in the unnamed graph which corresponds to the nested JSON tree, i.e. root connector proposal above. He said we could use a JSON-LD namespace for the connector property and consider moving it to the RDF namespace once the technical issues were resolved.”

I said that if the Working Group decides to consider that alternative approach further and reaches consensus on including an experimental/interim approach in the spec with such a triple and their remaining concern was which namespace to use, that IMHO there could be flexibility on choice of namespace.

w3c / json-ld-api

Ensure that blank node identifiers for anonymous graphs are reused #26