levelgraph / levelgraph-jsonld

The Object Document Mapper for LevelGraph based on JSON-LD

clarify handling of blank nodes #8

Open elf-pavlik opened 10 years ago

elf-pavlik commented 10 years ago

NOTE: this issue started with a focus on embedding but then drifted into blank nodes :confounded:

http://json-ld.org/spec/latest/json-ld/#embedding

I tried adding embedded nodes to one of the fixtures:

{
    "@context": {
        "@vocab": "http://xmlns.com/foaf/0.1/",
        "homepage": {
            "@type": "@id"
        }
    },
    "@id": "http://manu.sporny.org#person",
    "name": "Manu Sporny",
    "homepage": "http://manu.sporny.org/",
    "knows": [
        {
            "@id": "http://gregkellog.com",
            "name": "Greg Kellog"
        },
        {
            "@id": "http://markus-lanthaler.com",
            "name": "Markus Lanthaler"
        }
    ]
}

and then ran this test:

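it("deletes the embedded nodes too", function(done) {
  db.jsonld.put(manu, function(err) {
    if (err) return done(err);
    db.jsonld.del(manu["@id"], function(err) {
      if (err) return done(err);
      // fetch every triple left in the db
      db.get({}, function(err, triples) {
        if (err) return done(err);
        expect(triples).to.be.empty;
        done();
      });
    });
  });
});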

It fails because we don't delete embedded nodes. It looks like we already handle this correctly, but we may need to document it better and test for the desired behavior!

When we use blank nodes instead, they do get deleted, which makes sense:

{
    "@context": {
        "@vocab": "http://xmlns.com/foaf/0.1/",
        "homepage": {
            "@type": "@id"
        }
    },
    "@id": "http://manu.sporny.org#person",
    "name": "Manu Sporny",
    "homepage": "http://manu.sporny.org/",
    "knows": [
        {
            "name": "Greg Kellog"
        },
        {
            "name": "Markus Lanthaler"
        }
    ]
}
elf-pavlik commented 10 years ago

I'm thinking about adding an option to get() that would allow two ways of loading embedded nodes:

Or, if we used an integer, we could specify how many levels we would like to load eagerly.
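The integer variant could be sketched roughly as follows. `embedNode()` is a hypothetical helper, not part of the levelgraph-jsonld API; it assumes the stored data is available as an in-memory array of `{ subject, predicate, object }` triples (the shape LevelGraph uses) and rebuilds a nested object, embedding referenced nodes up to `depth` levels and leaving deeper ones as `{ "@id": ... }` references.

```javascript
// Hypothetical sketch: rebuild a nested JSON-LD-like object for `subject`
// from flat triples, eagerly embedding referenced nodes up to `depth` levels.
function embedNode(triples, subject, depth) {
  const node = { "@id": subject };
  for (const { predicate, object } of triples.filter((t) => t.subject === subject)) {
    // Treat an object as a node if some triple describes it.
    const isNode = triples.some((t) => t.subject === object);
    let value;
    if (isNode && depth > 0) {
      value = embedNode(triples, object, depth - 1); // embed one more level
    } else if (isNode) {
      value = { "@id": object }; // depth exhausted: keep only a reference
    } else {
      value = object; // plain value (literal)
    }
    // Collect repeated predicates into arrays.
    node[predicate] = predicate in node ? [].concat(node[predicate], value) : value;
  }
  return node;
}
```

With `depth = 0` only references are returned; each extra level embeds one more hop. The depth limit also keeps cyclic graphs from recursing forever.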

mcollina commented 10 years ago

You can control it in a much more granular way. As of now, it loads exactly the passed context, but there might be bugs.

elf-pavlik commented 10 years ago

@mcollina I have the impression that we may be confusing @context and frame in https://github.com/mcollina/levelgraph-jsonld/blame/master/README.md#L109

Framing - http://json-ld.org/spec/latest/json-ld-framing/#framing

Framing makes use of the Node Map Generation algorithm to place each object defined in the JSON-LD document into a flat list of objects, allowing them to be operated upon by the framing algorithm.

The Context - http://json-ld.org/spec/latest/json-ld/#the-context

Simply speaking, a context is used to map terms to IRIs. Terms are case sensitive and any valid string that is not a reserved JSON-LD keyword can be used as a term.

So I guess this connects to #2, and it gets quite funky with put and delete :)
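To illustrate the distinction: a context only decides how terms map to IRIs, while a frame decides which nodes get embedded where. The simplified `expandTerm()` sketch below covers only the context side, handling explicit term definitions and `@vocab`; it is an illustrative subset under those assumptions, not the full JSON-LD expansion algorithm.

```javascript
// Simplified sketch of what a JSON-LD context does: map terms to IRIs.
// Handles only string/object term definitions and @vocab.
function expandTerm(term, context) {
  const def = context[term];
  if (typeof def === "string") return def; // "name": "http://..."
  if (def && typeof def === "object" && def["@id"]) return def["@id"];
  // Keywords and IRI-like terms (containing ":") are returned unchanged here.
  if (term.startsWith("@") || term.includes(":")) return term;
  // Fall back to the vocabulary mapping, e.g. FOAF in the fixtures above.
  if (context["@vocab"]) return context["@vocab"] + term;
  return null; // not expandable: a processor would drop this property
}
```

Note that `homepage` with only `{ "@type": "@id" }` still expands via `@vocab`; its definition changes how the value is interpreted, not the IRI of the term.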

mcollina commented 10 years ago

I see that there is plenty of confusion in here :(. What do you propose? This library is quite 'edgy' and we might change it radically, if you think it's the right thing to do.

elf-pavlik commented 10 years ago

I need to get more hands-on experience using a graph store in real-world cases! In the next few days I may work on apps/daemons related to http://pad.hackers4peace.net/p/open-wishlist and possibly hypering SpaceAPI.

I plan to use JSON-LD heavily, including schema.org (GoodRelations), Hydra and possibly PaySwarm, so that we can work with practical real-world use cases :smile:

Blank nodes get very interesting here, as we also discuss in https://github.com/mcollina/levelgraph/issues/43; see the rdf-identifiers article I linked there, plus my comments on storing nodes. What would you call what we do now by generating UUIDs? Materialized blank nodes... :wink:
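A rough sketch of what "materialized blank nodes" means here: when a nested node has no `@id`, a UUID-based `_:` label is generated so its triples can be stored and addressed later. The `toTriples()` helper below is hypothetical, not levelgraph-jsonld's actual code; it assumes the context has already been applied (keys are full predicate IRIs) and uses Node's `crypto.randomUUID()` (Node 14.17+).

```javascript
const { randomUUID } = require("crypto");

// Hypothetical sketch: flatten a nested node object into triples, giving every
// node without an "@id" a generated "_:<uuid>" label. Returns the node's id
// and appends its triples (and those of nested nodes) to `triples`.
function toTriples(node, triples = []) {
  const subject = node["@id"] || "_:" + randomUUID();
  for (const [predicate, raw] of Object.entries(node)) {
    if (predicate.startsWith("@")) continue; // skip @id, @context, ...
    for (const value of [].concat(raw)) {
      const object =
        value !== null && typeof value === "object"
          ? toTriples(value, triples) // nested node: recurse, link by its id
          : value; // plain literal
      triples.push({ subject, predicate, object });
    }
  }
  return subject;
}
```

Because the labels are random, putting the same document twice produces two distinct sets of blank nodes, which is exactly the duplication concern raised later in this thread.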

@bendiken, maybe when we meet in Berlin you could tell me more about storing blank nodes, especially ones deserialized from JSON-LD documents, and possibly about using Generalized RDF Triples, Graphs, and Datasets + Notation3 Paths / LevelGraph Navigator API @RubenVerborgh

Maybe also @lanthaler @msporny @gkellogg could share with us a little of your experience with persisting graphs from deserialized JSON-LD that uses blank nodes heavily?

msporny commented 10 years ago

@elf-pavlik we use blank nodes heavily in the Web Payments / PaySwarm work, but store them in a JSON document store, like MongoDB or CouchDB. We've never had an issue with persisting blank nodes wrt. JSON-LD and MongoDB, and we've had quite a few complex use cases involving blank nodes in the Web Payments work. My suggestion would be to not skolemize unless you absolutely need to, and we've never had a reason to skolemize our data in JSON-LD.

mcollina commented 10 years ago

@msporny but in that case you lose the ability to search them using graph patterns, which is the whole point of this library. The whole idea of this library is to convert a JSON-LD document into RDF and save it (and vice versa). In order to load it back and support updating and so on, we need to 'materialize' the blank node id so it is persistent. It is very similar to what Jena does if you store blank nodes in it.

elf-pavlik commented 10 years ago

Thanks, everyone, for helping clarify this topic a little further!

I hope @bendiken can help us with some of his experience in building Dydra! (unlicensed but secret :wink: ). Maybe also @kidehen can offer some generous suggestions? :calling:

lanthaler commented 10 years ago

On Tuesday, December 10, 2013 4:03 PM, ☮ elf Pavlik ☮ wrote:

Maybe also @lanthaler @msporny @gkellogg could share with us a little of your experience with persisting graphs from deserialized JSON-LD that uses blank nodes heavily?

I can't add much that hasn't already been said by Manu. The advantage of JSON-LD is that you can easily store it in MongoDB or ElasticSearch. Other than that, I often simply deserialize it into an object graph, which is then taken care of by my ORM. If you convert it to triples, it doesn't matter anymore that it was JSON-LD before. Your experience is the same regardless of which serialization format was used.

I'm actually not sure I understand what the problem is here, because I don't fully understand the intention of the method db.jsonld.del. Its documentation says "In order to delete an object, you can just pass it's '@id' to the '@del' method", but there are no "objects" in this sense in RDF. I think what is meant is that all triples whose subject corresponds to the value of '@id' are dropped!? If so, why would you expect the nested "objects" to be deleted as well?

gkellogg commented 10 years ago

Representing Blank Nodes in a Triple Store (or Graph database) really shouldn't be an issue. It's entirely reasonable to deserialize JSON-LD, including Blank Nodes, into a Triple Store such as Dydra. Graph pattern searches don't rely on having well-known blank nodes; in search patterns, blank nodes look like existential variables.
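A minimal sketch of the existential-variable view, assuming in-memory `{ subject, predicate, object }` triples: a `_:` label in a query pattern binds like a variable but is not reported in the results, so the query never needs to know any stored blank node label. The `match()`/`solutions()` helpers and the `?name` variable syntax are illustrative assumptions, not LevelGraph's search API.

```javascript
// In a *pattern*, "?x" variables and "_:x" blank nodes both act as variables.
function isVar(term) {
  return term.startsWith("?") || term.startsWith("_:");
}

// Naive basic-graph-pattern matching: returns every consistent binding.
function match(patterns, triples, binding = {}) {
  if (patterns.length === 0) return [binding];
  const [first, ...rest] = patterns;
  const results = [];
  for (const triple of triples) {
    const next = { ...binding };
    let ok = true;
    for (const pos of ["subject", "predicate", "object"]) {
      const term = first[pos];
      if (isVar(term)) {
        if (term in next && next[term] !== triple[pos]) { ok = false; break; }
        next[term] = triple[pos];
      } else if (term !== triple[pos]) { ok = false; break; }
    }
    if (ok) results.push(...match(rest, triples, next));
  }
  return results;
}

// Blank node bindings are dropped: they only assert that *some* node exists.
function solutions(patterns, triples) {
  return match(patterns, triples).map((b) =>
    Object.fromEntries(Object.entries(b).filter(([k]) => k.startsWith("?")))
  );
}
```

Querying "Manu knows some `_:friend` whose name is `?name`" finds both Greg (stored as a blank node) and Markus (stored with an IRI), which is the point made above: the search does not depend on well-known blank node labels.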

Of course, a Triple Store implementation will use some internal identifier, such as a UUID, to represent the node internally, but that is not exposed unless explicitly skolemizing. However, turning blank nodes into store-specific IRIs removes much of the flexibility that blank nodes provide.
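For comparison, skolemizing in the RDF 1.1 sense replaces each blank node with a fresh IRI; RDF 1.1 Concepts recommends a well-known `/.well-known/genid/` path for systems that want such Skolem IRIs to be recognizable. In this sketch the authority `https://example.org` and the `skolemize()` helper are placeholder assumptions.

```javascript
const { randomUUID } = require("crypto");

// Sketch: replace every "_:" blank node label with a Skolem IRI.
// The same label always maps to the same IRI within one call.
function skolemize(triples, base = "https://example.org/.well-known/genid/") {
  const map = new Map();
  const sk = (term) => {
    if (typeof term !== "string" || !term.startsWith("_:")) return term;
    if (!map.has(term)) map.set(term, base + randomUUID());
    return map.get(term);
  };
  return triples.map(({ subject, predicate, object }) => ({
    subject: sk(subject),
    predicate: sk(predicate),
    object: sk(object),
  }));
}
```

After this step the former blank nodes are ordinary IRIs, which is what makes them stable across updates but also why they lose the "some node exists" flexibility described above.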

JSON-LD is great because it can be stored and processed just like any other JSON document, say in MongoDB, but it is also based on the RDF Graph model, so it's entirely consistent with RDF Triple Stores. Indeed, much processing involves "flattening" a JSON-LD document before, say, framing it. This is really equivalent to serializing to RDF and de-serializing back from RDF.

elf-pavlik commented 10 years ago

@lanthaler, db.jsonld.del() seems to work in a meaningful way at the moment. If nested (embedded) nodes have permanent IRIs they don't get deleted, but nested blank nodes currently do get deleted as well...
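The behavior described here (remove the node's own triples, follow blank node objects, stop at nodes with IRIs) can be sketched over in-memory triples as follows. `cascadeDelete()` is a hypothetical helper, not the library's implementation, and it assumes blank nodes are never shared between subjects.

```javascript
// Sketch: delete every triple whose subject is `id`, then recurse into
// objects that are blank nodes ("_:" labels). IRI-identified nodes survive.
// Assumes a blank node is referenced from only one place.
function cascadeDelete(triples, id, seen = new Set()) {
  if (seen.has(id)) return triples; // guard against blank node cycles
  seen.add(id);
  const removed = triples.filter((t) => t.subject === id);
  let kept = triples.filter((t) => t.subject !== id);
  for (const { object } of removed) {
    if (typeof object === "string" && object.startsWith("_:")) {
      kept = cascadeDelete(kept, object, seen);
    }
  }
  return kept;
}
```

Deleting Manu with this sketch removes the anonymous Greg node but keeps Markus's triples, because Markus has his own IRI.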

Reading SPARQL 1.1 Uniform HTTP Protocol for Managing RDF Graphs, I see possibly different behavior for HTTP PUT and HTTP POST. PUT deletes the node and creates a new one, while POST seems to merge data? To be honest, I'm quite confused about how to implement those two modes of insertion into the database. If we don't delete the node on HTTP POST, we could end up adding a copy of all blank nodes each time the same document gets POSTed.

{
    "@context": {
        "@vocab": "http://xmlns.com/foaf/0.1/"
    },
    "@id": "http://manu.sporny.org#person",
    "firstName": "Manu",
    "knows": [{ "firstName": "Greg" },
              { "firstName": "Markus" }
    ]
}

In this example Manu knows someone named Greg, but we can't exactly identify him. Inserting such data again without first deleting all the old data would create two friends named Greg and two named Markus, which most developers would find surprising if they tried to count Manu's friends...
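The counting surprise can be reproduced with a small in-memory sketch. `post()` and `put()` are hypothetical helpers modelling the two insertion modes discussed above: both mint fresh blank node labels on every call (as a store must when it cannot tell whether a blank node is "the same" as before), but only `put()` first removes the subject's previous triples and the blank nodes they point to.

```javascript
// Hypothetical sketch of merge (POST-like) vs replace (PUT-like) insertion.
let counter = 0;

// Each call mints new blank node labels for the anonymous friends.
function triplesFor(person, friends) {
  return friends.flatMap((name) => {
    const bnode = "_:b" + counter++;
    return [
      { subject: person, predicate: "foaf:knows", object: bnode },
      { subject: bnode, predicate: "foaf:firstName", object: name },
    ];
  });
}

// Merge: just append the new triples.
function post(store, person, friends) {
  return store.concat(triplesFor(person, friends));
}

// Replace: drop the person's triples and the blank nodes they reference.
function put(store, person, friends) {
  const oldBnodes = new Set(
    store
      .filter((t) => t.subject === person && t.object.startsWith("_:"))
      .map((t) => t.object)
  );
  const kept = store.filter((t) => t.subject !== person && !oldBnodes.has(t.subject));
  return kept.concat(triplesFor(person, friends));
}

const countFriends = (store, person) =>
  store.filter((t) => t.subject === person && t.predicate === "foaf:knows").length;
```

POSTing the same document twice yields four `knows` triples for Manu, while PUTting it twice still yields two.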

I see the topic of storing a deserialized RDF graph as more general than JSON-LD serialization, but at the same time it looks like the introduction of blank node identifiers for predicates and graph names that you proposed might add significant complexity to persisting those graphs as graphs (which one can traverse/pattern match), not as documents, as I understand @msporny does with MongoDB/CouchDB. Again, I hope we can get some feedback from people who actually implement graph stores!

lanthaler commented 10 years ago

@lanthaler, db.jsonld.del() seems to work in a meaningful way at the moment. If nested (embedded) nodes have permanent IRIs they don't get deleted, but nested blank nodes currently do get deleted as well...

Just unlabeled blank nodes or also labeled blank nodes (i.e., nodes with "@id": "_:xyz..")?

Reading SPARQL 1.1 Uniform HTTP Protocol for Managing RDF Graphs, I see possibly different behavior for HTTP PUT and HTTP POST. PUT deletes the node and creates a new one, while POST seems to merge data? To be honest, I'm quite confused about how to implement those two modes of insertion into the database. If we don't delete the node on HTTP POST, we could end up adding a copy of all blank nodes each time the same document gets POSTed.

Right. Since you just tell the server there's something, it can't figure out whether that something is the same as the other thing it already knows about. Consequently, the only sensible thing it can do (without any other out-of-band knowledge) is to add a separate node.

I see the topic of storing a deserialized RDF graph as more general than JSON-LD serialization, but at the same time it looks like the introduction of blank node identifiers for predicates and graph names that you proposed might add significant complexity to persisting those graphs as graphs (which one can traverse/pattern match), not as documents, as I understand @msporny does with MongoDB/CouchDB. Again, I hope we can get some feedback from people who actually implement graph stores!

Maybe it would help if you could explain what you are trying to achieve. What's the concrete use case you need to solve?

elf-pavlik commented 10 years ago

Useful references to the Linked Data Platform 1.0 spec related to deleting resources:

Just so we keep in mind what kinds of operations some of the servers using this store may want to support: to my understanding, an LDP Container simply embeds LDP Resources, and in this case all of them should have an IRI. "In many – perhaps most – applications involving containers, it is desirable for the client to be able to get information about each container member without having to do a GET on each one. LDPC allows servers to include this information directly in the representation of the container."

I would also like to keep track of the design of http://hydra-cg.com/spec/latest/core/#collections "... member items can either consist of solely a link or also include some properties. In some cases embedding member properties directly in the collection is beneficial as it may reduce the number of HTTP requests necessary to get enough information to process the result."

@lanthaler, I assume that a hydra:Collection will never have a member without an @id?

{
  "@context": "http://www.w3.org/ns/hydra/context.jsonld",
  "@id": "http://www.markus-lanthaler.com/hydra/api-demo/users/",
  "@type": "Collection",
  "members": [
    {
      "name": "Peter Pan"
    },
    ...
  ]
}

@msporny, could you please point us to some concrete, real-world use cases with examples where you heavily use blank nodes? Especially for predicates and graph names.

elf-pavlik commented 10 years ago

IMO, this sentence in http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#section-blank-nodes is a bit confusing:

RDF makes no reference to any internal structure of blank nodes. Given two blank nodes, it is possible to determine whether or not they are the same.

There is some more explanation in http://stackoverflow.com/questions/6667655/how-to-distinguish-between-two-blank-nodes-in-rdf

It reads differently in the recent RDF 1.1 draft http://www.w3.org/TR/2013/CR-rdf11-concepts-20131105/#section-Graph-Literal:

Blank node identifiers are local identifiers that are used in some concrete RDF syntaxes or RDF store implementations. They are always locally scoped to the file or RDF store, and are not persistent or portable identifiers for blank nodes. Blank node identifiers are not part of the RDF abstract syntax, but are entirely dependent on the concrete syntax or implementation. The syntactic restrictions on blank node identifiers, if any, therefore also depend on the concrete RDF syntax or implementation. Implementations that handle blank node identifiers in concrete syntaxes need to be careful not to create the same blank node from multiple occurrences of the same blank node identifier except in situations where this is supported by the syntax.
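One way to respect that locality when loading several documents into one store is to rename each document's blank node labels with a per-import prefix: repeated occurrences inside one document still denote the same node, while `_:b0` from two different documents no longer collides. The `scopeBlankNodes()` helper and its `_:d<N>-` naming scheme are assumptions for illustration.

```javascript
// Sketch: blank node labels are scoped to one document, so relabel them
// with a fresh per-import prefix before merging into a shared store.
let importCount = 0;

function scopeBlankNodes(triples) {
  const prefix = `_:d${importCount++}-`; // new scope for this document
  const relabel = (term) =>
    typeof term === "string" && term.startsWith("_:") ? prefix + term.slice(2) : term;
  return triples.map((t) => ({
    subject: relabel(t.subject),
    predicate: relabel(t.predicate),
    object: relabel(t.object),
  }));
}
```

This avoids accidental merging of unrelated blank nodes, but by design it also means re-importing the same document creates new nodes rather than reusing the old ones.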

@mcollina, maybe we should NOT include auto-generated UUID blank node identifiers in the results of put? Just as you delete them in this test https://github.com/mcollina/levelgraph-jsonld/blob/master/test/get_spec.js#L62-L63

Then we need to watch out, since we might create duplicate blank nodes if we put the same document multiple times... http://manu.sporny.org/2013/rdf-identifiers/#comment-3348
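Dropping the generated labels on output could be a small post-processing step like the hypothetical `stripBlankNodeIds()` below (not an existing levelgraph-jsonld function). It only removes `"@id"` values starting with `_:`; if the same blank node is referenced from two places, removing its label loses the link between them, and it does nothing about the duplicate-on-re-put problem.

```javascript
// Hypothetical sketch: recursively remove blank node "@id" values ("_:...")
// from a JSON-LD object before handing it back to the caller.
function stripBlankNodeIds(value) {
  if (Array.isArray(value)) return value.map(stripBlankNodeIds);
  if (value === null || typeof value !== "object") return value;
  const out = {};
  for (const [key, v] of Object.entries(value)) {
    if (key === "@id" && typeof v === "string" && v.startsWith("_:")) continue;
    out[key] = stripBlankNodeIds(v);
  }
  return out;
}
```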

lanthaler commented 10 years ago

I would also like to keep track of the design of http://hydra-cg.com/spec/latest/core/#collections ... @lanthaler, I assume that a hydra:Collection will never have a member without an @id?

In typical usage scenarios you would always include an @id, but there's of course no guarantee. In some cases it may indeed make sense to use hydra:Collection with blank node members.

@msporny, could you please point us to some concrete, real-world use cases with examples where you heavily use blank nodes?

Most schema.org data out there heavily uses blank nodes.

especially for predicates and graph names

Blank node predicates are useful when you need to convert JSON to RDF but can't map all properties to stable predicates or if you have to make up predicates on the fly (describing them in terms of other predicates using OWL for example). Blank node graph names are useful for transient messages, especially from clients to servers.

elf-pavlik commented 10 years ago

@lanthaler thank you for your comments, they are very helpful!

If you happen to have at hand links to production examples of data marked up with schema.org 'out there', as well as the cases you describe with blank nodes as predicates and graph names, I would appreciate it if you pasted them here. Otherwise, please don't bother searching, since I can also take the time to dig around myself :pig_nose:

lanthaler commented 10 years ago

If you happen to have at hand links to production examples of data marked up with schema.org 'out there'

The most obvious examples live directly at the source http://schema.org/Action (full of bnodes) :-)

as well as the cases you describe with blank nodes as predicates and graph names, I would appreciate it if you pasted them here. Otherwise, please don't bother searching, since I can also take the time to dig around myself

The only link I have at hand for bnode properties is http://sgillies.net/blog/1179/dumpgj-json-ld-and-crs/

HTH

jmatsushita commented 7 years ago

Hi there, I'm bumping this old thread in case some of the participants would be willing to take a look at a proposal for handling deletion of blank nodes in levelgraph-jsonld in issue #42.

Would really appreciate your thoughts!

gkellogg commented 7 years ago

There is an issue on JSON-LD 1.1 (json-ld/json-ld.org#293) to remove BNode labels that are created during the course of framing.