uf6 / design

Openly designing data enrichment solutions
http://en.wikipedia.org/wiki/Uranium_hexafluoride

Graph Exchange Formats #13

Open jmatsushita opened 9 years ago

jmatsushita commented 9 years ago

I’ve experimented with JSON transformations from the Metamaps graph format to JSON-LD here https://gist.github.com/elf-pavlik/406ffe0300e399f72c11#comment-1408649 using jq (a really powerful DSL for transforming JSON data).

Thinking more about it, I’m not against CSV in general, but the main problem with network data in CSV is having to deal with one sheet of nodes and a separate sheet of edges, which is hard to manage and read. In my opinion the same goes for node + edge JSON formats: they are good for building graphs programmatically but hard for people to understand and maintain.
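To make the contrast concrete, here is a minimal sketch that folds the two "sheets" into one nested, human-readable record per node. The node and edge lists and their field names (`id`, `name`, `source`, `target`, `type`) are hypothetical and illustrative only; this is not the Metamaps format.

```javascript
// Hypothetical "sheet of nodes" and "sheet of edges", as they might come from two CSV files.
const nodes = [
  { id: "manu", name: "Manu Sporny" },
  { id: "deiu", name: "Andrei Vlad Sambra" },
  { id: "melvin", name: "Melvin Carvalho" },
];
const edges = [
  { source: "manu", target: "deiu", type: "knows" },
  { source: "manu", target: "melvin", type: "knows" },
];

// Fold the edge list into copies of the node records: each edge becomes an entry
// in an array keyed by the edge type, holding the target's id and name.
function nest(nodes, edges) {
  const byId = new Map(nodes.map((n) => [n.id, { ...n }]));
  for (const { source, target, type } of edges) {
    const from = byId.get(source);
    const to = byId.get(target);
    if (!from || !to) throw new Error(`dangling edge ${source} -> ${target}`);
    if (!from[type]) from[type] = [];
    from[type].push({ id: to.id, name: to.name });
  }
  return [...byId.values()];
}

const nested = nest(nodes, edges);
console.log(JSON.stringify(nested[0], null, 2));
```

The printed record for Manu carries its two "knows" entries inline, which is the shape of the nested JSON-LD-style format discussed below, while records with no outgoing edges stay unchanged.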

An equivalent of the format Elf proposed in the Gist linked above, applied to FOAF, is available here: https://github.com/mcollina/levelgraph-jsonld#searching-with-levelgraph

var manu = {
  "@context": {
    "@vocab": "http://xmlns.com/foaf/0.1/",
    "homepage": { "@type": "@id" },
    "knows": { "@type": "@id" },
    "based_near": { "@type": "@id" }
  },
  "@id": "http://manu.sporny.org#person",
  "name": "Manu Sporny",
  "homepage": "http://manu.sporny.org/",
  "knows": [{
    "@id": "https://my-profile.eu/people/deiu/card#me",
    "name": "Andrei Vlad Sambra",
    "based_near": "http://dbpedia.org/resource/Paris"
  }, {
    "@id": "http://melvincarvalho.com/#me",
    "name": "Melvin Carvalho",
    "based_near": "http://dbpedia.org/resource/Honolulu"
  }, {
    "@id": "http://bblfish.net/people/henry/card#me",
    "name": "Henry Story",
    "based_near": "http://dbpedia.org/resource/Paris"
  }, {
    "@id": "http://presbrey.mit.edu/foaf#presbrey",
    "name": "Joe Presbrey",
    "based_near": "http://dbpedia.org/resource/Cambridge"
  }]
};
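To see what an object like this means in RDF terms, here is a deliberately simplified sketch that turns a trimmed-down copy of it into subject/predicate/object triples. It is not a conforming JSON-LD expansion (a full processor such as jsonld.js implements the real algorithms); it only handles `@vocab`, `@id`, `"@type": "@id"` coercion, plain values and nested objects. The `_:b0`-style blank-node labels for nested objects without an `@id` are this sketch's own convention.

```javascript
// Simplified expansion of a JSON-LD-like object into [subject, predicate, object] triples.
// NOT a conforming JSON-LD processor. IRIs are kept as bare strings; literals are
// JSON-quoted so they can be told apart from IRIs.
function toTriples(doc) {
  const ctx = doc["@context"] || {};
  const vocab = ctx["@vocab"] || "";
  const triples = [];
  let blank = 0;

  function walk(node) {
    const subject = node["@id"] || `_:b${blank++}`;
    for (const [key, raw] of Object.entries(node)) {
      if (key.startsWith("@")) continue; // skip keywords such as @context and @id
      const predicate = vocab + key;
      const isIri = Boolean(ctx[key] && ctx[key]["@type"] === "@id");
      for (const value of Array.isArray(raw) ? raw : [raw]) {
        if (value !== null && typeof value === "object") {
          // A nested object becomes its own node, linked to this one by the predicate.
          triples.push([subject, predicate, walk(value)]);
        } else {
          triples.push([subject, predicate, isIri ? value : JSON.stringify(value)]);
        }
      }
    }
    return subject;
  }

  walk(doc);
  return triples;
}

// Trimmed copy of the example above, with a single "knows" entry.
const manu = {
  "@context": {
    "@vocab": "http://xmlns.com/foaf/0.1/",
    "homepage": { "@type": "@id" },
    "knows": { "@type": "@id" },
    "based_near": { "@type": "@id" }
  },
  "@id": "http://manu.sporny.org#person",
  "name": "Manu Sporny",
  "homepage": "http://manu.sporny.org/",
  "knows": [{
    "@id": "http://melvincarvalho.com/#me",
    "name": "Melvin Carvalho",
    "based_near": "http://dbpedia.org/resource/Honolulu"
  }]
};

const triples = toTriples(manu);
for (const t of triples) console.log(t.join("  "));
```

Every key, including "name" and "homepage", comes out as a predicate between a subject and an object, which is the RDF view described further down in this thread.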

Trying to cut down on some of the @-noise (dropping the @context and every @id):

var manu = {
  "name": "Manu Sporny",
  "homepage": "http://manu.sporny.org/",
  "knows": [{
    "name": "Andrei Vlad Sambra",
    "based_near": "http://dbpedia.org/resource/Paris"
  }, {
    "name": "Melvin Carvalho",
    "based_near": "http://dbpedia.org/resource/Honolulu"
  }, {
    "name": "Henry Story",
    "based_near": "http://dbpedia.org/resource/Paris"
  }, {
    "name": "Joe Presbrey",
    "based_near": "http://dbpedia.org/resource/Cambridge"
  }]
};

This reads more easily to me (the predicate “knows” is just a property), and it implies 5 “person” nodes (Manu plus the four people he knows) and 4 edges of type “knows” between them. You might also want to treat “based_near” as another edge.
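Here is a minimal sketch of reading the stripped-down object as a property graph, where the caller chooses which keys become edges. Because the @id keys were removed, it assumes people can be identified by their name (a hypothetical choice that would break on homonyms); `toGraph` and `edgeKeys` are illustrative names, not an existing API.

```javascript
// Stripped-down object from above.
const manu = {
  name: "Manu Sporny",
  homepage: "http://manu.sporny.org/",
  knows: [
    { name: "Andrei Vlad Sambra", based_near: "http://dbpedia.org/resource/Paris" },
    { name: "Melvin Carvalho", based_near: "http://dbpedia.org/resource/Honolulu" },
    { name: "Henry Story", based_near: "http://dbpedia.org/resource/Paris" },
    { name: "Joe Presbrey", based_near: "http://dbpedia.org/resource/Cambridge" },
  ],
};

// Keys listed in edgeKeys become edges; every other key stays a property glued
// on the node. Nodes are keyed by name (assumption: no @id is available).
function toGraph(root, edgeKeys) {
  const nodes = new Map();
  const edges = [];

  function visit(obj) {
    const id = obj.name;
    if (!nodes.has(id)) nodes.set(id, { id, props: {} });
    const node = nodes.get(id);
    for (const [key, value] of Object.entries(obj)) {
      if (!edgeKeys.includes(key)) {
        node.props[key] = value;
        continue;
      }
      for (const target of Array.isArray(value) ? value : [value]) {
        // A nested object is another node; a plain string (an IRI) becomes a leaf node.
        const targetId = target !== null && typeof target === "object" ? visit(target) : target;
        if (!nodes.has(targetId)) nodes.set(targetId, { id: targetId, props: {} });
        edges.push({ from: id, type: key, to: targetId });
      }
    }
    return id;
  }

  visit(root);
  return { nodes: [...nodes.values()], edges };
}

const peopleOnly = toGraph(manu, ["knows"]);
const withPlaces = toGraph(manu, ["knows", "based_near"]);
console.log(peopleOnly.nodes.length, peopleOnly.edges.length);
console.log(withPlaces.nodes.length, withPlaces.edges.length);
```

With only “knows” as an edge there are 5 person nodes and 4 edges. Promoting “based_near” adds Paris, Honolulu and Cambridge as nodes (Paris is shared by two people) and 4 more edges.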

The interesting, tricky thing is that in RDF everything is a node->edge->node(->context) statement, including properties: even “homepage” and “name” here are “edges” in RDF. In more classic graph approaches (for instance neo4j), by contrast, nodes have properties (so “homepage” and “name” are glued onto the node) and, separately, they have edges (like “knows” or “based_near”).

So maybe the best of both worlds[1] is to add a type of property (an RDF property class) that hints we mean to represent that property as an actual visible edge, and therefore to model a link that is a genuine conceptual “link”. In JSON-LD, the @context would then contain something like "knows": { "@type": "@link" }. Domain-specific applications of this, like influence mapping, could then subclass that @link class into an @influence class, for instance. Other applications might subclass the @link class into a @beneficiary_ownership class...
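A minimal sketch of how an application could read such a hint, assuming the proposed "@link" type plus the hypothetical subclasses "@influence" and "@beneficiary_ownership". None of these are JSON-LD keywords (a conforming JSON-LD processor would reject them as type mappings), and the `influences`/`owns` terms, the `parentOf` table and the example.org IRIs are all made up for illustration.

```javascript
// Hypothetical context using the proposed "@link" hint and two subclasses of it.
const context = {
  "@vocab": "http://xmlns.com/foaf/0.1/",
  homepage: { "@type": "@id" },
  knows: { "@type": "@link" },
  influences: { "@type": "@influence" },
  owns: { "@type": "@beneficiary_ownership" },
};

// Hypothetical subclass hierarchy: each link type points at its parent class.
const parentOf = {
  "@influence": "@link",
  "@beneficiary_ownership": "@link",
};

// True if the type is @link itself or descends from it through parentOf.
function isLinkType(type) {
  for (let t = type; t !== undefined; t = parentOf[t]) {
    if (t === "@link") return true;
  }
  return false;
}

// Split an object's keys into node properties (neo4j-style) and visible edges.
function classify(obj, ctx) {
  const properties = {};
  const links = [];
  for (const [key, value] of Object.entries(obj)) {
    if (key.startsWith("@")) continue; // skip @id, @context, ...
    const type = ctx[key] && ctx[key]["@type"];
    if (isLinkType(type)) {
      for (const target of [].concat(value)) links.push({ predicate: key, kind: type, target });
    } else {
      properties[key] = value;
    }
  }
  return { properties, links };
}

// Made-up person used only to exercise the sketch.
const person = {
  "@id": "http://example.org/people/alice",
  name: "Alice",
  homepage: "http://example.org/alice/",
  knows: ["http://example.org/people/bob", "http://example.org/people/carol"],
  owns: "http://example.org/companies/acme",
};

const { properties, links } = classify(person, context);
console.log(properties);
console.log(links);
```

Note that "homepage" is IRI-valued ("@type": "@id") yet stays a node property: only terms hinted as @link, or as one of its subclasses, are drawn as visible edges.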

jmatsushita commented 9 years ago

Hey @elf-pavlik @pudo, following up on this, I've had a go at it here: https://github.com/uf6/ottograf

Could you let me know what you think?