Translator -> Ingester transport format #39

parkan commented 8 years ago

Leaving aside the issue of the DSL, the format emitted by the translator needs work.

Currently we have a PhotoBlob that carries an author, which gets unset (OrientDB chokes on embedded objects) and turned into a vertex + edge at the last second. This is a hack.

Blob bundle

One alternative is to return a "blob bundle" (see https://github.com/mediachain/L-SPACE/tree/ak-blobbundle; the implementation is incomplete because of issues related to #38 and the messy return types of the add* methods). The bundle contains a "self"/primary blob, which is the "subject" or top-level object of this particular insertion, as well as a list of secondary blobs with their roles (author, movement, organization, etc.). This assumes a hub-and-spokes data model for any individual insertion (no children of children, because one end of each edge is implicitly the "self") but produces a reasonably simple format. The one-hop limitation can potentially be overcome with some kind of recursion (though then we are basically building trees).
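For illustration only, here is a minimal Python sketch of what such a bundle could look like. It is not the code on the ak-blobbundle branch: the `BlobBundle` class, its field names, the flat dict used as a blob, and the primary-to-secondary edge direction are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Assumption: a blob is modeled as a flat property map for this sketch.
Blob = Dict[str, str]


@dataclass
class BlobBundle:
    """One primary ("self") blob plus secondary blobs tagged with a role."""
    primary: Blob
    secondary: List[Tuple[str, Blob]] = field(default_factory=list)

    def add(self, role: str, blob: Blob) -> "BlobBundle":
        """Attach a secondary blob under the given role (e.g. "author")."""
        self.secondary.append((role, blob))
        return self

    def to_vertices_and_edges(self):
        """Flatten into a vertex list and (from_index, role, to_index) edges.

        Index 0 is always the primary blob, so every edge starts at 0.
        That is the hub-and-spokes (one-hop) restriction in action.
        """
        vertices = [self.primary] + [blob for _, blob in self.secondary]
        edges = [(0, role, i + 1) for i, (role, _) in enumerate(self.secondary)]
        return vertices, edges


# Hypothetical example data.
bundle = BlobBundle(primary={"type": "PhotoBlob", "title": "Untitled"})
bundle.add("author", {"type": "Person", "name": "Jane Doe"})
vertices, edges = bundle.to_vertices_and_edges()
```

Because one end of every edge is implicitly the primary blob, the ingester never has to resolve references between secondary blobs, which is what keeps the format simple.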

Subgraph

Another alternative is to pass an entire tree/subgraph as a TinkerGraph. This could hypothetically be inserted reasonably easily by using .getVertexes/.getEdges and inserting them in order. However, the graph from the translator's perspective is not actually isomorphic to the final subgraph, because the translator doesn't know about canonicals (so edges have to be moved, not just extended). Also, we actually have to build the darn TinkerGraph.
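As a rough sketch of the "insert them in order" idea, assuming that means vertices first and then edges, the Python below copies a small subgraph into a store. It does not use TinkerGraph or OrientDB at all: `InMemoryStore`, `insert_subgraph`, and the tuple shapes are hypothetical stand-ins.

```python
from typing import Dict, List, Tuple

# Assumed shapes for this sketch:
#   vertex = (local_id, properties)
#   edge   = (from_local_id, label, to_local_id)
Vertex = Tuple[str, Dict[str, str]]
Edge = Tuple[str, str, str]


class InMemoryStore:
    """Stand-in for the real graph store; it assigns its own ids on insert."""

    def __init__(self) -> None:
        self.vertices: Dict[int, Dict[str, str]] = {}
        self.edges: List[Tuple[int, str, int]] = []

    def add_vertex(self, props: Dict[str, str]) -> int:
        store_id = len(self.vertices)
        self.vertices[store_id] = props
        return store_id

    def add_edge(self, src: int, label: str, dst: int) -> None:
        self.edges.append((src, label, dst))


def insert_subgraph(store: InMemoryStore,
                    vertices: List[Vertex],
                    edges: List[Edge]) -> Dict[str, int]:
    """Insert every vertex, then every edge, translating local ids to store ids."""
    id_map = {local_id: store.add_vertex(props) for local_id, props in vertices}
    for src, label, dst in edges:
        store.add_edge(id_map[src], label, id_map[dst])
    return id_map


# Hypothetical example data.
store = InMemoryStore()
id_map = insert_subgraph(
    store,
    vertices=[("photo", {"type": "PhotoBlob"}), ("person", {"type": "Person"})],
    edges=[("photo", "author", "person")],
)
```

Note that this straight copy is exactly what breaks down once canonicals exist: the edge would have to be re-pointed at a canonical the translator has never seen, so a real ingester would need an extra rewriting step between the two loops.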

???

Any other ideas?

yusefnapora commented 8 years ago

I've been mulling this over a bit... I don't have answers yet, but wanted to put down my thoughts.

I think that we'll probably want something similar to the subgraph approach, although maybe not packed into a TinkerGraph... My thinking is that in the medium-term, we're going to be building out a front end to display / traverse the mediachain graph. So we'll need to have some kind of serializable representation of a subgraph. My current graph visualization experiments have us generating json that's tailored for rendering by cytoscape.js - but maybe we can have one format for sending subgraphs around "over the wire", that works for both the front-end and the Translator -> Ingress boundary? I'd be fine with massaging that format into cytoscape-flavored json in the client javascript if need be.
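To make the "one wire format, massaged into cytoscape-flavored json" idea concrete, here is a small sketch. The wire format (`vertices`/`edges` keys, `id`, `label`, `properties`) is invented for the example and is not an agreed Mediachain format; only the output shape follows the elements JSON that cytoscape.js accepts (`nodes` and `edges` arrays whose entries have a `data` object with `id`, and `source`/`target` for edges). It is written in Python for brevity, though the same mapping could live in the client javascript.

```python
import json


def to_cytoscape(subgraph):
    """Convert the assumed wire format into cytoscape.js elements JSON."""
    nodes = [
        # Put "id" last so a property named "id" cannot clobber the node id.
        {"data": {**v.get("properties", {}), "id": v["id"]}}
        for v in subgraph["vertices"]
    ]
    edges = [
        {
            "data": {
                "id": f'{e["source"]}-{e["label"]}-{e["target"]}',
                "source": e["source"],
                "target": e["target"],
                "label": e["label"],
            }
        }
        for e in subgraph["edges"]
    ]
    return {"nodes": nodes, "edges": edges}


# Hypothetical wire-format payload.
wire = {
    "vertices": [
        {"id": "photo-1", "properties": {"type": "PhotoBlob"}},
        {"id": "person-1", "properties": {"type": "Person"}},
    ],
    "edges": [{"source": "photo-1", "label": "author", "target": "person-1"}],
}

elements = to_cytoscape(wire)
payload = json.dumps(elements)  # what would actually go over the wire
```

Keeping the wire format generic and doing the cytoscape-specific reshaping in one small function means the Translator -> Ingress boundary does not have to know anything about the renderer.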

The problem of the subgraph not being isomorphic seems like it might be resolvable by rethinking how we model authorship... It seems like the main problem is that we're currently representing authorship with an edge from ImageBlob -> Canonical. But the translator can't have complete knowledge of the Canonicals that exist in the larger graph, so it's not possible to treat the translator's output as a pluggable piece of the larger graph.

So the question is, can we model authorship in the "big graph" by drawing edges between ImageBlob and Person nodes directly? I think we can; it just means changing some of the traversals a bit. Things get somewhat trickier when merging / revisions are taken into account, but since we're treating merging of Person blobs as out of scope for the moment, that sounds like tomorrow's problem :)

bigs commented 8 years ago

honestly pre-serializing a graph and packing it all into a static asset w/ webpack might be a nice stop gap. definitely doesn't scale, but also won't take too much time. could do some basic typeahead searching stuff using off the shelf libs, too. maybe limit dataset to 1000 entities

parkan commented 8 years ago

Probably worth thinking about for mediachain client

parkan commented 8 years ago

This format is now Mediachain cells :boom: