segfly-oss / graml

Library to represent Tinkerpop graphs as YAML
Apache License 2.0

Provide Support for TinkerPop3 #7

Open spmallette opened 8 years ago

spmallette commented 8 years ago

Apache TinkerPop 3.0 was released in GA several months ago:

http://tinkerpop.incubator.apache.org/

and includes a new GraphReader and GraphWriter API:

http://tinkerpop.incubator.apache.org/docs/3.0.1-incubating/#_gremlin_i_o

segfly commented 8 years ago

@spmallette Thanks, will look into adding support for 3.0. PRs are definitely welcome.

spmallette commented 8 years ago

Cool. I noticed in the README that graml isn't so good for large graphs. TinkerPop3 has tried to take the approach that all aspects of it should scale up. To that end, you'll see that the GraphSON and Gryo formats from TinkerPop use an adjacency list format as opposed to an edge list (there's not much we could do with GraphML, as it is a standard format). In this way the files are easily split for parallel processing by Spark/Hadoop/etc.:

http://tinkerpop.incubator.apache.org/docs/3.0.1-incubating/#hadoop-gremlin

I wonder whether graml could work in a similar fashion, or whether a different form of graml could accommodate those systems. Any thoughts on that?
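To illustrate the adjacency list point above, here is a minimal sketch in Python. It is not the actual GraphSON or Gryo schema; the record layout (`id`, `out`, `label`, `to`) and the helper names `to_adjacency_lines` and `split_lines` are assumptions made up for this example. The idea it shows is that when each line carries one vertex together with its outgoing edges, every line can be parsed on its own, so a file can be cut at any line boundary and handed to separate workers.

```python
import json


def to_adjacency_lines(edges):
    """Turn an edge list into one self-contained JSON line per vertex.

    `edges` is an iterable of (source, label, target) triples. The record
    layout here is hypothetical and much simpler than real GraphSON.
    """
    adjacency = {}
    for source, label, target in edges:
        adjacency.setdefault(source, []).append({"label": label, "to": target})
        # Make sure vertices with no outgoing edges still get a line.
        adjacency.setdefault(target, [])
    return [
        json.dumps({"id": vertex, "out": out}, sort_keys=True)
        for vertex, out in sorted(adjacency.items())
    ]


def split_lines(lines, parts):
    """Deal lines out to `parts` workers; any line-based split is valid."""
    return [lines[i::parts] for i in range(parts)]


edges = [
    ("marko", "knows", "vadas"),
    ("marko", "created", "lop"),
    ("vadas", "created", "lop"),
]
lines = to_adjacency_lines(edges)
chunks = split_lines(lines, 2)

# Each worker can parse its lines without seeing the rest of the file.
for chunk in chunks:
    for line in chunk:
        record = json.loads(line)
        print(record["id"], len(record["out"]))
```

With an edge list, by contrast, the edges of a single vertex may be scattered across the whole file, so a worker holding one slice cannot tell whether it has seen everything about a given vertex.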

segfly commented 8 years ago

The only reason graml is not a good fit for huge graphs right now is that it reads the entire graml file at once, e.g. it will try to buffer a 1TB graml file into memory before ever calling the TinkerPop API. This is mainly due to the YAML processing library currently in use.

However, processing the graml file in smaller pieces is definitely possible. Graml will likely support only additive operations in its notation, and it creates vertices as it encounters them. These characteristics mean that graml graphs can be processed in a distributed fashion, since a simple set union of the subgraphs will produce the complete graph.
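A rough sketch of that union argument, in Python rather than graml's own code. It assumes that vertices are identified by a stable name and that an edge is fully identified by its (source, label, target) triple; neither assumption is confirmed by graml itself, and `load_fragment` and `union` are hypothetical helpers, not part of any graml or TinkerPop API.

```python
def load_fragment(operations):
    """Build a subgraph from additive (source, label, target) operations.

    Vertices are created the first time they are mentioned, mirroring the
    "creates vertices as it encounters them" behaviour described above.
    """
    vertices, edges = set(), set()
    for source, label, target in operations:
        vertices.update((source, target))
        edges.add((source, label, target))
    return vertices, edges


def union(graphs):
    """Merge subgraphs by taking the union of their vertex and edge sets."""
    all_vertices, all_edges = set(), set()
    for vertices, edges in graphs:
        all_vertices |= vertices
        all_edges |= edges
    return all_vertices, all_edges


whole = [
    ("a", "knows", "b"),
    ("b", "knows", "c"),
    ("c", "created", "d"),
]

# Load the whole file at once, or load two pieces independently and merge.
complete = load_fragment(whole)
first, second = load_fragment(whole[:2]), load_fragment(whole[2:])
merged = union([first, second])
merged_reversed = union([second, first])

print(merged == complete, merged_reversed == complete)
```

Because the operations only ever add vertices and edges, the merge does not depend on how the file was split or on the order in which the pieces arrive. A notation that allowed deletions or updates would lose this property, since the result would then depend on ordering.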