Open snowch opened 8 years ago
I might be interested. What approaches did you talk about?
@KimStebel - as a starting point, create a Java application that can perform one-off exports of some data on the cluster (e.g. in HDFS) into Graph. It might help to use Spark, because it gives uniform access to many different data sources through a Spark DataFrame. From the DataFrame we could do something like collect() and loop through the data, exporting it to Graph in a single thread.
@KimStebel - still interested? :)
Importing larger datasets into Graph isn't trivial at the moment, so that would actually be a good example. I'm afraid we wouldn't be able to use the bulk import endpoint and would instead have to use Gremlin queries. I can certainly help with that, but I'm not sure how much time I'll have.
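To make the idea concrete, here is a rough sketch of the single-threaded export loop: take rows that have already been collected to the driver and turn each one into a Gremlin query string. Everything in it is an assumption for illustration. The class and method names (`GremlinExportSketch`, `toAddVertexQuery`, `export`) are hypothetical, the rows are plain maps rather than Spark rows so that the example runs without a cluster, and the `g.addV(...).property(...)` form may need adjusting for the TinkerPop version the Graph service runs. In a real job the rows would come from something like `Dataset.collectAsList()`, and each query would be sent to the Graph service's Gremlin endpoint instead of being returned in a list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GremlinExportSketch {

    // Escape a value so it can sit inside a single-quoted Gremlin (Groovy) string literal.
    public static String quote(Object value) {
        String s = String.valueOf(value)
                .replace("\\", "\\\\")
                .replace("'", "\\'");
        return "'" + s + "'";
    }

    // Build one "add vertex" query for a row; null values are skipped.
    public static String toAddVertexQuery(String label, Map<String, Object> row) {
        StringBuilder q = new StringBuilder("g.addV(").append(quote(label)).append(")");
        for (Map.Entry<String, Object> e : row.entrySet()) {
            Object v = e.getValue();
            if (v == null) {
                continue;
            }
            q.append(".property(").append(quote(e.getKey())).append(", ");
            // Numbers are emitted unquoted, everything else as a string literal.
            q.append(v instanceof Number ? String.valueOf(v) : quote(v));
            q.append(")");
        }
        return q.toString();
    }

    // Single-threaded loop over the collected rows, one query per row.
    public static List<String> export(String label, List<Map<String, Object>> rows) {
        List<String> queries = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            queries.add(toAddVertexQuery(label, row));
        }
        return queries;
    }

    public static void main(String[] args) {
        // Stand-in for rows collected from a DataFrame.
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("name", "Alice");
        row.put("age", 30);

        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(row);

        for (String query : export("person", rows)) {
            System.out.println(query);
        }
    }
}
```

Building query text by string concatenation keeps the sketch short, but it relies entirely on the escaping in `quote`. If the endpoint supports parameter bindings, passing values that way would probably be safer. Issuing one query per row will also be slow for larger datasets, which is part of why the bulk import limitation matters.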
@KimStebel - we don't have any specific time pressures on this. Any help you can give will be appreciated. Let me know if you need access to a cluster to develop this.
Yes, I would indeed need access to a cluster. Also, what about getting data out of Graph? I'm still a bit confused about whether Graph will be positioned more towards the OLAP or the OLTP side of things, but query times limited to 60 seconds suggest OLTP. So getting data out of Graph and into something that runs longer-running processing jobs (Spark, Hadoop) seems important, too.
This is a placeholder task for creating an example integration with CDS Graph. Some approaches were discussed offline at a high level with @ukmadlz.
If you want to contribute towards this - please add a comment.