joshsh / graphsail

RDF storage and inference layer (SAIL) for Apache TinkerPop 3
Apache License 2.0

[Question] Can use GraphSail to interact with Cosmos DB #1

Open charbull opened 7 years ago

charbull commented 7 years ago

Hi @joshsh,

I am just entering the world of the Gremlin. Is it possible to use GraphSail on top of Cosmos DB to serialize RDF and execute SPARQL queries?

Thank you for your help, Charbel

joshsh commented 7 years ago

Hi Charbel,

It isn't possible out of the box, but it shouldn't be difficult to implement the GraphIndex class for Cosmos DB. See the TinkerGraph implementation [1], and let's continue this thread offline to get that connector written. You will be able to store and retrieve RDF, and evaluate SPARQL queries.

Note: the SPARQL implementation is as correct as that of TP2 GraphSail (because it uses the same tests), although I still plan to add a more thorough test suite.

Josh

[1] https://github.com/joshsh/graphsail/blob/master/src/main/java/net/fortytwo/tpop/sail/tg/TinkerGraphIndex.java

joshsh commented 7 years ago

Btw. this is "offline". I thought I was replying to a gremlin-users post :-)

charbull commented 7 years ago

Hi Joshsh, thank you for your reply. I just forked your code to better understand the purpose of TinkerGraphIndex. If you have any architecture notes or documentation about your work, that would be great; I am new to TinkerPop.

I am trying to figure out how GraphSail works. It seems that the (Gremlin) Graph is the link between GraphSail and Cosmos DB? So GraphSail doesn't need to know about the connection details (host, port, database name, etc.)?

I am excited to get this connector written ! :D

joshsh commented 7 years ago

I am new to Cosmos DB, so let's meet in the middle. Since you support connections via Gremlin Console, I assume you support a TinkerPop Graph implementation exposed via Gremlin Server. That means GraphSail does not need to know the low-level details of the data store. However, I/we need to look at that Graph implementation and the classes it depends on in order to create the index wrapper.

Currently, the only GraphSail documentation is from the TinkerPop 2 era. Integrating with the new GraphSail is slightly different because TP3 does not include an index API. Therefore, we need a specific index wrapper for each graph back-end. In most cases, this wrapper should be fairly simple.

charbull commented 7 years ago

Hi Joshsh, Cosmos DB (according to Azure Support) does not yet support object serialization. In other words, a TinkerGraph cannot be populated, serialized, and sent to the Cosmos DB Graph API. So far the client only accepts a 'String' representation of a property graph in Gremlin [1].

So when you mention the need for an index wrapper, how can we write such a wrapper without an object representation in Cosmos DB?

joshsh commented 7 years ago

@charbull I'm not sure I understand. Is there a public API you can point me to? I browsed the website but didn't find what I was looking for.

It's not that you would be populating TinkerGraph, but that you would have implemented Graph. TinkerGraph is another Graph implementation. Is there a CosmosGraph class somewhere that adapts Cosmos DB to the Graph interface?

http://tinkerpop.apache.org/javadocs/current/core/org/apache/tinkerpop/gremlin/structure/Graph.html

joshsh commented 7 years ago

Btw. at a glance, the link you posted looks like a client which talks to Gremlin Server. It's "what is under the hood" of Gremlin Server, serving up the graph data, that we need to work with.

ghost commented 7 years ago

Hi @joshsh,

Regards, Bart,

joshsh commented 7 years ago

Thank you for the links, @bartdotnet. The Gremlin implementation includes a pretty extensive set of steps. Is all of this done without the use of (a .NET port of) the TinkerPop API? If that is the case, then GraphSail is not compatible with Cosmos DB on the server side, although it might be possible to connect via withRemote; it depends on whether Cosmos DB provides a complete Gremlin Server-like service via WebSocket or HTTP. The remote console examples in the documentation you linked suggest that it might.

charbull commented 7 years ago

Hi @joshsh and @bartdotnet, it seems that there are two possible APIs for interacting with Cosmos DB from a graph perspective.

@joshsh I believe you want to write a GraphIndex on top of Graph.Elements?

joshsh commented 7 years ago

@charbull, Graph.Elements is similar to the TinkerPop-compatible API I was expecting to find, but it is not quite TinkerPop (for example, in Vertex we should have an edges() method, in Edge we should have inVertex() and outVertex(), etc.). We would need a TinkerPop wrapper for Graph.Elements if we were to connect GraphSail on the server side.
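The kind of wrapper described above can be sketched as a thin adapter. This is only an illustration under stated assumptions: NativeEdge and NativeStore are hypothetical stand-ins for a store's own element types (they are not the Cosmos DB Graph.Elements API), and the adapter classes merely mimic the shape of TinkerPop's Vertex.edges(), Edge.outVertex() and Edge.inVertex() rather than implementing the real interfaces.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these types come from Cosmos DB or TinkerPop.
class ElementAdapterSketch {

    enum Direction { OUT, IN, BOTH }

    /** Stand-in for a store-native edge record: ids only, no navigation methods. */
    static class NativeEdge {
        final String id, sourceId, targetId;

        NativeEdge(String id, String sourceId, String targetId) {
            this.id = id;
            this.sourceId = sourceId;
            this.targetId = targetId;
        }
    }

    /** Stand-in for the store itself: a flat list of edge records. */
    static class NativeStore {
        final List<NativeEdge> edges = new ArrayList<>();
    }

    /** Adds the edges() navigation that a TinkerPop-style Vertex is expected to offer. */
    static class VertexAdapter {
        private final NativeStore store;
        private final String id;

        VertexAdapter(NativeStore store, String id) {
            this.store = store;
            this.id = id;
        }

        String id() {
            return id;
        }

        List<EdgeAdapter> edges(Direction direction) {
            List<EdgeAdapter> result = new ArrayList<>();
            for (NativeEdge e : store.edges) {
                boolean outgoing = e.sourceId.equals(id) && direction != Direction.IN;
                boolean incoming = e.targetId.equals(id) && direction != Direction.OUT;
                if (outgoing || incoming) {
                    result.add(new EdgeAdapter(store, e));
                }
            }
            return result;
        }
    }

    /** Adds outVertex() (the source) and inVertex() (the target) to a native edge. */
    static class EdgeAdapter {
        private final NativeStore store;
        private final NativeEdge edge;

        EdgeAdapter(NativeStore store, NativeEdge edge) {
            this.store = store;
            this.edge = edge;
        }

        String id() {
            return edge.id;
        }

        VertexAdapter outVertex() {
            return new VertexAdapter(store, edge.sourceId);
        }

        VertexAdapter inVertex() {
            return new VertexAdapter(store, edge.targetId);
        }
    }

    public static void main(String[] args) {
        NativeStore store = new NativeStore();
        store.edges.add(new NativeEdge("e1", "a", "b"));
        VertexAdapter a = new VertexAdapter(store, "a");
        for (EdgeAdapter e : a.edges(Direction.OUT)) {
            System.out.println(e.outVertex().id() + " -" + e.id() + "-> " + e.inVertex().id());
        }
    }
}
```

A real wrapper would implement the actual TinkerPop Graph, Vertex and Edge interfaces and would query the store instead of scanning a list; the sketch only shows where the missing navigation methods would live.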

charbull commented 7 years ago

@joshsh I understand better now. So we would need a TinkerPop wrapper for the Cosmos DB elements, and we would still need a CosmosGraphIndex similar to the TinkerGraphIndex. If so, what is the purpose of such an index?

Thank you

joshsh commented 7 years ago

@charbull that's one way. On the server side, you would need TinkerPop wrappers for Graph, Edge, Vertex, Property, etc. as well as an implementation of GraphIndex, which would tie into the underlying Cosmos API. For GraphIndex, you just need a way of creating a vertex index, adding a vertex to the index, and removing a vertex from the index.
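The three index operations named above (create a vertex index, add a vertex, remove a vertex) can be sketched as follows. This is a minimal illustration, not GraphSail's actual GraphIndex API: the VertexIndex interface, its method names, and the in-memory implementation are all hypothetical, and the lookup method is added only so that the effect of adding and removing is visible. A real implementation would delegate to the back-end's own indexing facility.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: these names are NOT taken from GraphSail's GraphIndex class.
class VertexIndexSketch {

    /** Create an index on a property key, then add and remove vertices by key and value. */
    interface VertexIndex {
        void createIndex(String key);
        void add(String key, Object value, String vertexId);
        void remove(String key, Object value, String vertexId);
        Set<String> lookup(String key, Object value);
    }

    /** In-memory stand-in for a store-backed index. */
    static class InMemoryVertexIndex implements VertexIndex {
        // property key -> property value -> ids of the vertices carrying that value
        private final Map<String, Map<Object, Set<String>>> indices = new HashMap<>();

        @Override
        public void createIndex(String key) {
            indices.putIfAbsent(key, new HashMap<>());
        }

        @Override
        public void add(String key, Object value, String vertexId) {
            require(key).computeIfAbsent(value, v -> new LinkedHashSet<>()).add(vertexId);
        }

        @Override
        public void remove(String key, Object value, String vertexId) {
            Map<Object, Set<String>> index = require(key);
            Set<String> ids = index.get(value);
            if (ids != null) {
                ids.remove(vertexId);
                if (ids.isEmpty()) {
                    index.remove(value); // drop empty buckets
                }
            }
        }

        @Override
        public Set<String> lookup(String key, Object value) {
            Set<String> ids = require(key).get(value);
            return ids == null ? Collections.emptySet() : Collections.unmodifiableSet(ids);
        }

        private Map<Object, Set<String>> require(String key) {
            Map<Object, Set<String>> index = indices.get(key);
            if (index == null) {
                throw new IllegalStateException("no index for key: " + key);
            }
            return index;
        }
    }

    public static void main(String[] args) {
        InMemoryVertexIndex index = new InMemoryVertexIndex();
        index.createIndex("value");
        index.add("value", "http://example.org/a", "v1");
        System.out.println(index.lookup("value", "http://example.org/a"));
        index.remove("value", "http://example.org/a", "v1");
        System.out.println(index.lookup("value", "http://example.org/a"));
    }
}
```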

The other possibility I mentioned is to connect via withRemote. Is it possible to query Cosmos DB using (the built-in TinkerPop) Gremlin Console, or is the .NET client required? If one can issue simple Gremlin queries (limited to the supported steps) to Cosmos DB through a remote console, then it should also be possible to layer GraphSail on top of Gremlin on the client side.

charbull commented 7 years ago

@joshsh

Gremlin Driver from Java

Take a look at the getting-started example provided for the Cosmos DB Graph API. The pom only contains a dependency on the Gremlin Server [1].

The code then uses the standard Gremlin driver API.

// Connection settings (host, port, credentials) are read from remote.yaml.
cluster = Cluster.build(new File("src/remote.yaml")).create();
client = cluster.connect();

// Each query is sent to the server as a plain Gremlin string.
String[] gremlinQueries = new String[] {
    "g.V().drop()",
    "g.E().drop()",
    "g.addV('person').property('deviceid', 'thomas').property('firstName', 'Thomas').property('age', 44)",
    "g.addV('person').property('deviceid', 'mary').property('firstName', 'Mary').property('lastName', 'Andersen').property('age', 39)",
...
ResultSet results = client.submit(gremlinString);

The above code works well and adds the vertices correctly.

WithRemote from Gremlin Console

I tried withRemote (I am just starting with Gremlin, so I might have done something wrong).

I was able to connect to Cosmos DB but couldn't create a vertex.

Any thoughts on this? Can we plug GraphSail into the Gremlin driver? I think objects are not yet supported, only strings, which is why withRemote is not working.

joshsh commented 6 years ago

@charbull, are you able to execute a simple read query (e.g. g.V().count()) through that remote connection? I have gone through the setup process, and am able to connect to the graph using vanilla Gremlin Console, but withRemote isn't working for me; Gremlin seems to wait indefinitely on DriverRemoteTraversal$TraverserIterator.next which never unblocks.

charbull commented 6 years ago

Hey @joshsh, I see the same behavior. I got in contact with Microsoft; it seems that object support is coming soon. For now we can only send strings, so I doubt g.V().count() will work.

ghost commented 6 years ago

Hi,

In https://github.com/bartdotnet/CosmosDbTests I tried to provide examples of how to test connectivity via

All examples issue a simple "g.V().count()" that should work (note that the endpoint URLs for (2) and (3) differ from the one for (1)).

Bart,

joshsh commented 6 years ago

Thanks, @bartdotnet -- this makes it easy to explore those options. Per my observation and @charbull's comment, I don't see a straightforward path to GraphSail compatibility yet (through Gremlin.Net or Gremlin Console), but the upcoming object support sounds promising.