Open TitiHl opened 6 years ago
and also I found using valueMap with remote graph, I have to pass in valueMap(true) to get the properties using GraphTraversalSource: https://stackoverflow.com/questions/45764199/janusgraph-cluster-always-returns-vertex-without-properties-referencevertex
g.V().valueMap(true).toList()
but from scalaGraph of:
val scalaGraph: ScalaGraph = EmptyGraph.instance().asScala().configure(_.withRemote(DriverRemoteConnection.using(cluster, "g")))
there is no way I can pass valueMap(true). wondering what is the best way to get the properties of Elements using ScalaGraph in this way.
As you mentioned already, EmptyGraph is the problem. Simply use org.janusgraph.core.JanusGraph and everything should be fine.
Did you see https://github.com/mpollmeier/gremlin-scala-examples/ ? It contains a JanusGraph example repo and I just added a line to prove that you can add edges. valueMap works as well, e.g. if you add println(scalaGraph.V.valueMap.toList).
Hi,
Thanks for your reply. The reason I use EmptyGraph is that I am using it to initialise a remote graph. I guess I have to use EmptyGraph to connect to a remote graph, as all the docs I found do it this way, unless I missed a way to construct an EmptyGraph of JanusGraph.
I basically want to achieve exactly the same as the DSE example, but with JanusGraph: https://github.com/mpollmeier/gremlin-scala-examples/blob/master/dse-graph/src/test/scala/SimpleSpec.scala Somehow the return type in the DSE example is DetachedVertex, while for JanusGraph it is ReferenceVertex, and I cannot add edges here.
Thanks, Alex
Ok, I understand the issue now. I don't have much capacity to fiddle with this myself, but just had a brief look at Janus' documentation. Have you tried to pass the remote url etc. in the config that you pass to JanusGraphFactory, rather than using EmptyGraph.withRemote? E.g.
storage.backend=cassandra
storage.hostname=localhost
Hi,
the cluster points to the conf file that points to the remote JanusGraph server, while the JanusGraph Server has all the storage backend etc. settings.
But you are right, maybe I can specify the backend of JanusGraph explicitly so that I get something more than an EmptyGraph. Will try this out.
Thanks for your help here. Cheers, Alex
To follow up on this old issue, I believe I have similar problems. Well, similar in that I found I can not use the fancy operators with remote connections (if I do the data is not saved).
I have forked gremlin-scala-examples to show this: I'm using JanusGraph Server as my server here, but whatever it's mostly just Gremlin Server underneath: rwilcox/gremlin-scala-examples JanusGraph example for JanusGraph Server
There are three examples here:
Java copied from Janusgraph's official remote example
an operator ( / structure) based API example with a remote JanusGraph, copied from gremlin-scala janusgraph example
a traversal API based example, where I (poorly!) try to convert the edge / vertices operator API -> traversal API.
(@TitiHl these all use JanusGraphFactory.open("inmemory") instead of EmptyGraph(), which I found too limiting, i.e. not supporting transactions etc.)
TL;DR:
val graph : ScalaGraph = JanusGraphFactory.open("inmemory").configure( _.withRemote( conf) )
graph + ( "Saturn", Key[String]("name") -> "Saturn" )
BUT, I used the addV traversal methods, like so:
val graph : ScalaGraph = JanusGraphFactory.open("inmemory").configure( _.withRemote( conf) )
graph.addV().property( Key[String]("name"), "Saturn" ).iterate()
By going through issue history, I found #118, which has the following comment link - which is one of the reasons why graph.addV() exists at all!
ScalaGraph does have addV (I just called it addVertex). Also note that we have a nicer syntax to add vertices/edges, you might want to use that instead (it's documented on the front page (readme))
I think the difference is that addVertex on a Graph instance does not create a traversal, it operates directly on the Graph where addV operates on the traversal source and creates a traversal.
But methods like + and --- operate on ScalaGraph objects (calling addVertex), not the underlying traversal source object.
The reason why operating on a graph vs operating on a traversal is important is because it seems to be that the (best? only?) way to connect to Gremlin ... err JanusGraph... Server is via TraversalSource's withRemote method.
@mpollmeier does this logic sound right to you? (I'm a relative newbie to this project and graph / tinkerpop in general)
@TitiHl : I have not tested the original bug with this configuration (JanusGraphFactory.open("inmemory")) vs the other, but that may solve your problem?
In general, it would be great to somehow have the + or --- operators also work on gremlin.scala.TraversalSource objects, instead of just ScalaGraph objects. (Is there a way to force this??)
Interesting - I'll run this tomorrow and see if I can find a workaround. To make sure we're on the same page: how exactly did you start janusgraph? I'm just downloading janusgraph-0.2.0-hadoop2.zip from https://github.com/JanusGraph/janusgraph/releases/.
The docs suggest to run gremlin.sh and then graph = JanusGraphFactory.open('conf/janusgraph-berkeleyje-es.properties') - is this what you did?
Awesome, thanks! Take a look at the JanusGraph Server Getting Started, but TL;DR: use bin/janusgraph.sh start <-- should work out for you
Ok so it turns out that we shouldn't use Graph to add elements, and instead always use the Traversal. This doesn't impact local graphs (one edge case though: the user cannot provide the element id), and is the only way to handle remote graphs, as you had to figure out yourself painfully.
I've made a start to change everything to use a traversal (only for vertices so far) in https://github.com/mpollmeier/gremlin-scala/commit/f74078858954969848caa47d7186a2f767456520 - let me know your thoughts.
So that I can actually test this, maybe you can help me with the following: when I run your test cases, I get the following error:
- janusgraph server ported Java (from janusgraph-server example) *** FAILED ***
java.util.concurrent.CompletionException: io.netty.handler.codec.DecoderException:
org.apache.tinkerpop.gremlin.driver.ser.SerializationException:
org.apache.tinkerpop.shaded.kryo.KryoException: Buffer underflow.
Any ideas what's wrong? Some missing configuration?
Thanks for bringing this back up and providing a nice project to reproduce, @rwilcox
Other random thoughts:
IMO using janusgraphfactory.open(inmemory) is misleading, it gives you the (false) sense that you can actually use that graph instance. Use EmptyGraph instead
Woh, awesome! I'll take a look at the changes probably tomorrow.
(And that buffer underflow error sounds familiar too - I can't place it but I'll check it out at work tomorrow... maybe there it will come to me).
I have answers to your random thoughts now:
you're building up a ClusteredClient but don't actually use it
Yes, from reading sample code, reading docs, and code provided to me by others on my Current Graph Database Project, I believe the ClusteredClient etc. provides JanusGraph-specific management features, i.e. the ability to create indexes to speed up searching, schemas, etc. But I only learned this in the last day or so. (And I don't actually do those things in the sample code, yes)
... IMO using janusgraphfactory.open(inmemory) is misleading, it gives you the (false) sense that you can actually use that graph instance. Use EmptyGraph instead
Maybe. What I believe / assume is happening is that creating a traversal off an EmptyGraph will give you only features available in generic Gremlin Server, but basing the traversal off a JanusGraph gives you JanusGraph features.
I'm super interested in what the local graph instance is used for in remote traversal situations: is it just a bootstrap mechanism or is it used somehow ie does it hold a subgraph in memory for cache reasons????? I may go ask the JanusGraph people, as my lead engineer had similar questions (ie if it is used for something like caching, that may have memory implications for mid to large graphs).
It would certainly be a good idea to use the graph instance for some local caching, but I don't think it's doing that, instead it just seems to be a bootstrap for the traversal..
@rwilcox any news re the DecoderException? Can you reproduce it?
any news re the DecoderException? Can you reproduce it?
No, and my browser history and notes didn't help either :(
'No' as in, if you run the test locally it works, and you don't get that exception?
If so, what exactly did you do? I downloaded the 0.2.0-hadoop2 release, unpacked and ran bin/janusgraph.sh -v start, and then ran the test.
'No' as in, if you run the test locally it works, and you don't get that exception? If so, what exactly did you do? I downloaded the 0.2.0-hadoop2 release, unpacked and ran bin/janusgraph.sh -v start
Correct - on OS X 10.12 with JAVA_HOME set to a 1.8 JVM, I ran bin/janusgraph.sh -v start then ran my tests one by one in IntelliJ. No error. (Are you using JVM 1.7 or 1.9 maybe?)
How about if you run it in sbt?
I'm on linux with java 1.8
java -version
openjdk version "1.8.0_144"
OpenJDK Runtime Environment (build 1.8.0_144-b01)
OpenJDK 64-Bit Server VM (build 25.144-b01, mixed mode)
I just freshly unpacked janusgraph and ran sbt test. Output on janusgraph console:
27043 [gremlin-server-worker-1] ERROR org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor - Could not deserialize the Traversal instance
and sbt
*** 3 TESTS FAILED ***
Ok, super weird: sbt test gives me the error too. Was not expecting that (given my success with IntelliJ)
Ok, super interesting. In my IntelliJ test configuration there's a checkbox to "use SBT". It was off. When I checked it to be on I got the same error in IntelliJ.
I guess I can see the Scala IntelliJ plugin somehow wanting to bypass sbt for Reasons by default
That's good news, we're getting the same results :) Let me know when you get to the bottom of the error, maybe the working setup with intellij can help? Maybe there's a difference in the classpath?
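One way to check the classpath theory is to print, from inside a test, where the JVM actually loaded a given class from, once under sbt and once under the IntelliJ runner, and compare the two. The following is only a minimal sketch that uses plain JVM calls; the object name ClasspathProbe and its methods are made up for illustration and are not part of gremlin-scala or TinkerPop.

```scala
import java.io.File

// Hypothetical helper for comparing the sbt and IntelliJ runs; not part of any library.
object ClasspathProbe {
  // Entries of the JVM classpath, with empty entries dropped.
  def classpathEntries: Seq[String] = {
    val raw = Option(System.getProperty("java.class.path")).getOrElse("")
    raw.split(File.pathSeparator).toSeq.filter(_.nonEmpty)
  }

  // Jar or directory a class was loaded from, if the JVM reports one.
  // Classes loaded by the bootstrap loader may report nothing, hence the Option.
  def locationOf(cls: Class[_]): Option[String] =
    for {
      domain   <- Option(cls.getProtectionDomain)
      source   <- Option(domain.getCodeSource)
      location <- Option(source.getLocation)
    } yield location.toString
}
```

Calling ClasspathProbe.locationOf on a driver or serializer class (for example the Cluster class) in both setups and printing the result should show whether the two runs pick up different jar versions.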
Hey, I'm also interested in this. I got a similar SerializationException running some slightly different code. After some digging I solved it by explicitly specifying the serializer when creating the cluster like so:
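import org.apache.tinkerpop.gremlin.driver.Cluster
import org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0
import org.apache.tinkerpop.gremlin.structure.io.gryo.GryoMapper
import org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry

private def buildCluster() = {
  // register JanusGraph's custom types with the Gryo serializer
  val serializer = new GryoMessageSerializerV1d0(GryoMapper.build().addRegistry(JanusGraphIoRegistry.getInstance()))
  Cluster.build().addContactPoint("localhost").port(45679).serializer(serializer).create()
}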
Hope it helps!
For what it's worth, I'm running into the same issues trying to connect to a new Amazon Neptune GraphDB Cluster.
val builder: Cluster.Builder = Cluster.build()
builder.addContactPoint("my-endpoint.amazonaws.com")
builder.port(8182)
val cluster: Cluster = builder.create()
val graph = EmptyGraph.instance().asScala().configure(_.withRemote(DriverRemoteConnection.using(cluster)))
Gives the same errors:
(Empty)Graph does not support adding vertices
@alicefuzier thanks for sharing, but that didn't fix the exception I'm getting:
io.netty.handler.codec.DecoderException: org.apache.tinkerpop.gremlin.driver.ser.SerializationException: org.apache.tinkerpop.shaded.kryo.KryoException: Buffer underflow.
I don't know much about Janus and its serialisation unfortunately.
@apatzer that's the error you get when you add a vertex with graph.addV, or graph + someCaseClass. Until this is resolved, the workaround is to add your vertex in a traversal, i.e. using the addV step in GremlinScala. Note: case classes aren't yet supported for that.
Ok I just figured out how to connect to janusgraph. Use a different serialiser.
hosts: [localhost]
port: 8182
serializer: {
  className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV2d0,
  config: {
    ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry]
  }
}
I'll continue with the changes to allow adding vertices to a remotegraph shortly.
I just found some time to dig deeper into this. The underlying problem is that the configuration for remote is not stored in the graph instance, but in the TraversalSource. Because of that, one cannot simply call e.g. vertex.addEdge any more, because that doesn't know about the TraversalSource, and therefore the remote graph.
Since IMO the graph instance should hold that information (ScalaGraph does by holding onto the TraversalSource), I decided to add that as an implicit for the arrow DSL. I.e. from now on you need to have an implicit ScalaGraph in scope, then the arrow DSL works fine with remote and local graphs.
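To illustrate the idea (not the actual gremlin-scala implementation), here is a toy model of why an implicit graph in scope helps: the vertex itself knows nothing about the remote connection, so the DSL method asks for a graph implicitly and reaches the traversal source through it. All names here (ToyTraversalSource, ToyGraph, ToyVertex, linkTo) and the ws://localhost:8182 URL are hypothetical stand-ins.

```scala
// Toy model only: hypothetical stand-ins, not gremlin-scala or TinkerPop classes.
final case class ToyTraversalSource(remoteUrl: Option[String])
final case class ToyGraph(traversalSource: ToyTraversalSource)
final case class ToyVertex(id: Long)

object ToyArrowDsl {
  // The vertex has no reference to a traversal source, so the extension
  // method takes the graph as an implicit parameter instead.
  implicit class VertexOps(val from: ToyVertex) extends AnyVal {
    def linkTo(to: ToyVertex, label: String)(implicit graph: ToyGraph): String = {
      val target = graph.traversalSource.remoteUrl.getOrElse("local graph")
      s"addE($label) from ${from.id} to ${to.id} via $target"
    }
  }
}

object ToyArrowDslExample {
  import ToyArrowDsl._

  // Describes where the edge would be created for the given graph.
  def edge(graph: ToyGraph): String = {
    implicit val g: ToyGraph = graph
    ToyVertex(1L).linkTo(ToyVertex(2L), "reference")
  }
}
```

With a graph whose traversal source carries a remote URL, ToyArrowDslExample.edge routes the edge creation there; with no remote configured it falls back to the local graph. The same call site works for both, which is the property described above.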
I just released gremlin-scala 3.3.1.2 and provided a working example for gremlin-server. I'm still fighting with janusgraph (the basic setup is here), and assume I need to release a new version for 3.3.0, since Janusgraph hasn't released anything for 3.3.1 yet.
It looks like I'm running into a similar issue even with 3.3.1.2 when interacting with a Neptune graph.
org.apache.tinkerpop.shaded.kryo.KryoException: Buffer underflow.
val cluster = Cluster.build()
  .addContactPoint(url)
  .port(port)
  .create()
implicit val g = EmptyGraph.instance().asScala
  .configure(_.withRemote(DriverRemoteConnection.using(cluster, "g")))
object Name extends Key[String]("name")
// this succeeds
g.addV("Node").property(Name, "N/A").valueMap.head()
try {
  // this triggers the error
  g.addV("Node").property(Name, "N/A").head()
} catch {
  // error also occurs for the below expression
  case _: KryoException => g.addVertex("Node", Name.name -> "N/A")
}
Have you found any solutions other than changing the serializer? Neptune does not have such an IORegistry published as far as I can tell
Turns out it was user error.. You can modify my above example to add this unmodified serializer and it will function properly
val cluster = Cluster.build()
  .addContactPoint(url)
  .port(port)
  .serializer(new GraphSONMessageSerializerV3d0())
  .create()
Thanks for the great work bringing gremlin to scala!
Quick update re JanusGraph: since its last release (0.2.0) is still based on tinkerpop 3.2.x I can't backport the new model for handling this in remote graphs, because it relies on GraphTraversal.from(Vertex) which was only introduced in 3.3.x. I'll only provide a working JanusGraph example when they release a new version.
A quick update... JanusGraph 0.3.0 was released on July 31, 2018. It now supports tinkerpop 3.3.3.
Finally got around to setting up a remote janusgraph example: https://github.com/mpollmeier/gremlin-scala-examples/blob/fcc048e/janusgraph/src/test/scala/SimpleSpec.scala#L55
I found debugging the serialisers non-straightforward, but here's a setup that works:
val serializer = new GryoMessageSerializerV3d0(GryoMapper.build.addRegistry(JanusGraphIoRegistry.getInstance))
val cluster = Cluster.build.addContactPoint("localhost").port(8182).serializer(serializer).create
implicit val graph = EmptyGraph.instance.asScala.configure(_.withRemote(DriverRemoteConnection.using(cluster)))
@mpollmeier Actually, this problem is still relevant for Amazon Neptune. At least I have no idea how to initialize the connection in a way that works.
does anyone have a working setup for gremlin-groovy or gremlin-java? this isn't really a gremlin-scala specific issue..
I can't get the gremlin-server example to work. It adds vertices / edges fine as in the provided example.. but fails to retrieve any property values
@nkconnor @mpollmeier For neptune I'm using the following setup. Still new to gremlin, so not sure how much of this is specific to issues with Neptune. I sort of walked it back from using idiomatic (gremlin-)scala in a lot of areas, but am still able to use the g + CC functionality.
Like @hudsonmd said, adding the following serializer is important:
val cluster = Cluster.build()
  .addContactPoint("localhost") // with ssh tunnel to Neptune
  .port(8182)
  .serializer(new GryoMessageSerializerV3d0())
  .create()
implicit val g = EmptyGraph.instance.asScala().configure(_.withRemote(DriverRemoteConnection.using(cluster)))
And then for retrieving an item map with properties:
val userMap = g.V().has[Int](UserLabel, userIdKey, id)
  .valueMap()
  .head()
Calling valueMap before head is important if you want properties. Also, calling head seems to stop a traversal, so I don't think something like .head().out() works.
Updates for properties in neptune seem to only work if you first delete the property. I was running into this gnarly bug where if you try to increase the property value of a Double it works, but if you try to decrease it it does not.
g.V().has(UserLabel, userIdKey, uId).properties(userPropertyKey.name).drop().iterate()
g.V().has(UserLabel, userIdKey, uId).property(userPropertyKey, myPropertyValue).iterate()
@Joe29 are you able to use .toCC[User] prior to head? Or do you work solely with the value maps?
// .toCC errors
java.lang.IllegalArgumentException: Class is not registered: gremlin.scala.GremlinScala$$Lambda$48836/1166529519
Note: To register this class use: kryo.register(gremlin.scala.GremlinScala$$Lambda$48836/1166529519.class);
at org.apache.tinkerpop.shaded.kryo.Kryo.getRegistration(Kryo.java:484)
at org.apache.tinkerpop.shaded.kryo.Kryo.getSerializer(Kryo.java:502)
@nkconnor It's been a couple of weeks since I touched the code, but I do recall having problems with .toCC. Yeah it looks like I've got some dirty code to work around that with the value maps.
@nkconnor If you want to get up and running feel free to use this, though it's not pretty (two json libs??). I'm also making some assumptions here about the type of data/case class. Let me know if you find a cleaner solution.
import spray.json._
import spray.json.DefaultJsonProtocol._
import org.json4s.DefaultFormats
import org.json4s.native.Json

def mapToJSON(map: Map[String, Any]): String = {
  // valueMap wraps each property value in a list; keep only the first element
  val correctedMap = map.map {
    case (key, values: java.util.List[_]) if !values.isEmpty => key -> values.get(0)
    case kv => kv
  }
  Json(DefaultFormats).write(correctedMap)
}

implicit val userFormat = jsonFormat3(User)
val m = g.V().has[Int](UserLabel, userIdKey, id)
  .valueMap()
  .head()
mapToJSON(m.toMap).parseJson.convertTo[User]
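A possibly cleaner variant of the same workaround skips the JSON round trip and builds the case class straight from the value map. This is only a sketch under assumptions: it assumes the value map arrives as a java.util.Map[String, AnyRef] whose values are single-element lists, and the User fields (userId, name, score) are invented here, since the thread only shows that User has three fields.

```scala
import java.util.{List => JList, Map => JMap}

// Hypothetical case class: the field names and types are assumptions.
final case class User(userId: Int, name: String, score: Double)

object ValueMapToUser {
  // A value map wraps every property value in a list; keep the first element.
  // Plain iteration is used to avoid version-specific Java/Scala converters.
  def unwrap(raw: JMap[String, AnyRef]): Map[String, Any] = {
    val builder = Map.newBuilder[String, Any]
    val entries = raw.entrySet().iterator()
    while (entries.hasNext) {
      val entry = entries.next()
      val value: Any = entry.getValue match {
        case values: JList[_] if !values.isEmpty => values.get(0)
        case other => other
      }
      builder += (entry.getKey -> value)
    }
    builder.result()
  }

  // Returns None when a property is missing or has an unexpected type.
  def toUser(raw: JMap[String, AnyRef]): Option[User] = {
    val props = unwrap(raw)
    for {
      userId <- props.get("userId").collect { case i: Int => i }
      name   <- props.get("name").collect { case s: String => s }
      score  <- props.get("score").collect { case d: Double => d }
    } yield User(userId, name, score)
  }
}
```

ValueMapToUser.toUser returns an Option, so a vertex with a missing or mistyped property yields None rather than an exception. Like the JSON version, it silently drops any additional values of multi-valued properties.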
Thanks for the help Joe.... I'm going to look around at other graph libraries since the remote support is limited
Hi,
Thanks for building this nice wrapper for Scala :D. I am currently using this on a remote JanusGraph by calling:
val scalaGraph: ScalaGraph = EmptyGraph.instance().asScala().configure(_.withRemote(DriverRemoteConnection.using(cluster, "g")))
but found I lost some of the syntax benefits of gremlin-scala. Say I want to add an edge between v1 and v2, I can no longer call:
val edge = v1 --- ("reference", metadata -> "EdgeTest", deleted -> false) --> v2
Exceptions below:
I believe this is caused by EmptyGraph being the underlying graph. Referring to this example: https://github.com/mpollmeier/gremlin-scala-examples/blob/master/dse-graph/src/test/scala/SimpleSpec.scala instead I have to call
val a = StepLabel[Vertex]()
val b = StepLabel[Vertex]()
scalaGraph.V(v1.id).as(a).V(v2.id).as(b).addE(REFERENCE).from(a).to(b).property(metadata, "EdgeTest").property(deleted, false).iterate()
This is one of the examples where I cannot use the nice wrapper provided by gremlin-scala when working on a remote graph, so I am wondering if I missed something here, as I am still operating on a ScalaGraph, or whether there is a better way to add vertices/edges in a remote graph.
Thanks for your help in advance. Alex