Closed fommil closed 8 years ago
I'm just spotting that you have a 2.x branch and 2.6.1 is cross built, nice. That could be interesting, I may take a look after trying things out in the raw Java API first. But with regards to the community-builds above, I think we'd still need some help from you to get that on-boarded with them if we decided to use this API. https://github.com/mpollmeier/gremlin-scala/tree/2.x
Sounds like a great idea, and very exciting for me as a happy ensime user :) I will provide more detailed answers shortly, for now just a few pointers for your understanding:
Sample repos including setup for neo4j and orientdb https://github.com/mpollmeier/gremlin-scala-examples
OrientDB driver for Tinkerpop3: I started this one as a proof of concept, but by now we have 5 contributors and it is really active. We use it in a few apps internally. It's used in the examples repo. https://github.com/mpollmeier/orientdb-gremlin
We had a shapeless implementation for serialisation, but the macro had more functionality (both were contributions). Happy to look at using a plain shapeless implementation again. https://github.com/mpollmeier/gremlin-scala/commit/31ef226
We used to cross-build 2.10 and 2.11 until recently, I think we dropped it because of the macro - not sure though, will have a look tomorrow.
I would not recommend to use the 2.x branch for anything new, Tinkerpop3 is the future and all the nifty type safe features are only in the 3.x (master) version.
great! :smile:
I would need to be sure that the v3 tinkerpop stuff works on Java 6 and isn't going to pull in any scala binary version headaches. Is that something you'd be able to check out? I'm all out of steam today :weary:
We'd be happy to use your orientdb v3 impl when you think its strong enough. It is also pure Java I see, which simplifies things (and Java 6 compatible?).
and if we do end up using this, I'll probably do the shapeless magic bit. Possibly even set it as an advanced exercise for the workshop :smiling_imp:
just checked, there is a bunch of Java 8 stuff when I use v3.x... so we can't use it :cry:
My excitement made me conveniently overread your notes re java 6 but I just realised the same. We could probably backport most of the features of gremlin-scala 3.x to gremlin-scala 2.x, but:
https://twitter.com/apachetinkerpop/status/658472326544101376
As you already mentioned in your ticket you might run into scala version issues with neo4j. Until a few months ago they only supported 2.10, and I believe now it's only 2.11. https://github.com/mpollmeier/gremlin-scala-examples/issues/4 https://github.com/neo4j/neo4j/issues/2881
Is the main reason to use java 6 that some enterprises are still on java 6 as mentioned in https://github.com/ensime/ensime-server/issues/1118 ? It's long outdated and not supported any more - the IE6 of JVMs :) Even Java 7 is end of life: http://www.oracle.com/technetwork/java/eol-135779.html
Yeah Java 6 is still the official JDK for Scala 2.10 and 2.11 and a quick poll on twitter suggests that half of ENSIME users still have to support Java 6 systems. It is also the JDK used by the entire build system and when scalajs tried to drop support they caused a riot and had to revert their decision.
I'm aware that gremlin 2 is no longer active but given that its our only choice, I prefer to read that an "stable" :-)
I don't want to cause any extra work for you but if you're keen to help it would be really appreciated!
But lets wait until I've done the PoC first. It might turn out that graph databases just aren't fast enough.
FYI and a bit off topic on the above, but this is the sort of thing I'd be doing with shapeless. It's not too much work to extend this to handle sealed traits, and various forms of customisation a la spray-json-shapeless
, but I only have very simple requirements.
What this doesn't do is give any form of fancy syntax for the query filtering. I suspect Record
s are more appropriate there.
package marshalling {
// TODO: error handling with Either[String, Map[String, Any]]
trait SVertex[T] {
def vClass: String
def toProperties(t: T): Map[String, Any]
def fromProperties(m: Map[String, Any]): T
}
trait SPrimitive[T] {
def toValue(v: T): Any
def fromValue(v: Any): T
}
object SPrimitive {
implicit object StringSPrimitive extends SPrimitive[String] {
def toValue(v: String): Any = v
def fromValue(v: Any): String = v.asInstanceOf[String]
}
}
}
package object syntax {
import marshalling._
/** Creates a context to communicate with the Graph. */
def withGraph[T](f: Graph => T)(implicit factory: OrientGraphFactory): T = {
val graph: Graph = factory.getTx()
try f(graph)
finally graph.shutdown()
}
/** Syntactic helper for serialisables. */
implicit class RichSerialisable[T](val t: T) extends AnyVal {
def vClass(implicit s: SVertex[T]): String = s.vClass
def toProperties(implicit s: SVertex[T]): Map[String, Any] = s.toProperties(t)
def fromProperties(props: Map[String, Any])(implicit s: SVertex[T]): T = s.fromProperties(props)
}
/** Glues the serialisable typeclass implementations to the Graph API. */
implicit class RichGraph(val graph: Graph) extends AnyVal {
/** Side-effecting vertex insertion. */
def insertV[T: SVertex](t: T): Vertex = {
val props = t.toProperties
val v = graph.addVertex(t.vClass)
props.foreach {
case (key, value) => v.setProperty(key, value)
}
v
}
}
}
package object impl {
import marshalling._
import syntax._
import shapeless._, labelled.{ field, FieldType }, syntax.singleton._
implicit def hNilSVertex[T]: SVertex[HNil] = new SVertex[HNil] {
def vClass: String = ???
def toProperties(t: HNil): Map[String, Any] = Map.empty
def fromProperties(m: Map[String, Any]): HNil = HNil
}
implicit def hListSVertex[Key <: Symbol, Value, Remaining <: HList](
implicit
key: Witness.Aux[Key],
prim: SPrimitive[Value],
remV: Lazy[SVertex[Remaining]]
): SVertex[FieldType[Key, Value] :: Remaining] =
new SVertex[FieldType[Key, Value] :: Remaining] {
def vClass: String = ???
def toProperties(t: FieldType[Key, Value] :: Remaining): Map[String, Any] = {
val value = prim.toValue(t.head)
remV.value.toProperties(t.tail) + (key.value.name -> value)
}
def fromProperties(m: Map[String, Any]): FieldType[Key, Value] :: Remaining = {
m.get(key.value.name) match {
case None => throw new IllegalArgumentException(s"missing key ${key.value.name}")
case Some(value) =>
val resolved = prim.fromValue(value)
val remaining = remV.value.fromProperties(m)
field[Key](resolved) :: remaining
}
}
}
// TODO: coproducts
implicit def familySVertex[T, Repr](
implicit
gen: LabelledGeneric.Aux[T, Repr],
sg: Lazy[SVertex[Repr]],
tpe: Typeable[T]
): SVertex[T] = new SVertex[T] {
def vClass: String = tpe.describe // HACK: really need a Wrapper like sjs
def toProperties(t: T): Map[String, Any] = sg.value.toProperties(gen.to(t))
def fromProperties(m: Map[String, Any]): T = gen.from(sg.value.fromProperties(m))
}
}
there's a bug in my draft code above, the id
should be null (or use your DSL for this). I thought it was like the neo4j API where this is the "type" of the node.
Yes, I'd be keen to help re any questions on the tinkerpop 2 stuff, although I do have to admit that I don't have much experience with it as I only took over this project back then. I added a few things here and there, but made some fundamental changes (complete rewrite) in v3.
Thanks for the shapeless pointers. Do you think it would be able to support value classes? E.g. for a case class MyId(value: String) extends AnyVal
I want to serialise only value
, not MyId(value)
.
autodetecting them might be tricky, but I do exactly that in this example https://github.com/fommil/spray-json-shapeless/blob/master/src/test/scala/fommil/sjs/FamilyFormatsSpec.scala#L137
and I've done something much more substantial in a corporate project (I had a sealed trait StringEnum
which was effectively a way of having classes of runtime dynamic enumerations... but I could wrap validation/logging logic around the construction of them so I could tell if new entries were being entered. e.g. case class Location(value: String) extends StringEnum
and if a new Location
was input anywhere, we would continue processing but log out an error for developer attention... and these things were serialised as String
not case classes).
btw, I'd really appreciate it if you could take a look at my ENSIME PoC branch and comment on it: https://github.com/fommil/ensime-server/tree/graph_indexer
I didn't get it finished, but hopefully you can see the direction I'm going in. I went a bit crazy with type safety, mostly as a way to give myself some boundaries. There are a lot of limitations to this implementation (especially inheritance), and I've not really thought about doing complex queries yet.
In particular, any comments on performance and our domain model would be greatly appreciated.
Looks like you really love shapeless - that's a good thing I guess :) I don't understand all the magic going on there, but it looks fine to me.
Just one note: at https://github.com/fommil/ensime-server/blob/graph_indexer/core/src/main/scala/org/ensime/indexer/orientdb/SimpleOrient.scala#L351:L355
you could also write something like graph.V.has("type", "ClassV")
and be done with it.
That would be more performant as well because a) it doesn't need to fetch all vertices, only the ones that matter b) it would use an index (which you need to create in the first place)
Yup that's what the readUnique code does :-) my shapeless use is a bit sloppy here. I'll tidy it up and ask for help most likely. I'm pretty sure the query calls can be rewritten to not need strings.
On indexes and schemas, how do I do that? Also I'd like a uniqueness constraint on vertices (fqns sold be unique per type, soi guess we have atuple constraint). In neo4j the type of a vertex is separate from its props, is that not the case here?
Also, on transactions... err are there transactions? What can be done concurrently? I'll go serial to begin with.
One more thing, do you know how to turn off printing to screen deep inside orientdb? I'm also concerned about its logging strategy having a performance impact.
Ok I see. readUnique looks fine.
I would recommend to use the orient api directly to create indexes and schemas. We've open sourced our schema migration library, maybe you can steal some ideas from there: https://github.com/springnz/orientdb-migrations TestMigrations: https://github.com/springnz/orientdb-migrations/blob/master/src/test/scala/springnz/orientdb/migration/TestMigrations.scala
And an example to create an index. The string concatenation isn't very nice, but you get the gist:
val vertexClass = findClass("V")
val userClass = getSchema.createClass("V_" + User.Label, vertexClass)
userClass.createProperty(User.Properties.Phone.value, OType.STRING)
.setMandatory(true)
.setNotNull(true)
.setReadonly(true)
userClass.createIndex(
User.Label + '.' + User.Properties.Phone.value,
INDEX_TYPE.UNIQUE,
User.Properties.Phone.value
)
Re logging: looks like Orient is using java.util.logging - should be possible to mute that. Haven't tried it though. http://tutorials.jenkov.com/java-logging/configuration.html
Hi!
I got lost, why did you drop the cross-build to 2.10? is there a way to recover that feature?
I believe the only reason was the macro. When you change the scalaVersion in the build.sbt (both occurrances) and run test:compile
that's the only thing that's erroring. So making that cross compile would be the way to go.
Shapeless would fix this. Pretty sure I'm going the use this as an example in my talk. You'll be overwhelmed with examples :-)
That's right, which is one of the reasons why using shapeless would be superior over macros. @rodrigorn we have a separate thread for that transition here: https://github.com/mpollmeier/gremlin-scala/issues/109
looks like I'm going to drop the dependency on gremlin-scala
v2 in enime :cry: basically, the mixin of Dynamic
makes it unusable because IDEs suggest anything and everything. The Java API, although ugly, is well defined. I'll probably create a minimal syntax around the GremlinPipeline
that suits my needs.
that makes sense. the really cool type safe things are really in the v3 branch, and I'm not really proud of things like the Dynamic
mixin in the v2 branch.
Let me know if you need any help re Orient or Gremlin, and have another look back once you're on java 8 :)
I have literally wasted all day trying to get neo4j to work in our build, so please bare with me as I may pass out with fatigue while writing this. I'm looking to try and do a proof of concept of this idea in ENSIME: https://github.com/ensime/ensime-server/issues/1133
I think OrientDB works on Java 6+ and (not having any scala code) will not cause any binary compatibility problems. It looks like their jar is for v2 of the tinkerpop, but I assume tinkerpop ship a v3 since they claim support. I've heard that tinkerpop might only be Java 8+ but I can't find a reference for this claim, do you know anything about that?
However, I noticed that you are not cross-building to 2.10 and 2.11 http://search.maven.org/#search%7Cga%7C1%7Cgremlin-scala is this something you could pick up again?
If you can, and all the dependencies check out, that's great, we'd be interested in using this wrapper. But for us to be able to use scala dependencies in ENSIME we have to get our dependencies on-boarded with the https://github.com/scala/community-builds Typically, that means sending a one line PR to them with instructions for the build. They'll then include building your project in their nightly test suite of the standard library / compiler / tooling. Is this something you'd be comfortable doing?
Also, I also noticed that you are using macros for your serialisation. With the latest versions of shapeless, you shouldn't need to do this anymore. I'm giving a workshop at Scala eXchange next month about how https://github.com/fommil/spray-json-shapeless is implemented without macros. It would be great if you could apply the same approach here and remove those unneeded macros / reflection dependency. I'll have plenty of material for you if that's something you're interested in.