mpollmeier / gremlin-scala

[unmaintained] Scala wrapper for Apache TinkerPop 3 Graph DSL
Apache License 2.0
482 stars 76 forks source link

scala / java versions #102

Closed fommil closed 8 years ago

fommil commented 8 years ago

I have literally wasted all day trying to get neo4j to work in our build, so please bare with me as I may pass out with fatigue while writing this. I'm looking to try and do a proof of concept of this idea in ENSIME: https://github.com/ensime/ensime-server/issues/1133

I think OrientDB works on Java 6+ and (not having any scala code) will not cause any binary compatibility problems. It looks like their jar is for v2 of the tinkerpop, but I assume tinkerpop ship a v3 since they claim support. I've heard that tinkerpop might only be Java 8+ but I can't find a reference for this claim, do you know anything about that?

However, I noticed that you are not cross-building to 2.10 and 2.11 http://search.maven.org/#search%7Cga%7C1%7Cgremlin-scala is this something you could pick up again?

If you can, and all the dependencies check out, that's great, we'd be interested in using this wrapper. But for us to be able to use scala dependencies in ENSIME we have to get our dependencies on-boarded with the https://github.com/scala/community-builds Typically, that means sending a one line PR to them with instructions for the build. They'll then include building your project in their nightly test suite of the standard library / compiler / tooling. Is this something you'd be comfortable doing?

Also, I also noticed that you are using macros for your serialisation. With the latest versions of shapeless, you shouldn't need to do this anymore. I'm giving a workshop at Scala eXchange next month about how https://github.com/fommil/spray-json-shapeless is implemented without macros. It would be great if you could apply the same approach here and remove those unneeded macros / reflection dependency. I'll have plenty of material for you if that's something you're interested in.

fommil commented 8 years ago

I'm just spotting that you have a 2.x branch and 2.6.1 is cross built, nice. That could be interesting, I may take a look after trying things out in the raw Java API first. But with regards to the community-builds above, I think we'd still need some help from you to get that on-boarded with them if we decided to use this API. https://github.com/mpollmeier/gremlin-scala/tree/2.x

mpollmeier commented 8 years ago

Sounds like a great idea, and very exciting for me as a happy ensime user :) I will provide more detailed answers shortly, for now just a few pointers for your understanding:

Sample repos including setup for neo4j and orientdb https://github.com/mpollmeier/gremlin-scala-examples

OrientDB driver for Tinkerpop3: I started this one as a proof of concept, but by now we have 5 contributors and it is really active. We use it in a few apps internally. It's used in the examples repo. https://github.com/mpollmeier/orientdb-gremlin

We had a shapeless implementation for serialisation, but the macro had more functionality (both were contributions). Happy to look at using a plain shapeless implementation again. https://github.com/mpollmeier/gremlin-scala/commit/31ef226

We used to cross-build 2.10 and 2.11 until recently, I think we dropped it because of the macro - not sure though, will have a look tomorrow.

I would not recommend to use the 2.x branch for anything new, Tinkerpop3 is the future and all the nifty type safe features are only in the 3.x (master) version.

fommil commented 8 years ago

great! :smile:

I would need to be sure that the v3 tinkerpop stuff works on Java 6 and isn't going to pull in any scala binary version headaches. Is that something you'd be able to check out? I'm all out of steam today :weary:

We'd be happy to use your orientdb v3 impl when you think its strong enough. It is also pure Java I see, which simplifies things (and Java 6 compatible?).

fommil commented 8 years ago

and if we do end up using this, I'll probably do the shapeless magic bit. Possibly even set it as an advanced exercise for the workshop :smiling_imp:

fommil commented 8 years ago

just checked, there is a bunch of Java 8 stuff when I use v3.x... so we can't use it :cry:

mpollmeier commented 8 years ago

My excitement made me conveniently overread your notes re java 6 but I just realised the same. We could probably backport most of the features of gremlin-scala 3.x to gremlin-scala 2.x, but:

https://twitter.com/apachetinkerpop/status/658472326544101376

csndc1txaaefgm_

mpollmeier commented 8 years ago

As you already mentioned in your ticket you might run into scala version issues with neo4j. Until a few months ago they only supported 2.10, and I believe now it's only 2.11. https://github.com/mpollmeier/gremlin-scala-examples/issues/4 https://github.com/neo4j/neo4j/issues/2881

Is the main reason to use java 6 that some enterprises are still on java 6 as mentioned in https://github.com/ensime/ensime-server/issues/1118 ? It's long outdated and not supported any more - the IE6 of JVMs :) Even Java 7 is end of life: http://www.oracle.com/technetwork/java/eol-135779.html

fommil commented 8 years ago

Yeah Java 6 is still the official JDK for Scala 2.10 and 2.11 and a quick poll on twitter suggests that half of ENSIME users still have to support Java 6 systems. It is also the JDK used by the entire build system and when scalajs tried to drop support they caused a riot and had to revert their decision.

I'm aware that gremlin 2 is no longer active but given that its our only choice, I prefer to read that an "stable" :-)

I don't want to cause any extra work for you but if you're keen to help it would be really appreciated!

fommil commented 8 years ago

But lets wait until I've done the PoC first. It might turn out that graph databases just aren't fast enough.

fommil commented 8 years ago

FYI and a bit off topic on the above, but this is the sort of thing I'd be doing with shapeless. It's not too much work to extend this to handle sealed traits, and various forms of customisation a la spray-json-shapeless, but I only have very simple requirements.

What this doesn't do is give any form of fancy syntax for the query filtering. I suspect Records are more appropriate there.

package marshalling {
  // TODO: error handling with Either[String, Map[String, Any]]
  trait SVertex[T] {
    def vClass: String
    def toProperties(t: T): Map[String, Any]
    def fromProperties(m: Map[String, Any]): T
  }

  trait SPrimitive[T] {
    def toValue(v: T): Any
    def fromValue(v: Any): T
  }

  object SPrimitive {
    implicit object StringSPrimitive extends SPrimitive[String] {
      def toValue(v: String): Any = v
      def fromValue(v: Any): String = v.asInstanceOf[String]
    }
  }
}

package object syntax {
  import marshalling._

  /** Creates a context to communicate with the Graph. */
  def withGraph[T](f: Graph => T)(implicit factory: OrientGraphFactory): T = {
    val graph: Graph = factory.getTx()
    try f(graph)
    finally graph.shutdown()
  }

  /** Syntactic helper for serialisables. */
  implicit class RichSerialisable[T](val t: T) extends AnyVal {
    def vClass(implicit s: SVertex[T]): String = s.vClass
    def toProperties(implicit s: SVertex[T]): Map[String, Any] = s.toProperties(t)
    def fromProperties(props: Map[String, Any])(implicit s: SVertex[T]): T = s.fromProperties(props)
  }

  /** Glues the serialisable typeclass implementations to the Graph API. */
  implicit class RichGraph(val graph: Graph) extends AnyVal {
    /** Side-effecting vertex insertion. */
    def insertV[T: SVertex](t: T): Vertex = {
      val props = t.toProperties
      val v = graph.addVertex(t.vClass)
      props.foreach {
        case (key, value) => v.setProperty(key, value)
      }
      v
    }
  }
}

package object impl {
  import marshalling._
  import syntax._
  import shapeless._, labelled.{ field, FieldType }, syntax.singleton._

  implicit def hNilSVertex[T]: SVertex[HNil] = new SVertex[HNil] {
    def vClass: String = ???
    def toProperties(t: HNil): Map[String, Any] = Map.empty
    def fromProperties(m: Map[String, Any]): HNil = HNil
  }

  implicit def hListSVertex[Key <: Symbol, Value, Remaining <: HList](
    implicit
    key: Witness.Aux[Key],
    prim: SPrimitive[Value],
    remV: Lazy[SVertex[Remaining]]
  ): SVertex[FieldType[Key, Value] :: Remaining] =
    new SVertex[FieldType[Key, Value] :: Remaining] {
      def vClass: String = ???

      def toProperties(t: FieldType[Key, Value] :: Remaining): Map[String, Any] = {
        val value = prim.toValue(t.head)
        remV.value.toProperties(t.tail) + (key.value.name -> value)
      }

      def fromProperties(m: Map[String, Any]): FieldType[Key, Value] :: Remaining = {
        m.get(key.value.name) match {
          case None => throw new IllegalArgumentException(s"missing key ${key.value.name}")
          case Some(value) =>
            val resolved = prim.fromValue(value)
            val remaining = remV.value.fromProperties(m)
            field[Key](resolved) :: remaining
        }
      }
    }

  // TODO: coproducts

  implicit def familySVertex[T, Repr](
    implicit
    gen: LabelledGeneric.Aux[T, Repr],
    sg: Lazy[SVertex[Repr]],
    tpe: Typeable[T]
  ): SVertex[T] = new SVertex[T] {
    def vClass: String = tpe.describe // HACK: really need a Wrapper like sjs
    def toProperties(t: T): Map[String, Any] = sg.value.toProperties(gen.to(t))
    def fromProperties(m: Map[String, Any]): T = gen.from(sg.value.fromProperties(m))
  }
}
fommil commented 8 years ago

there's a bug in my draft code above, the id should be null (or use your DSL for this). I thought it was like the neo4j API where this is the "type" of the node.

mpollmeier commented 8 years ago

Yes, I'd be keen to help re any questions on the tinkerpop 2 stuff, although I do have to admit that I don't have much experience with it as I only took over this project back then. I added a few things here and there, but made some fundamental changes (complete rewrite) in v3.

mpollmeier commented 8 years ago

Thanks for the shapeless pointers. Do you think it would be able to support value classes? E.g. for a case class MyId(value: String) extends AnyVal I want to serialise only value, not MyId(value).

fommil commented 8 years ago

autodetecting them might be tricky, but I do exactly that in this example https://github.com/fommil/spray-json-shapeless/blob/master/src/test/scala/fommil/sjs/FamilyFormatsSpec.scala#L137

and I've done something much more substantial in a corporate project (I had a sealed trait StringEnum which was effectively a way of having classes of runtime dynamic enumerations... but I could wrap validation/logging logic around the construction of them so I could tell if new entries were being entered. e.g. case class Location(value: String) extends StringEnum and if a new Location was input anywhere, we would continue processing but log out an error for developer attention... and these things were serialised as String not case classes).

fommil commented 8 years ago

btw, I'd really appreciate it if you could take a look at my ENSIME PoC branch and comment on it: https://github.com/fommil/ensime-server/tree/graph_indexer

I didn't get it finished, but hopefully you can see the direction I'm going in. I went a bit crazy with type safety, mostly as a way to give myself some boundaries. There are a lot of limitations to this implementation (especially inheritance), and I've not really thought about doing complex queries yet.

In particular, any comments on performance and our domain model would be greatly appreciated.

mpollmeier commented 8 years ago

Looks like you really love shapeless - that's a good thing I guess :) I don't understand all the magic going on there, but it looks fine to me.

Just one note: at https://github.com/fommil/ensime-server/blob/graph_indexer/core/src/main/scala/org/ensime/indexer/orientdb/SimpleOrient.scala#L351:L355 you could also write something like graph.V.has("type", "ClassV") and be done with it.

That would be more performant as well because a) it doesn't need to fetch all vertices, only the ones that matter b) it would use an index (which you need to create in the first place)

fommil commented 8 years ago

Yup that's what the readUnique code does :-) my shapeless use is a bit sloppy here. I'll tidy it up and ask for help most likely. I'm pretty sure the query calls can be rewritten to not need strings.

On indexes and schemas, how do I do that? Also I'd like a uniqueness constraint on vertices (fqns sold be unique per type, soi guess we have atuple constraint). In neo4j the type of a vertex is separate from its props, is that not the case here?

Also, on transactions... err are there transactions? What can be done concurrently? I'll go serial to begin with.

fommil commented 8 years ago

One more thing, do you know how to turn off printing to screen deep inside orientdb? I'm also concerned about its logging strategy having a performance impact.

mpollmeier commented 8 years ago

Ok I see. readUnique looks fine.

I would recommend to use the orient api directly to create indexes and schemas. We've open sourced our schema migration library, maybe you can steal some ideas from there: https://github.com/springnz/orientdb-migrations TestMigrations: https://github.com/springnz/orientdb-migrations/blob/master/src/test/scala/springnz/orientdb/migration/TestMigrations.scala

And an example to create an index. The string concatenation isn't very nice, but you get the gist:

      val vertexClass = findClass("V")
      val userClass = getSchema.createClass("V_" + User.Label, vertexClass)
      userClass.createProperty(User.Properties.Phone.value, OType.STRING)
        .setMandatory(true)
        .setNotNull(true)
        .setReadonly(true)

      userClass.createIndex(
        User.Label + '.' + User.Properties.Phone.value,
        INDEX_TYPE.UNIQUE,
        User.Properties.Phone.value
      )

Re logging: looks like Orient is using java.util.logging - should be possible to mute that. Haven't tried it though. http://tutorials.jenkov.com/java-logging/configuration.html

rodrigorn commented 8 years ago

Hi!

I got lost, why did you drop the cross-build to 2.10? is there a way to recover that feature?

mpollmeier commented 8 years ago

I believe the only reason was the macro. When you change the scalaVersion in the build.sbt (both occurrances) and run test:compile that's the only thing that's erroring. So making that cross compile would be the way to go.

fommil commented 8 years ago

Shapeless would fix this. Pretty sure I'm going the use this as an example in my talk. You'll be overwhelmed with examples :-)

mpollmeier commented 8 years ago

That's right, which is one of the reasons why using shapeless would be superior over macros. @rodrigorn we have a separate thread for that transition here: https://github.com/mpollmeier/gremlin-scala/issues/109

fommil commented 8 years ago

looks like I'm going to drop the dependency on gremlin-scala v2 in enime :cry: basically, the mixin of Dynamic makes it unusable because IDEs suggest anything and everything. The Java API, although ugly, is well defined. I'll probably create a minimal syntax around the GremlinPipeline that suits my needs.

mpollmeier commented 8 years ago

that makes sense. the really cool type safe things are really in the v3 branch, and I'm not really proud of things like the Dynamic mixin in the v2 branch. Let me know if you need any help re Orient or Gremlin, and have another look back once you're on java 8 :)