thinkaurelius / titan

Distributed Graph Database
http://titandb.io
Apache License 2.0
5.25k stars 1.01k forks source link

Make Has() More Forgiving On Comparisons #307

Closed spmallette closed 10 years ago

spmallette commented 11 years ago

This is also an issue over in Blueprints (https://github.com/tinkerpop/blueprints/issues/409), but since Titan doesn't use the default vertex query of Blueprints, it would need to be fixed here for it to have affect in Titan (at least I think that is the case).

If a property mixes types and you try to use has() on it you can get some ugliness....in the example below (using TinkerGraph just for example...originally noticed this in Titan), I try to do greater than a long when the value in the property is a float.

gremlin> g = TinkerGraphFactory.createTinkerGraph()
==>tinkergraph[vertices:6 edges:6]
gremlin> g.E.has('weight',T.gt,1l)
java.lang.Long cannot be cast to java.lang.Float
Display stack trace? [yN] 
spmallette commented 10 years ago

Could you clarify what the fix was here? Was it just for the Float like i had in my example? Note this other "zero forgiveness" scenario that a client ran into:

gremlin> g = TitanFactory.open('/tmp/testing')
==>titangraph[local:/tmp/testing]
gremlin> g.addVertex([dateTaken:12345])
==>v[40004]
gremlin> g.V.has('dateTaken',T.gte,123)
Data type of key is not compatible with condition
Display stack trace? [yN] n
mbroecheler commented 10 years ago

This is because a dataType was never declared for dateTaken - hence Titan does not know how to compare these things. This is where the generality of Blueprints is hurting us a little because its valid to write something like:

gremlin> g.addVertex([dateTaken:12345])
==>v[40004]
gremlin> g.V.has('dateTaken',T.gte,"hello")

For Titan, it does not know how to push things down if the data type is unknown and hence it complains. I guess, we could make it such that when no data type is defined, everything is always loaded into memory and comparisons executed there (throwing exceptions if data types don't match), but that would also be incredibly expensive.

spmallette commented 10 years ago

Thanks for the explanation. This problem really only hurts two types of users:

  1. New users who are just getting started and maybe just following TinkerPop docs but using Titan
  2. Users who don't define types.

The client who ran into this problem falls into the second case and they have a reasonably compelling argument for not doing type definition (at least the most compelling i've heard thus far): they simply don't know all the types until they are created and i think there is a dynamic component to how they capture data which further prevents property type definition over time (i'm thinking there's ways to yet contend with these problems and to create proper types but that's not really the scope of this issue).

That model of and "open schema" may simply not play well with Titan. If so, we should wonder whether auto-type making makes any sense at all as a "feature" since Titan simply won't work for basic things like has, performance will stink, etc.

mbroecheler commented 10 years ago

I can figure out a way to make it work I think by simply casting things to Comparable. However, this will not be as efficient.

mbroecheler commented 10 years ago

So, after more careful investigation, I wrote a new comparator utility in Titan that correctly compares numbers. I think this issue ultimately boiled down to the fact that Long.compareTo(Integer) throws an exception which makes little sense to me, but since this is in java core, I had to replace how we handle comparisons in Titan.

Now, Titan should be less strict about exact type matches when it comes to numbers and an Object.class datatype. The only place where it will be strict is for exact index lookups using Titan's standard index. Since Titan serializes out the class name in addition to the value when the datatype is unspecified (Object.class) indexing an object with age=20 cannot be retrieved with an index call through g.V.has("age",20l). [Note the L for long in the has clause]