nathanmarz / cascalog

Data processing on Hadoop without the hassle.

Queries with Clojure Records #239

Closed kul closed 9 years ago

kul commented 10 years ago

There seems to be a problem with queries when clojure records are present in tuples:

user=> (use 'cascalog.api)
nil
user=> (defrecord MyRec [a b])
user.MyRec
user=> (??<- [?r] ([(MyRec. 1 2)] ?r))
UnsupportedOperationException   user.MyRec (form-init9201996833299850058.clj:1)
user=> (??<- [?r] ([[(MyRec. 1 2)]] ?r))
UnsupportedOperationException   user.MyRec (form-init4990014382799884351.clj:1)

sorenmacbeth commented 10 years ago

hi kul,

could you add a full stacktrace? you can get this by calling (pst) in your repl directly after running the query that fails. thanks.

kul commented 10 years ago

user=> (pst)
UnsupportedOperationException 
        user.MyRec (form-init4990014382799884351.clj:1)
        com.esotericsoftware.kryo.serializers.MapSerializer.read (MapSerializer.java:137)
        com.esotericsoftware.kryo.serializers.MapSerializer.read (MapSerializer.java:17)
        com.esotericsoftware.kryo.Kryo.readObject (Kryo.java:612)
        cascading.kryo.KryoDeserializer.deserialize (KryoDeserializer.java:37)
        cascading.tuple.hadoop.TupleSerialization$SerializationElementReader.read (TupleSerialization.java:628)
        cascading.tuple.hadoop.io.HadoopTupleInputStream.readType (HadoopTupleInputStream.java:105)
        cascading.tuple.hadoop.io.HadoopTupleInputStream.getNextElement (HadoopTupleInputStream.java:52)
        cascading.tuple.io.TupleInputStream.readTuple (TupleInputStream.java:78)
        cascading.tuple.io.TupleInputStream.readTuple (TupleInputStream.java:67)
        cascading.tuple.hadoop.io.TupleDeserializer.deserialize (TupleDeserializer.java:38)
        cascading.tuple.hadoop.io.TupleDeserializer.deserialize (TupleDeserializer.java:28)

Great! didn't know about pst

sorenmacbeth commented 10 years ago

thanks.

I recall this now. carbonite, a library which allows for clojure types to be serialized with kryo, has a bug where records cannot be serialized. So, this is actually a bug in carbonite and not cascalog itself.

I will look into fixing the carbonite bug.

kul commented 10 years ago

That's great news (in the sense that cascalog doesn't need to be patched)!

Thanks

sorenmacbeth commented 10 years ago

@kul kryo cannot serialize clojure records in a generic manner since records are concrete types in java.

So, your options are: 1) write kryo serializers for your record types and register them with hadoop/cascalog, or 2) convert your records into plain maps before they enter a query (e.g. with (into {} rec)), pass only maps around inside cascalog, and rebuild records afterwards with the generated map->MyRec fn if you need them.
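
The second option can be sketched in plain Clojure, with no cascalog dependency. The constructor that defrecord generates (map->MyRec here) goes from a map to a record. For the other direction this sketch uses (into {} rec), which is an assumption of the example and not something stated in the thread. The helper name rec->map is hypothetical.

```clojure
(defrecord MyRec [a b])

;; Hypothetical helper: copy a record's fields into a plain persistent map,
;; dropping the concrete record type.
(defn rec->map [rec]
  (into {} rec))

;; Record -> plain map, done before the value is placed in a tuple.
(def plain (rec->map (MyRec. 1 2)))
;; plain is {:a 1, :b 2} and (record? plain) is false

;; Plain map -> record, done after the value comes back out of a query.
(def restored (map->MyRec plain))
;; (= restored (MyRec. 1 2)) is true
```

The conversion is shallow, so a record nested inside another record's field would still be a record after rec->map and would need the same treatment.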

sritchie commented 9 years ago

Closing this one.