gorillalabs / sparkling

A Clojure library for Apache Spark: fast, fully-featured, and developer-friendly
https://gorillalabs.github.io/sparkling/
Eclipse Public License 1.0
448 stars, 68 forks

Added take-ordered #29

erasmas closed this 9 years ago

erasmas commented 9 years ago

Hi Chris,

I added take-ordered to sparkling.api. I also experimented with an additional test that passes a comparator, but couldn't get it to work because of serialization issues. You can see what I tried here. Hope that helps.

chrisbetz commented 9 years ago

Hi Dima,

quoting from my mail for others following along:

One thing that comes to mind (i.e. things I tripped over on my way to Spark) could be a compilation issue: you need to make sure you have the right classes at hand, because Clojure recompiles code (e.g. at the REPL), which results in new classes (with the same name, but a different class). So it can happen that you registered class `clojure.core$comparator$fn__4661.class` in version "A", whereas the actual data contains class `clojure.core$comparator$fn__4661.class` in version "B". I think something similar happens with `(comparator …)`, but with two different classes (something like `clojure.core$comparator$fn__4661` and `clojure.core$comparator$fn__2737`).
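The recompilation effect can be seen without Spark at all. The sketch below assumes nothing beyond clojure.core; `make-cmp`, `cmp-a` and `cmp-b` are hypothetical names. It evaluates the same `defn` twice, as a namespace reload at the REPL would, and checks whether the two comparators share a class.

```clojure
;; Sketch: every time a `fn` form is compiled, Clojure generates a fresh class.
;; `make-cmp`, `cmp-a` and `cmp-b` are hypothetical names for illustration.
(defn make-cmp [] (fn [a b] (compare (count a) (count b))))
(def cmp-a (make-cmp))

;; Evaluate the identical definition again, as reloading a namespace would.
(defn make-cmp [] (fn [a b] (compare (count a) (count b))))
(def cmp-b (make-cmp))

;; Print both generated class names and check whether the Class objects are the same.
(println (.getName (class cmp-a)))
(println (.getName (class cmp-b)))
(println "same class?" (identical? (class cmp-a) (class cmp-b)))
```

Any serializer registration made with `cmp-a`'s class will not match objects created after the reload.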

Thus, two calls to `(comparator …)` may give you two different classes. Make sure to use only one comparator (i.e. `def` a var holding the comparator instance and use that everywhere).
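A minimal sketch of the "define it once" advice, using only clojure.core. `by-length` is a hypothetical name. The Spark-side registration is not shown, and how sparkling registers classes is not assumed here. The point is that every use site refers to the same var and therefore to the same instance and class.

```clojure
;; Sketch: build the comparator once, at namespace load time, and refer to this
;; var wherever it is needed. `by-length` is a hypothetical name.
(def by-length
  (comparator (fn [a b] (< (count a) (count b)))))

;; Both call sites share one comparator instance, and therefore one class.
(def sorted-words (sort by-length ["ccc" "a" "bb"]))
(def sorted-again (sort by-length ["bb" "ccc" "a"]))

(println sorted-words) ; prints (a bb ccc)
```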

It's really hard to debug that kind of stuff. You could log `(System/identityHashCode (class your-comparator))` when registering it, and likewise for the instance you use in your data structure, to catch both kinds of error: if the two logged hash codes differ, you know you didn't register the right class.
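The logging idea might look like the sketch below. `log-class-identity` and `by-length` are hypothetical names. The helper prints the class name and the identity hash of the Class object and returns that hash, so the value logged at registration can be compared with the value logged for the object found in the data.

```clojure
(defn log-class-identity
  "Hypothetical helper: print the class name of `x` and the identity hash code
  of its Class object, then return that hash code."
  [label x]
  (let [c (class x)
        h (System/identityHashCode c)]
    (println label (.getName c) h)
    h))

;; Hypothetical comparator, defined once.
(def by-length (comparator (fn [a b] (< (count a) (count b)))))

;; Log at registration time and again for the instance used in the data.
(let [registered (log-class-identity "registered:" by-length)
      in-data    (log-class-identity "in data:" by-length)]
  (when (not= registered in-data)
    (println "Mismatch: the registered class is not the one in the data")))
```

Identity hash codes are handy in logs but can collide. Within a single JVM, `(identical? (class a) (class b))` is the exact check.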

Hope this helps

Cheers,

Chris

P.S.: Would you mind opening up pull requests against the develop branch? This would simplify things for me. Thanks for that!