gorillalabs / sparkling

A Clojure library for Apache Spark: fast, fully-featured, and developer friendly
https://gorillalabs.github.io/sparkling/
Eclipse Public License 1.0

Is there going to be a version that supports Spark 2.0 soon? #51

Closed: draven72 closed this issue 7 years ago

draven72 commented 8 years ago

I've tried to run one of my existing jobs using the latest version of Spark and it failed with:

java.lang.AbstractMethodError: sparkling.function.FlatMapFunction.call(Ljava/lang/Object;)Ljava/util/Iterator;
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$1$1.apply(JavaRDDLike.scala:124)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$1$1.apply(JavaRDDLike.scala:124)
    at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:461)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:461)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
    at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
    at org.apache.spark.scheduler.Task.run(Task.scala:85)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Looking at the release notes from Apache, this could be due to this behaviour change:

• Java RDD’s flatMap and mapPartitions functions used to require functions returning Java Iterable. They have been updated to require functions returning Java Iterator so the functions do not need to materialize all the data.
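
If I read that release note right, classes compiled against the Spark 1.x signature (call returning an Iterable) no longer implement the method Spark 2.x actually invokes (call returning an Iterator), which is exactly what the AbstractMethodError on sparkling.function.FlatMapFunction.call points at. Here is a minimal sketch of what the 2.x interface expects, written as a plain reify rather than sparkling's own generated class (the wrapper name flat-map-fn is hypothetical):

    ;; Not sparkling's actual implementation; just the shape of the 2.x contract.
    ;; Spark 1.6.x declared   Iterable<R> call(T t)   on this interface,
    ;; Spark 2.x declares     Iterator<R> call(T t).
    (import '(org.apache.spark.api.java.function FlatMapFunction))

    (defn flat-map-fn
      "Wraps a Clojure fn returning a seq as a Spark 2.x FlatMapFunction."
      [f]
      (reify
        java.io.Serializable
        FlatMapFunction
        (call [_ x]
          ;; Spark 2.x wants a java.util.Iterator, so convert the seq here.
          (.iterator ^java.lang.Iterable (f x)))))

Anything still compiled against the old Iterable-returning signature will load fine but blow up the first time the executor calls it, which matches the stack trace above.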

jbrownlucid commented 7 years ago

@draven72, I also want to use Spark 2.0 with sparkling. Did you try forking the library and updating the dependencies?
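
For anyone trying that route in the meantime, the heart of "updating the dependencies" in a fork is bumping the Spark artifacts from the 1.x / Scala 2.10 builds to the 2.x / Scala 2.11 ones, then fixing whatever no longer compiles. The coordinates below are illustrative and not copied from sparkling's actual project.clj:

    ;; hypothetical fork's project.clj, trimmed to the relevant part
    (defproject my-sparkling-fork "2.0.0-SNAPSHOT"
      :dependencies [[org.clojure/clojure "1.8.0"]
                     ;; was e.g. [org.apache.spark/spark-core_2.10 "1.6.3"]
                     [org.apache.spark/spark-core_2.11 "2.0.2"]])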

jbrownlucid commented 7 years ago

Hi, @sorenmacbeth @blak3mill3r @chrisbetz (not sure who the primary maintainers are)

I'm planning to go ahead and make the necessary changes to work with Spark 2.x. Is there anything I should know about setting up to run the tests and such? I may have some other questions as I get into the work.

blak3mill3r commented 7 years ago

IIRC there's nothing unusual about the test suite; it'll start a local Spark context and test against it.
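
To make that concrete, a test in this style just builds a local context, runs something trivial through it, and stops it again. The namespace and assertion below are made up for illustration, and the sparkling.conf / sparkling.core calls follow the getting-started guide, so treat the exact API as an assumption rather than a copy of the real test suite:

    (ns sparkling.local-context-test
      (:require [clojure.test :refer [deftest is]]
                [sparkling.conf :as conf]
                [sparkling.core :as spark]))

    (deftest round-trips-through-a-local-context
      ;; "local[*]" keeps everything in-process, no cluster needed
      (let [sc (-> (conf/spark-conf)
                   (conf/master "local[*]")
                   (conf/app-name "sparkling-test")
                   (spark/spark-context))]
        (try
          (is (= ["a" "b" "c"]
                 (vec (spark/collect (spark/parallelize sc ["a" "b" "c"])))))
          (finally
            (.stop sc)))))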

chrisbetz commented 7 years ago

There's a PR for that (https://github.com/gorillalabs/sparkling/pull/52), which I merged, and I will make a new release with all the changes in February.

MafcoCinco commented 7 years ago

Just wanted to check in on the new version of Sparkling. It looks like support for Spark 2.x has been merged, as well as support for SparkSQL. You mentioned cutting the new release in February, and I was wondering if it will be coming soon. Sparkling is awesome and I would like to use it at my company for some upcoming projects; however, I would really need Spark 2.x and SparkSQL to help make the case! Thanks, and keep up the great work.

chrisbetz commented 7 years ago

Hey, I pushed it :) Sorry, I had some trouble getting the tests up and running on my machine (which was not reflected on Travis, as I checked this morning). On another machine and on Travis everything is OK, so here it is: sparkling-2.0.0, together with sparkling-getting-started.
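
Pulling it in should just be the usual dependency bump. An illustrative consumer project.clj follows; the Spark coordinates and the :provided layout are my assumptions, not part of the release itself:

    (defproject my-spark-job "0.1.0-SNAPSHOT"
      ;; the new release, plus a Spark 2.x runtime kept out of the uberjar
      :dependencies [[org.clojure/clojure "1.8.0"]
                     [gorillalabs/sparkling "2.0.0"]]
      :profiles {:provided
                 {:dependencies [[org.apache.spark/spark-core_2.11 "2.0.2"]]}})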

Sorry it took me so long. I really appreciate your contributions and comments, and I feel really sorry for letting you down for so long!

MafcoCinco commented 7 years ago

Many thanks!