@draven, I also want to use Spark 2.0 with Sparkling. Did you try forking the library and updating the dependencies?
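For anyone attempting the fork route, a hypothetical `project.clj` excerpt with the dependencies bumped to Spark 2.x might look like this (project name and versions are illustrative; the `:provided` profile follows the layout in Sparkling's README, which keeps Spark itself out of the uberjar):

```clojure
;; Hypothetical project.clj excerpt for a fork targeting Spark 2.x.
;; Versions are illustrative, not a statement of what Sparkling ships with.
(defproject my-fork/sparkling "2.0.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.8.0"]]
  :profiles {:provided
             {:dependencies [[org.apache.spark/spark-core_2.11 "2.1.0"]
                             [org.apache.spark/spark-sql_2.11 "2.1.0"]]}})
```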
Hi, @sorenmacbeth @blak3mill3r @chrisbetz (not sure who the primary maintainers are)
I'm planning to go ahead and make the necessary changes to work with spark 2.x. Is there anything I should know about setting up to run the tests and such? I may have some other questions as I get into the work.
IIRC there's nothing unusual about the test suite; it'll start a local Spark context and test against it.
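For reference, the tests only need a local master. A minimal sketch of creating such a context with Sparkling's public API (namespaces as in the README; the app name is illustrative):

```clojure
(ns my.test-setup
  (:require [sparkling.conf :as conf]
            [sparkling.core :as spark]))

;; Minimal sketch: a local[*] Spark context of the kind a test suite
;; would spin up, built with Sparkling's conf helpers.
(def test-context
  (spark/spark-context
    (-> (conf/spark-conf)
        (conf/master "local[*]")
        (conf/app-name "sparkling-tests"))))
```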
There's a PR for that (https://github.com/gorillalabs/sparkling/pull/52), which I merged, and I will make a new release with all the changes in February.
Just wanted to check in on the new version of Sparkling. It looks like support for Spark 2.x has been merged, as well as support for Spark SQL. You mentioned cutting the new release in February, and I was wondering whether it would be coming soon? Sparkling is awesome and I would like to use it at my company for some upcoming projects; however, I would really need Spark 2.x and Spark SQL to help make the case! Thanks, and keep up the great work.
Hey, I pushed it :) Sorry, I had some trouble getting the tests up and running on my machine (which wasn't reflected on Travis, as I checked this morning). On another machine and on Travis it's fine, so here it is: sparkling-2.0.0, together with sparkling-getting-started.
Sorry it took me so long. I really appreciate your contributions and comments, and I'm sorry for letting you down for so long!
Many thanks!
I've tried to run one of my existing jobs using the latest version of Spark, and it failed with:
```
java.lang.AbstractMethodError: sparkling.function.FlatMapFunction.call(Ljava/lang/Object;)Ljava/util/Iterator;
	at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$1$1.apply(JavaRDDLike.scala:124)
	at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$1$1.apply(JavaRDDLike.scala:124)
	at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:461)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:461)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
	at org.apache.spark.scheduler.Task.run(Task.scala:85)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
```
Looking at the release notes from Apache, this could be due to this behaviour change:
- Java RDD's `flatMap` and `mapPartitions` functions used to require functions returning Java `Iterable`. They have been updated to require functions returning Java `Iterator` so the functions do not need to materialize all the data.
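To illustrate the change: in Spark 1.x a flat-map shim could return any `java.lang.Iterable` (Clojure seqs qualify), while in 2.x the `call` method must return a `java.util.Iterator`. If the shim class was compiled against the old interface, Spark 2.x looks up the `Iterator`-returning `call` and hits exactly the `AbstractMethodError` above. A hedged sketch of how a Clojure wrapper might adapt (names and structure are illustrative, not Sparkling's actual internals, and a production version would also need to handle serialization of the wrapped fn):

```clojure
;; Hedged sketch, not Sparkling's actual implementation: a Clojure type
;; implementing Spark 2.x's FlatMapFunction interface, whose `call`
;; must now return a java.util.Iterator instead of a java.lang.Iterable.
(deftype CljFlatMapFunction [f]
  org.apache.spark.api.java.function.FlatMapFunction
  (call [_ x]
    ;; Clojure seqs implement java.lang.Iterable, so adapting an
    ;; Iterable-style result is just a call to .iterator
    (.iterator ^java.lang.Iterable (f x))))
```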