gorillalabs / sparkling

A Clojure library for Apache Spark: fast, fully featured, and developer friendly
https://gorillalabs.github.io/sparkling/
Eclipse Public License 1.0

Serializer issues while running on Spark remote #46

Closed MarchLiu closed 8 years ago

MarchLiu commented 8 years ago

I created some jobs using sparkling. They run fine on "local", but they always fail on any remote Spark service of the form "spark://...".

When I started writing Spark jobs in Clojure, I found #18 and set up AOT as proposed in https://github.com/gorillalabs/sparkling/issues/18#issuecomment-99754291 , but this error is thrown:

...
16/05/04 12:32:57 INFO DAGScheduler: Job 1 failed: count at NativeMethodAccessorImpl.java:-2, took 0.440504 s

ClassNotFoundException sparkling.serialization.Registrator  java.net.URLClassLoader.findClass (URLClassLoader.java:381)
16/05/04 12:32:57 INFO TaskSetManager: Lost task 0.3 in stage 1.0 (TID 27) on executor 192.168.100.11: java.io.IOException (org.apache.spark.SparkException: Failed to register classes with Kryo) [duplicate 24]
16/05/04 12:32:57 INFO TaskSetManager: Lost task 2.3 in stage 1.0 (TID 29) on executor 192.168.100.17: java.io.IOException (org.apache.spark.SparkException: Failed to register classes with Kryo) [duplicate 25]
...

The project.clj is:

(defproject operate.xcurrency.washing "0.1.0-SNAPSHOT"
  ...
  :dependencies [[org.clojure/clojure "1.8.0"]
                 [org.clojure/data.json "0.2.6"]
                 [gorillalabs/sparkling "1.2.4"]
                 [org.apache.spark/spark-core_2.10 "1.6.1"]
                 [clj-time "0.8.0"]
                 [org.clojars.marsliu/clj-parsec "0.1.0-SNAPSHOT"]
                 [com.tratao.operate.core "0.1.0-SNAPSHOT"]]
  :plugins [[cider/cider-nrepl "0.12.0-SNAPSHOT"]]
  :uberjar-merge-with {"reference.conf" [slurp str spit]}
  :source-paths ["src/main/clojure"]
  :profiles {:provided {:dependencies [[org.apache.spark/spark-core_2.10 "1.6.1"]]}}
  :aot [#".*" sparkling.serialization sparkling.destructuring]
  :main operate.xcurrency.washing
  :test-paths ["src/main/clojure" "src/test/clojure"])

There is also a private library named "com.tratao.operate.core", whose project.clj is:

(defproject com.tratao.operate.core "0.1.0-SNAPSHOT"
  ...
  :dependencies [[org.clojure/clojure "1.8.0"]
                 [org.clojure/data.json "0.2.6"]
                 [org.clojure/tools.nrepl "0.2.12"]
                 [org.clojure/tools.namespace "0.2.11"]
                 [gorillalabs/sparkling "1.2.4"]
                 [org.apache.spark/spark-core_2.10 "1.6.1"]
                 [clj-time "0.8.0"]
                 [org.clojars.marsliu/clj-parsec "0.1.0-SNAPSHOT"]]
  :plugins [[cider/cider-nrepl "0.12.0-SNAPSHOT"]]
  :source-paths ["src/main/clojure"]
  :aot [#".*" sparkling.serialization sparkling.destructuring]
  :main com.tratao.operate.server.repl
  :test-paths ["src/main/clojure" "src/test/clojure"])

If I list the contents of the jar and look for the register classes, I get:

[operate][~/DataAnalysis/jobs/xcurrent/washing]$ jar tvf target/operate.xcurrency.washing-0.1.0-SNAPSHOT-standalone.jar | grep sparkling.serialization\$register
  3027 Tue May 03 17:00:48 CST 2016 sparkling/serialization$register_base_classes.class
  2724 Tue May 03 17:00:48 CST 2016 sparkling/serialization$register.class
   996 Tue May 03 17:00:48 CST 2016 sparkling/serialization$register_class_with_serializer.class
  1289 Tue May 03 17:00:48 CST 2016 sparkling/serialization$register_optional.class
   850 Tue May 03 17:00:48 CST 2016 sparkling/serialization$register_class.class
  1276 Tue May 03 17:00:48 CST 2016 sparkling/serialization$register_clojure.class
  2615 Tue May 03 17:00:48 CST 2016 sparkling/serialization$register_scala.class
  1483 Tue May 03 17:00:48 CST 2016 sparkling/serialization$register_native_array_serializers.class
   938 Tue May 03 17:00:48 CST 2016 sparkling/serialization$register_class_with_id.class
  2297 Tue May 03 17:00:48 CST 2016 sparkling/serialization$register_java_class_serializers.class
  1974 Tue May 03 17:00:48 CST 2016 sparkling/serialization$register_spark.class
  1138 Tue May 03 17:00:48 CST 2016 sparkling/serialization$register_class_with_serializer_and_id.class
  2531 Tue May 03 17:00:48 CST 2016 sparkling/serialization$register_array_type.class
   809 Tue May 03 17:00:48 CST 2016 sparkling/serialization$register_optional$fn__2276.class
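
The grep pattern above is case-sensitive and only matches the compiled function classes named sparkling/serialization$register...; it cannot match the class named in the exception, sparkling.serialization.Registrator, which would be packaged as sparkling/serialization/Registrator.class. A minimal sketch for checking that entry directly is below. The helper name has_class is hypothetical (not part of sparkling or the JDK), and it assumes the listing comes from `jar tf` (not `jar tvf`), so that every line is a bare entry name.

```shell
# has_class CLASS_NAME
# Hypothetical helper: reads a jar listing (one entry per line, as printed
# by `jar tf`) on stdin and succeeds if it contains the .class file for the
# fully qualified CLASS_NAME.
has_class() {
  # Turn a.b.C into a/b/C.class, the entry name used inside a jar.
  entry="$(printf '%s' "$1" | tr '.' '/').class"
  # -F: fixed string, -x: whole line must match, -q: only set the exit status.
  grep -Fxq "$entry"
}

# Usage against the uberjar from this issue (commented out so the block
# can be sourced without the jar present):
#   jar tf target/operate.xcurrency.washing-0.1.0-SNAPSHOT-standalone.jar \
#     | has_class sparkling.serialization.Registrator \
#     && echo "Registrator is in the jar" || echo "Registrator is missing"
```

If the class is missing, the AOT step did not produce it. If it is present, the jar itself is probably fine and the question becomes whether that jar actually reaches the executors.
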

MarchLiu commented 8 years ago

I created a job that does not depend on any other third-party Clojure code and submitted it to a remote Spark master. It succeeded. So, do I need to add all third-party libraries to :aot, and use the REPL only locally?

chrisbetz commented 8 years ago

Hi,

sorry for answering so late, but I've been in offline-land for a few days.

How do you run your remote Spark jobs? Are you "uberjar"ing your project? If so, everything should be OK, but you might want to check whether the classes were actually generated in your jar.

Please give me information on your build/deploy process, because otherwise I'm not able to help you with that ...

Bye

Chris

MarchLiu commented 8 years ago

Thank you for the reply, and sorry for my own late answer. These days I've been happily learning sparkling. 👍 I tried moving the code from "com.tratao.operate.core" into washing, and it ran successfully. Maybe the failure is because I need to AOT-compile all the Clojure code that I wrote myself and that the job depends on?

The remote server is a Spark standalone cluster provided by a third party. If I run a project from the REPL that does not depend on any of my other projects, or uberjar it and submit it, it succeeds. Locally, it always succeeds.