yahoo / CaffeOnSpark

Distributed deep learning on Hadoop and Spark clusters.
Apache License 2.0

Do all nodes of a Hadoop YARN cluster need CaffeOnSpark installed? #241

Open guyang88 opened 7 years ago

guyang88 commented 7 years ago

@mriduljain @davglass @gyehuda @javadba @pcnudde I want to run CaffeOnSpark on a Spark-on-YARN cluster. Do I need to install CaffeOnSpark on every node, or is installing it on just one of them enough?

junshi15 commented 7 years ago

You can either install it on each node, or install it on the node where you launch your job and use spark-submit to ship the whole package to executors. In either case, path needs to be set up properly.

guyang88 commented 7 years ago

@junshi15 I installed CaffeOnSpark on one node and set the paths per GetStarted_yarn. When I used spark-submit to launch CaffeOnSpark, I got the error "no lmdbjni in java path". But when I installed CaffeOnSpark on all nodes, the job succeeded. So what is the problem? (I set LD_LIBRARY_PATH before running spark-submit.)

anfeng commented 7 years ago

You should create a tgz file, say cos.tgz, containing lib64/liblmdbjni.so etc., pass that tgz file via --archives, and extend the executors' LD_LIBRARY_PATH to include ":cos.tgz/lib64".

tar -cpzf ${HOME}/tmp/cos.tgz lib64

spark-submit .... --archives ${HOME}/tmp/cos.tgz \
    --conf spark.driver.extraLibraryPath="WHATEVERYOUHAVE:./cos.tgz/lib64" \
    --conf spark.executorEnv.LD_LIBRARY_PATH="WHATEVERYOUHAVE:./cos.tgz/lib64" \

Andy
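The relative path ./cos.tgz/lib64 works because, on YARN, an archive passed with --archives is unpacked in each container's working directory into a directory named after the archive file. The following is a local sketch of that layout, not part of the original answer: the scratch directory, the `install` folder, and the empty placeholder library are assumptions made purely for illustration.

```shell
# Scratch area standing in for a CaffeOnSpark install (hypothetical layout).
WORK="$(mktemp -d)"
mkdir -p "$WORK/install/lib64"
# Empty placeholder; a real install would have the compiled library here.
: > "$WORK/install/lib64/liblmdbjni.so"

# Package lib64 at the top level of the archive, as in the tar command above.
tar -cpzf "$WORK/cos.tgz" -C "$WORK/install" lib64

# Imitate the container working directory: the archive contents end up in a
# directory that carries the archive's file name.
mkdir -p "$WORK/container/cos.tgz"
tar -xzf "$WORK/cos.tgz" -C "$WORK/container/cos.tgz"

# From the working directory, this is the path LD_LIBRARY_PATH must include.
cd "$WORK/container"
ls ./cos.tgz/lib64
```

If lib64 were nested inside another folder in the archive, the library would land one level deeper and ./cos.tgz/lib64 would not contain it.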


guyang88 commented 7 years ago

@anfeng Thanks for your answer. I created a tgz file containing CaffeOnSpark and all the required lib*.so files, then ran spark-submit --archives <path to my tgz file>. But I got the same error: "no lmdbjni in java path". How can I make the tgz file available to the other nodes? Do I need to create a fake tgz file?

junshi15 commented 7 years ago

The .tgz file will be shipped to all executors. Make sure that 1) your .tgz file contains liblmdbjni.so, and 2) the library path is set as shown by @anfeng.
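The first check can be done before submitting the job by listing the archive. Below is a minimal sketch, assuming the archive is laid out with lib64 at its top level as in the tar command earlier in this thread; the function name `has_lmdbjni` is hypothetical.

```shell
# has_lmdbjni ARCHIVE
# Succeeds only if the gzip-compressed tar archive lists
# lib64/liblmdbjni.so at its top level (with or without a leading "./").
has_lmdbjni() {
    tar -tzf "$1" | grep -E '^(\./)?lib64/liblmdbjni\.so$' > /dev/null
}

# Example usage (the path is the one used earlier in the thread):
#   has_lmdbjni "${HOME}/tmp/cos.tgz" && echo "liblmdbjni.so found"
```

An archive that wraps everything in an extra top-level directory fails this check, which may be one reason the same error persists even though the library is inside the archive.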