yahoo / CaffeOnSpark

Distributed deep learning on Hadoop and Spark clusters.
Apache License 2.0

Feature extraction mode running slow #293

Open Marcteen opened 6 years ago

Marcteen commented 6 years ago

Hi there. I'm using CaffeOnSpark to extract deep features (dimension 4096) from pictures. The model I use is vgg_face, and the content of solver.prototxt is:

net: "VGG_FACE_deploy.prototxt" type: "Adam" test_iter: 30 test_interval: 5000 base_lr: 0.000001 momentum: 0.9 momentum2: 0.999 lr_policy: "fixed" gamma:0.8 stepsize:100000 display: 2500 max_iter: 1500000 snapshot: 5000 snapshot_prefix: "faceId-snap" solver_mode: CPU

and the spark-submit command is:

```
spark-submit --master yarn --deploy-mode cluster \
  --driver-memory 3g \
  --driver-cores 2 \
  --num-executors 100 \
  --executor-cores 1 \
  --executor-memory 2g \
  --files /.../adam_solver.prototxt,/.../VGG_FACE_deploy.prototxt \
  --conf spark.driver.extraLibraryPath="${LD_LIBRARY_PATH}" \
  --conf spark.executorEnv.LD_LIBRARY_PATH="${LD_LIBRARY_PATH}" \
  --class com.yahoo.ml.caffe.CaffeOnSpark \
  ${CAFFE_ON_SPARK}/caffe-grid/target/caffe-grid-0.1-SNAPSHOT-jar-with-dependencies.jar \
  -features fc7 \
  -clusterSize 100 \
  -label label \
  -conf adam_solver.prototxt \
  -connection ethernet \
  -model hdfs:///.../VGG_FACE.caffemodel \
  -output hdfs:///.../vggFaces
```

When I use a sequence file consisting of 3k images with 20 executors (clusterSize set to 20, of course), feature extraction finishes in 4 minutes. But when I process a sequence file with 450k images using 100 executors, it just keeps running, exceeding 13 hours (no idea how long it would really take). Even though the deep conv net requires a heavy CPU load, I think the time cost is not reasonable here. Maybe I've made a mistake somewhere. Any help will be appreciated!

junshi15 commented 6 years ago

If you have access to the executors, go there and check the CPU usage etc. I suspect the job is stuck.

For feature extraction, make sure you set batch size to 1 in your prototxt file.
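Something like this, for example. This is just a sketch modeled on the MemoryData layers in the CaffeOnSpark examples; the layer name, source class, path, and dimensions are placeholders you would adapt to your own deploy prototxt:

```
# Sketch only: layer name, source_class, path, and dimensions are
# placeholders. The part that matters for "features" mode is batch_size: 1.
layer {
  name: "data"
  type: "MemoryData"
  top: "data"
  top: "label"
  source_class: "com.yahoo.ml.caffe.SeqImageDataSource"
  memory_data_param {
    source: "hdfs:///path/to/images"
    batch_size: 1
    channels: 3
    height: 224
    width: 224
  }
  include { phase: TEST }
}
```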

Marcteen commented 6 years ago

@junshi15 Thanks for the response. Yes, I do set the batch size to 1 for the data in the test phase. I checked a host with a hung executor task: the "%CPU" column of the "top" output shows a Java process using 1034% CPU (each worker has 12 cores). I have 13 Spark slaves, and I use 120 executors, each with 1 core. Can I set more cores per executor in feature extraction mode?

Then I restarted the application with only 20 executors and checked the stage detail of the Spark job. I found that "Input Size / Records" rises quickly at the beginning of the whole job, and when each task's record count reaches about 1k (similar to the case using 120 executors), its speed drops immediately. I notice there is a persist(DISK_ONLY) and a count() of the feature-value RDD in CaffeOnSpark.scala. Maybe there is something wrong with the persisting?
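The pattern I mean looks roughly like this (a minimal stand-in I wrote for illustration, not the actual CaffeOnSpark.scala source; the record counts and vector sizes are made up):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

// Minimal stand-in for the persist(DISK_ONLY) + count() pattern,
// not the actual CaffeOnSpark source.
object PersistCountSketch {
  def main(args: Array[String]): Unit = {
    val sc = SparkSession.builder().appName("sketch").getOrCreate().sparkContext

    // Stand-in for per-partition feature extraction (4096-dim vectors).
    val features = sc.parallelize(1 to 1000, 20)
      .map(i => (i, Array.fill(4096)(0.0f)))

    // Every record is serialized to executor-local disk...
    features.persist(StorageLevel.DISK_ONLY)

    // ...and count() forces the whole RDD to be computed (and spilled)
    // before any output is written, so slow local disks show up here.
    println(features.count())
  }
}
```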

Here is the sorted task info:

[screenshot: task list sorted by record count]

I checked a host (like hybrid14) where a task was running faster (with a higher record count): its %CPU in "top" is 1057, while another (like hybrid10) shows 669. I found that if there are two executors on the same host, both of them are slower than a single executor on a host. Any idea?

[screenshot: per-host executor %CPU in top]

Then I reloaded the stage detail page continuously: I can see the number of records rise by about 20+ each time, and the input size also rises slowly, by about 0.3 MB. Is that normal? The whole input file should be 4.1 GB.

[screenshot: stage detail page]

junshi15 commented 6 years ago

I don't know where the problem is. I only use Yarn mode, which sets spark.executor.cores to 1. So one core per executor. I am not sure what will happen if you have more than one core per executor.

Marcteen commented 6 years ago

@junshi15, I also use Yarn mode. I notice that when I use 20 executors, the image sequence file is split into 30 tasks, so some executors have more than one task to run. And I am sure that each time an executor takes a new task (RDD partition), the first 1k+ records always finish in a few minutes, and then it falls into a slow pattern. Have you ever tried the feature mode? I think if I split the data into pieces and submitted several Spark applications, it would be faster than processing everything in one.
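For illustration, the partition/task relationship in plain Spark looks like this (a sketch with made-up paths and counts; CaffeOnSpark reads its source from the prototxt, so this is not a drop-in fix):

```scala
import org.apache.spark.sql.SparkSession

// Sketch: why a SequenceFile can yield more tasks than executors, and
// how repartition() would even them out. Paths and counts are made up.
object PartitionSketch {
  def main(args: Array[String]): Unit = {
    val sc = SparkSession.builder().appName("sketch").getOrCreate().sparkContext
    val numExecutors = 20

    // A SequenceFile is split roughly per HDFS block, so a 4.1 GB file
    // can come out as 30 partitions -> 30 tasks for 20 executors.
    val images = sc.sequenceFile[String, Array[Byte]]("hdfs:///path/to/images")
    println(images.getNumPartitions)

    // Shuffling into exactly one partition per executor balances the load.
    val balanced = images.repartition(numExecutors)
    println(balanced.getNumPartitions)
  }
}
```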

junshi15 commented 6 years ago

It has been a while since I used CaffeOnSpark. I don't remember any problem with "features" mode. Synchronization between executors is not required in this mode, so it is OK for some executors to have more partitions.
