Closed by LXWDL 7 years ago
@LXWDL The script prints its logs to the file examples/cifar10//0.1_0.0001_0.0_0.0_0.0_2017年_05月_03日_星期三_18-57-55_CST/train.info. You can paste the errors here to give me more details. Did you forget to create a padded lmdb for the net prototxt?
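If it helps, here is a minimal sketch for pulling the end of the most recent log. It assumes each run writes its train.info into its own timestamped subdirectory under examples/cifar10, as in the path above; the function name latest_train_log is hypothetical and not part of the repository.

```shell
# Print the path of the most recently modified train.info under a base
# directory (default: examples/cifar10). Hypothetical helper, not part of the repo.
latest_train_log() {
  base="${1:-examples/cifar10}"
  # ls -t sorts newest first; errors are silenced when no log exists yet.
  { ls -t "$base"/*/train.info 2>/dev/null || true; } | head -n 1
}

log="$(latest_train_log)"
if [ -n "$log" ]; then
  # The last lines usually contain the failed check that caused the abort.
  tail -n 50 "$log"
else
  echo "no train.info found under examples/cifar10" >&2
fi
```

The output of the tail command is the part worth pasting into the issue.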
@wenwei202 Thanks for your help. It was my fault: I forgot to create a padded lmdb. But I have run into another error. When I run "make all -j16", the error is:
collect2: error: ld returned 1 exit status
make: [.build_release/tools/caffe.bin] Error 1
.build_release/lib/libcaffe.so: undefined reference to 'cusparseSdense2csc'
.build_release/lib/libcaffe.so: undefined reference to 'cusparseSetMatType'
.build_release/lib/libcaffe.so: undefined reference to 'cusparseScsrmm'
.build_release/lib/libcaffe.so: undefined reference to 'cusparseDestroyMatDescr'
.build_release/lib/libcaffe.so: undefined reference to 'cusparseDcsrmm2'
.build_release/lib/libcaffe.so: undefined reference to 'cusparseDdense2csc'
.build_release/lib/libcaffe.so: undefined reference to 'cusparseDestroy'
.build_release/lib/libcaffe.so: undefined reference to 'cusparseSetMatIndexBase'
.build_release/lib/libcaffe.so: undefined reference to 'cusparseDnnz'
.build_release/lib/libcaffe.so: undefined reference to 'cusparseCreateMatDescr'
.build_release/lib/libcaffe.so: undefined reference to 'cusparseCreate'
.build_release/lib/libcaffe.so: undefined reference to 'cusparseDcsrmm'
.build_release/lib/libcaffe.so: undefined reference to 'cusparseSnnz'
.build_release/lib/libcaffe.so: undefined reference to 'cusparseScsrmm2'
collect2: error: ld returned 1 exit status
make: [.build_release/examples/cpp_classification/classification.bin] Error 1
It seems the cusparse shared library is not being found. Which OS are you using? On Ubuntu 16.04, it is at /usr/local/cuda-8.0/targets/x86_64-linux/lib/libcusparse.so. Try adding /usr/local/cuda-8.0/targets/x86_64-linux/lib to your LD_LIBRARY_PATH.
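For reference, a minimal sketch of that suggestion in a shell session. The directory is the one quoted above for Ubuntu 16.04 with CUDA 8.0 and may differ on other setups; the variable name CUSPARSE_DIR is only illustrative.

```shell
# Assumed location, taken from the comment above (Ubuntu 16.04, CUDA 8.0).
# Adjust it if your CUDA install lives elsewhere.
CUSPARSE_DIR=/usr/local/cuda-8.0/targets/x86_64-linux/lib

# Prepend the directory; the ${VAR:+...} form avoids a dangling ':' when
# LD_LIBRARY_PATH was previously empty or unset.
export LD_LIBRARY_PATH="${CUSPARSE_DIR}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"

# Confirm the library is really there before rebuilding.
if [ -e "${CUSPARSE_DIR}/libcusparse.so" ]; then
  echo "found libcusparse.so in ${CUSPARSE_DIR}"
else
  echo "libcusparse.so not found in ${CUSPARSE_DIR}" >&2
fi
```

The export only lasts for the current shell, so rerun "make all -j16" from the same session, or add the export line to your shell profile to make it permanent.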
@wenwei202 Thank you very much for your help. I have solved the problems above, but when I run the following command I get the same kind of error as before. I do not know whether I am doing something wrong or there is another cause. Thanks.
The command is:
./examples/cifar10/train_script.sh 0.001 0.0 0.003 0.003 0.0 0 template_group_solver.prototxt bvlc_alexnet.caffemodel
The error is:
./build/tools/caffe.bin train --solver=examples/cifar10//0.001_0.0_0.003_0.003_0.0_2017年_05月_14日_星期日_08-44-43_CST/solver.prototxt --weights=examples/cifar10//bvlc_alexnet.caffemodel
./examples/cifar10/train_script.sh: line 66: 3148 Aborted (core dumped) ./build/tools/caffe.bin train --solver=$solverfile --weights=$model_path/$tunedmodel > "${snapshot_path}/train.info" 2>&1
Does examples/cifar10//bvlc_alexnet.caffemodel exist?
Yes, it exists.
It's odd that you are feeding CIFAR-10 images to an AlexNet trained on ImageNet. Is that a special requirement of yours? Copying the logs from ${snapshot_path}/train.info here would also help.
@wenwei202 Sorry for the late reply, I have been busy with other things recently. It was a misunderstanding on my part, and I understand it now. Thanks for your help.
Hi, when I run
./examples/cifar10/train_script.sh 0.1 0.0001 0.0 0.0 0.0 0 \
template_resnet_solver.prototxt
I encounter the following problem. What is the reason, and how can I solve it? I would appreciate any suggestions. Thanks!
./build/tools/caffe.bin train --solver=examples/cifar10//0.1_0.0001_0.0_0.0_0.0_2017年_05月_03日_星期三_18-57-55_CST/solver.prototxt
./examples/cifar10/train_script.sh: line 66: 14108 Aborted (core dumped) ./build/tools/caffe.bin train --solver=$solverfile > "${snapshot_path}/train.info" 2>&1