Segmentation fault when running ./train/train_model.sh

musicrainie commented 7 years ago

When running ./train/train_model.sh, I got the following error:

I0418 09:54:54.054507 20872 layer_factory.hpp:77] Creating layer data
I0418 09:54:54.054597 20872 db_lmdb.cpp:35] Opened lmdb ./caffe-colorization/examples/imagenet/ilsvrc12_train_lmdb
I0418 09:54:54.054622 20872 net.cpp:84] Creating Layer data
I0418 09:54:54.054630 20872 net.cpp:380] data -> data
I0418 09:54:54.055681 20872 data_layer.cpp:45] output data size: 40,3,176,176
I0418 09:54:54.076776 20872 net.cpp:122] Setting up data
I0418 09:54:54.076802 20872 net.cpp:129] Top shape: 40 3 176 176 (3717120)
I0418 09:54:54.076805 20872 net.cpp:137] Memory required for data: 14868480
I0418 09:54:54.076812 20872 layer_factory.hpp:77] Creating layer img_lab
Aborted at 1492480494 (unix time) try "date -d @1492480494" if you are using GNU date
PC: @ 0x7f3516e6e873 std::_Hashtable<>::clear()
SIGSEGV (@0x9) received by PID 20872 (TID 0x7f352eb42740) from PID 9; stack trace:
@ 0x7f352bd584b0 (unknown)
@ 0x7f3516e6e873 std::_Hashtable<>::clear()
@ 0x7f3516e60346 google::protobuf::DescriptorPool::FindFileByName()
@ 0x7f3516e3eac8 google::protobuf::python::cdescriptor_pool::AddSerializedFile()
@ 0x7f352c3c17d0 PyEval_EvalFrameEx
@ 0x7f352c4ea01c PyEval_EvalCodeEx
@ 0x7f352c4403dd (unknown)
@ 0x7f352c4131e3 PyObject_Call
@ 0x7f352c433ae5 (unknown)
@ 0x7f352c3ca123 (unknown)
@ 0x7f352c4131e3 PyObject_Call
@ 0x7f352c3be13c PyEval_EvalFrameEx
@ 0x7f352c4ea01c PyEval_EvalCodeEx
@ 0x7f352c3b8b89 PyEval_EvalCode
@ 0x7f352c44d1b4 PyImport_ExecCodeModuleEx
@ 0x7f352c44db8f (unknown)
@ 0x7f352c44f300 (unknown)
@ 0x7f352c44f5c8 (unknown)
@ 0x7f352c4506db PyImport_ImportModuleLevel
@ 0x7f352c3c7698 (unknown)
@ 0x7f352c4131e3 PyObject_Call
@ 0x7f352c4e9447 PyEval_CallObjectWithKeywords
@ 0x7f352c3bc5c6 PyEval_EvalFrameEx
@ 0x7f352c4ea01c PyEval_EvalCodeEx
@ 0x7f352c3b8b89 PyEval_EvalCode
@ 0x7f352c44d1b4 PyImport_ExecCodeModuleEx
@ 0x7f352c44db8f (unknown)
@ 0x7f352c44f300 (unknown)
@ 0x7f352c44f5c8 (unknown)
@ 0x7f352c4506db PyImport_ImportModuleLevel
@ 0x7f352c3c7698 (unknown)
@ 0x7f352c4131e3 PyObject_Call
./train/train_model.sh: line 2: 20872 Segmentation fault ./caffe-colorization/build/tools/caffe train -solver ./train/solver.prototxt -weights ./models/init_v2.caffemodel -gpu $1

My GPU is a GTX 1070 (8 GB), system memory is 32 GB, and the OS is Ubuntu 16.04. I created the ImageNet LMDB with ./caffe-colorization/examples/imagenet/create_imagenet.sh three ways: without resizing, resized to 256x256, and resized to 176x176. All three cases hit the same error. I suspected protobuf, so I tried protobuf-v3.2.0, protobuf-v3.2.0-rc.1, protobuf-v3.2.0rc2 and protobuf-v3.2.1, but every version gave the same error. The latest caffe-1.0 fails the same way. Can anyone help me with this "Segmentation fault" issue? Thanks very much!
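The trace above crashes inside google::protobuf::python::cdescriptor_pool::AddSerializedFile during a Python import, so it may help to see which protobuf the Python side is actually picking up. Below is a minimal diagnostic sketch, not something from the repository: the function names are made up for illustration, and it only inspects the Python protobuf package, not the libprotobuf that Caffe itself was built against.

```python
import os


def describe_protobuf_backend():
    """Return (version, backend) of the Python protobuf package.

    Returns (None, None) when the package cannot be imported.
    The backend string is whatever api_implementation.Type() reports,
    for example 'cpp' or 'python'.
    """
    try:
        import google.protobuf
        from google.protobuf.internal import api_implementation
    except ImportError:
        return None, None
    return google.protobuf.__version__, api_implementation.Type()


def requested_backend(environ=os.environ):
    """Backend requested via the environment, or None if the variable is unset."""
    return environ.get("PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION")


if __name__ == "__main__":
    version, backend = describe_protobuf_backend()
    if version is None:
        print("Python protobuf package is not importable")
    else:
        print("protobuf (Python package):", version)
        print("active backend:", backend)
    print("PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION =", requested_backend())
```

If the backend shows up as 'cpp', one thing that might be worth trying is exporting PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python before launching training, so that the pure-Python backend is used instead. That is only a guess based on the stack trace and has not been confirmed in this thread.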

phongdinhv commented 7 years ago

The default sh file will cause errors. Try training the model with this command instead:

./caffe train -solver -weights -model

musicrainie commented 7 years ago

Thanks for your comment, @Leo2610. I just tried "$ sudo ./caffe-colorization/build/tools/caffe train -solver ./train/solver.prototxt -weights ./models/init_v2.caffemodel -gpu 0", but nothing changed. The caffe build used above was version 1.0, and I got the same error with caffe_private_pascal.

phongdinhv commented 7 years ago

The problem is your protobuf version. Try installing version 3.1.0.

Here are my specs; this setup can handle a mini-batch size of up to 55:

GTX 1080 8 GB
Memory: 8 GB
Ubuntu 16.04
Protobuf 3.1.0
CUDA 8.0 and cuDNN 5.1
LMDB with crop-size 256x256

musicrainie commented 7 years ago

Version 3.1.0 works. Thank you, @Leo2610.
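For anyone landing here later, a quick way to confirm that the Python protobuf package matches the 3.1.0 release that worked above is sketched below. This is only an illustrative helper: the names WORKING_VERSION, release_tuple, same_release and installed_protobuf_version are made up, and pre-release suffixes such as rc2 are simply ignored when comparing.

```python
import importlib

WORKING_VERSION = "3.1.0"  # the release reported to work in this thread


def release_tuple(version_text):
    """Leading numeric components of a version string, e.g. '3.2.0rc2' -> (3, 2, 0)."""
    numbers = []
    for piece in version_text.strip().lstrip("vV").split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        numbers.append(int(digits))
        if len(digits) != len(piece):
            break  # stop at a suffix such as 'rc2' or '-rc'
    return tuple(numbers)


def same_release(installed, expected=WORKING_VERSION):
    """True when both strings share the same numeric release components."""
    return release_tuple(installed) == release_tuple(expected)


def installed_protobuf_version():
    """Version string of the Python protobuf package, or None if it is not importable."""
    try:
        module = importlib.import_module("google.protobuf")
    except ImportError:
        return None
    return getattr(module, "__version__", None)


if __name__ == "__main__":
    found = installed_protobuf_version()
    if found is None:
        print("Python protobuf package not found")
    elif same_release(found):
        print("protobuf", found, "matches", WORKING_VERSION)
    else:
        print("protobuf", found, "differs from", WORKING_VERSION)
```

This only checks the Python package. The compiler side can be checked separately with protoc --version, and the two may need to agree with the libprotobuf that Caffe was compiled against.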

richzhang / colorization

Segmentation fault when running ./train/train_model.sh #27