smallcorgi / Faster-RCNN_TF

Faster-RCNN in Tensorflow
MIT License

Segmentation Fault (Core Dumped) on Training (Ubuntu 16.04) #71

Open sntaus opened 7 years ago

sntaus commented 7 years ago

I set everything up following the tutorials and then tried to train with ./experiments/scripts/faster_rcnn_end2end.sh gpu 0 VGG16 pascal_voc, but I get a Segmentation Fault (Core Dumped) every time.

Here's what I get on my Terminal when I run the command:

+ set -e
+ export PYTHONUNBUFFERED=True
+ PYTHONUNBUFFERED=True
+ DEV=gpu
+ DEV_ID=0
+ NET=VGG16
+ DATASET=pascal_voc
+ array=($@)
+ len=4
+ EXTRA_ARGS=
+ EXTRA_ARGS_SLUG=
+ case $DATASET in
+ TRAIN_IMDB=voc_2007_trainval
+ TEST_IMDB=voc_2007_test
+ PT_DIR=pascal_voc
+ ITERS=70000
++ date +%Y-%m-%d_%H-%M-%S
+ LOG=experiments/logs/faster_rcnn_end2end_VGG16_.txt.2017-02-01_20-09-35
+ exec
++ tee -a experiments/logs/faster_rcnn_end2end_VGG16_.txt.2017-02-01_20-09-35
+ echo Logging output to experiments/logs/faster_rcnn_end2end_VGG16_.txt.2017-02-01_20-09-35
Logging output to experiments/logs/faster_rcnn_end2end_VGG16_.txt.2017-02-01_20-09-35
+ python ./tools/train_net.py --device gpu --device_id 0 --weights data/pretrain_model/VGG_imagenet.npy --imdb voc_2007_trainval --iters 70000 --cfg experiments/cfgs/faster_rcnn_end2end.yml --network VGGnet_train
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
./experiments/scripts/faster_rcnn_end2end.sh: line 57:  4644 Segmentation fault      (core dumped) python ./tools/train_net.py --device ${DEV} --device_id ${DEV_ID} --weights data/pretrain_model/VGG_imagenet.npy --imdb ${TRAIN_IMDB} --iters ${ITERS} --cfg experiments/cfgs/faster_rcnn_end2end.yml --network VGGnet_train ${EXTRA_ARGS}

I got the exact same error (Segmentation Fault) when I tried to train with the official Python (Caffe) implementation of Faster RCNN, rbgirshick/py-faster-rcnn. The error seems to occur at from datasets.factory import get_imdb (tools/train_net.py, line 17). So at the moment I can't get either of the two Faster RCNN implementations to train on my machine.
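One way to narrow down which import triggers the crash is to run each candidate import in its own child interpreter and check whether the child was killed by a signal. The sketch below is only an illustration: the helper name first_crashing_import and the CANDIDATES list are hypothetical. It also assumes a POSIX system, where a negative return code from subprocess means death by signal, and that the repository's lib directory is on PYTHONPATH for the last statement.

```python
import signal
import subprocess
import sys


def first_crashing_import(statements, python=sys.executable):
    """Run each statement in a fresh interpreter, in order.

    Returns (statement, signal_number) for the first child that was killed
    by a signal, or None if no child died that way. An ordinary ImportError
    exits with a positive code and is not treated as a crash here.
    """
    for statement in statements:
        # On POSIX, a return code of -N means the child was killed by signal N.
        code = subprocess.call([python, "-c", statement])
        if code < 0:
            return statement, -code
    return None


# Hypothetical candidate list; only the last entry is taken from the thread.
CANDIDATES = [
    "import numpy",
    "import tensorflow",
    "from datasets.factory import get_imdb",
]
```

Calling first_crashing_import(CANDIDATES) from the repository root would return the offending statement together with the signal number, which can be compared against signal.SIGSEGV. A None result means none of the listed imports crashed on its own, so the fault may depend on import order or on something later in train_net.py.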

Please note - demo.py works fine!

Full disclosure: I'm very new to deep learning, so my apologies if this could have been caused by a stupid error on my part.

jaig commented 7 years ago

In my case, the demo was not working and caused a Segmentation Fault with TensorFlow version 1.0.1. I tried the same on an older version, 0.12.0, and it worked there. While searching for a solution, I found this. It worked for me. Uninstall numpy and reinstall it using the command below:

pip install --no-binary=:all: numpy

Hope it helps.

zlhTao2012 commented 7 years ago

@jaig Your solution works for me. Thank you!

gpfworld commented 6 years ago

@jaig I have the same problem. But when I try your solution, the result is below:

Requirement already satisfied: numpy in /home/gpf/Soft/anaconda3/envs/py27/lib/python2.7/site-packages (1.13.3)
tensorflow 1.4.1 requires backports.weakref>=1.0rc1, which is not installed.
tensorflow 1.4.1 has requirement tensorflow-tensorboard<0.5.0,>=0.4.0rc1, but you'll have tensorflow-tensorboard 0.1.5 which is incompatible.
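The "Requirement already satisfied" line suggests pip found numpy 1.13.3 already installed and left it alone, which is what would happen if the uninstall step from the earlier comment was skipped. The two steps can be sketched as follows. The function names are hypothetical, and running pip through sys.executable is an assumption made so that the commands target the same environment as the interpreter. The flags themselves are standard pip options.

```python
import subprocess
import sys


def numpy_rebuild_commands(python=sys.executable):
    """Return the two pip invocations: remove numpy, then build it from source."""
    return [
        # -y skips the interactive confirmation prompt.
        [python, "-m", "pip", "uninstall", "-y", "numpy"],
        # --no-binary=:all: forces a source build instead of a prebuilt wheel.
        [python, "-m", "pip", "install", "--no-binary=:all:", "numpy"],
    ]


def rebuild_numpy(python=sys.executable):
    """Run the commands in order, stopping at the first failure."""
    for command in numpy_rebuild_commands(python):
        subprocess.check_call(command)
```

Calling rebuild_numpy() inside the py27 environment would run both steps. The tensorflow dependency warnings in the output above concern backports.weakref and tensorflow-tensorboard, so they would likely need to be resolved separately.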