yu4u / age-gender-estimation

Keras implementation of a CNN network for age and gender estimation
MIT License

"Illegal instruction (core dumped)" when running the program with tensorflow with gpu #5

Open galoiscch opened 7 years ago

galoiscch commented 7 years ago

I succeeded in running the program with TensorFlow without a GPU. However, I can't run it with TensorFlow with GPU support. The following error appears when I run the program:

Using TensorFlow backend.
2017-07-05 10:18:44.115782: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-07-05 10:18:44.116126: I tensorflow/core/common_runtime/gpu/gpu_device.cc:938] Found device 0 with properties:
name: GeForce GTX 1050
major: 6 minor: 1 memoryClockRate (GHz) 1.468
pciBusID 0000:01:00.0
Total memory: 1.95GiB
Free memory: 1.72GiB
2017-07-05 10:18:44.116175: I tensorflow/core/common_runtime/gpu/gpu_device.cc:959] DMA: 0
2017-07-05 10:18:44.116189: I tensorflow/core/common_runtime/gpu/gpu_device.cc:969] 0: Y
2017-07-05 10:18:44.116214: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1028] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1050, pci bus id: 0000:01:00.0)
Illegal instruction (core dumped)

Is this program compatible with TensorFlow with GPU support? My system is as follows: Ubuntu 16.04, Python 2.7.12, Keras 2.0.5, TensorFlow 1.2.0, CUDA 8.0 (V8.0.61), cuDNN 6.0.

galoiscch commented 7 years ago

Update: I now realize that I never actually tried this age-estimation program on this computer. I only got a successful result on another computer with an i5 CPU. The problem with this computer is that it has a very old CPU (E5200), which is not supported by the dlib build installed from the .whl (sudo pip install dlib). The solution is described here: https://github.com/davisking/dlib/issues/620. If you download dlib and compile it yourself, the build will match your hardware configuration.

I downloaded dlib here: https://github.com/davisking/dlib/

Before compiling dlib, I edited dlib's tools/python/CMakeLists.txt file from:

set(USE_SSE4_INSTRUCTIONS ON CACHE BOOL "Use SSE4 instructions")

to:

set(USE_SSE2_INSTRUCTIONS ON CACHE BOOL "Use SSE2 instructions")

Then I ran

python3 setup.py install
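Before rebuilding, it can help to confirm which instruction sets the CPU actually reports. The following is a minimal sketch, assuming a Linux system where /proc/cpuinfo lists feature flags under the names sse4_1, sse4_2, and avx; the function names are hypothetical and not part of dlib.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # The flags line is a space-separated list, repeated per core;
            # the first occurrence is enough.
            return set(value.split())
    return set()


def missing_features(cpuinfo_text, required=("sse4_1", "sse4_2", "avx")):
    """List the required flags that the CPU does not advertise."""
    flags = cpu_flags(cpuinfo_text)
    return [name for name in required if name not in flags]
```

On the affected machine this could be called as `missing_features(open("/proc/cpuinfo").read())`. A non-empty result suggests that a prebuilt binary compiled for those instruction sets may crash with "Illegal instruction".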

But now I have encountered another problem. When I run the program, a window showing the webcam image pops up. However, as soon as the webcam captures a human face, the program crashes with the following error:

Using TensorFlow backend.
2017-07-06 09:35:29.039507: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-07-06 09:35:29.039853: I tensorflow/core/common_runtime/gpu/gpu_device.cc:938] Found device 0 with properties:
name: GeForce GTX 1050
major: 6 minor: 1 memoryClockRate (GHz) 1.468
pciBusID 0000:01:00.0
Total memory: 1.95GiB
Free memory: 1.71GiB
2017-07-06 09:35:29.039903: I tensorflow/core/common_runtime/gpu/gpu_device.cc:959] DMA: 0
2017-07-06 09:35:29.039917: I tensorflow/core/common_runtime/gpu/gpu_device.cc:969] 0: Y
2017-07-06 09:35:29.039941: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1028] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1050, pci bus id: 0000:01:00.0)
2017-07-06 09:35:33.377692: E tensorflow/stream_executor/cuda/cuda_dnn.cc:371] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2017-07-06 09:35:33.377776: E tensorflow/stream_executor/cuda/cuda_dnn.cc:338] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
2017-07-06 09:35:33.377796: F tensorflow/core/kernels/conv_ops.cc:672] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo(), &algorithms)
Aborted (core dumped)

"could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR" only appears when a human is captured. P.S. I change the program a little bit. I added a line " if len(results)>0:" before the line "predicted_genders = results[0]", so that a window will pop out even if there is no human face in it

galoiscch commented 7 years ago

Update: I suspect that the problem stems from TensorFlow's memory allocation method. As far as I know, we cannot limit the GPU's memory usage when using Keras with the TensorFlow backend (for example with gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)), so I switched to Keras with Theano. It works. However, the ages were misjudged by a relatively large amount, and the results are worse than the output produced on the CPU (i5). Therefore, I wonder whether this program is incompatible with Theano, or whether the problem is just the insufficient computing power of my GPU (GTX 1050).
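For reference, with the TensorFlow 1.x backend Keras can be handed a custom session, so the GPU memory fraction can be capped after all. The following is a minimal sketch, assuming TensorFlow 1.x and Keras 2.x as listed above; the helper names are hypothetical, and whether this avoids the cuDNN error on a 2 GB card has not been verified here.

```python
def gpu_options_kwargs(memory_fraction=None, allow_growth=False):
    """Build keyword arguments for tf.GPUOptions (TensorFlow 1.x)."""
    kwargs = {"allow_growth": bool(allow_growth)}
    if memory_fraction is not None:
        if not 0.0 < memory_fraction <= 1.0:
            raise ValueError("memory_fraction must be in (0, 1]")
        kwargs["per_process_gpu_memory_fraction"] = float(memory_fraction)
    return kwargs


def configure_keras_session(memory_fraction=None, allow_growth=False):
    """Make Keras use a TF 1.x session with restricted GPU memory."""
    # Imported lazily so the helper above works without TensorFlow installed.
    import tensorflow as tf
    from keras.backend.tensorflow_backend import set_session

    options = tf.GPUOptions(**gpu_options_kwargs(memory_fraction, allow_growth))
    set_session(tf.Session(config=tf.ConfigProto(gpu_options=options)))
```

Calling `configure_keras_session(memory_fraction=0.333)` before the model is built would apply the same fraction mentioned above; `allow_growth=True` is an alternative that lets TensorFlow allocate memory on demand.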

yu4u commented 7 years ago

Thank you for the useful information. First, I fixed demo.py according to your comment about adding the line "if len(results)>0:".

As I have not tried training the model with the Theano backend, I'm not sure my program is fully compatible with Theano, but I think it is. I think the problem is in using the weights obtained with TensorFlow: I'm afraid that weights trained with one backend are not compatible with the other. You can convert the weights in either direction, as explained here, to solve this problem.

galoiscch commented 7 years ago

import os
from keras import backend as K
from keras.utils.conv_utils import convert_kernel
from wide_resnet import WideResNet

img_size=64
model = WideResNet(img_size, depth=16, k=8)()
model.load_weights(os.path.join("pretrained_models", "weights.18-4.06.hdf5"))

for layer in model.layers:
    if layer.__class__.__name__ in ['Convolution1D', 'Convolution2D']:
        original_w = K.get_value(layer.W)
        converted_w = convert_kernel(original_w)
        K.set_value(layer.W, converted_w)

model.save_weights(os.path.join("pretrained_models", 'weights.18-4.06_theano.h5'))

Will this Python script convert the weight file correctly? I tried using 'weights.18-4.06_theano.h5', but the output is the same: the age predicted for most people is around 40 years old.
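One possible reason the converted file behaves the same as the original: in Keras 2 the convolution classes are named Conv1D and Conv2D (Convolution1D and Convolution2D remain only as aliases), and their weights live in the `kernel` attribute rather than `W`, so the loop above may match no layers at all. The following is a minimal sketch of the same conversion under that assumption; the helper names are hypothetical, and it has not been checked against the weights discussed in this thread.

```python
# Keras 2 class names for convolution layers.
CONV_CLASS_NAMES = {"Conv1D", "Conv2D", "Conv3D", "Conv2DTranspose"}


def conv_layers(model):
    """Return the layers whose class name marks them as convolutions."""
    return [layer for layer in model.layers
            if layer.__class__.__name__ in CONV_CLASS_NAMES]


def convert_conv_kernels(model):
    """Flip every convolution kernel in place; return how many were converted."""
    # Imported lazily so conv_layers() can be used without Keras installed.
    from keras import backend as K
    from keras.utils.conv_utils import convert_kernel

    layers = conv_layers(model)
    for layer in layers:
        # Keras 2 stores convolution weights in `kernel` (Keras 1 used `W`).
        K.set_value(layer.kernel, convert_kernel(K.get_value(layer.kernel)))
    return len(layers)
```

If `convert_conv_kernels(model)` returns 0, nothing was converted and the saved file is simply a copy of the original weights.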

yu4u commented 7 years ago

The above code looks correct according to the instructions I referred to, but it does not work for me either... I trained the model with the Theano backend, so please try it: https://drive.google.com/file/d/0B_cG1nzvVZlQWGJMc2JjdzkwcVk/view?usp=sharing

galoiscch commented 7 years ago

Thanks a lot.

galoiscch commented 7 years ago

How much time does the training process need? I trained on the CPU with the wiki dataset, and it only reached the fourth epoch in one day. What is the hardware configuration of your computer?

yu4u commented 7 years ago

I trained on GPU: CPU: i7-7700 3.60GHz, GPU: GeForce GTX1080. Training requires 1-2 hours for imdb and 6 minutes for wiki.

If the problem is memory allocation, please try a smaller model and a smaller batch size:

python3 train.py --input data/imdb_db.mat --depth 10 --width 4 --batch_size 16

If the image size is 64, the number of parameters can also be reduced by changing

pool = AveragePooling2D(pool_size=(8, 8), strides=(1, 1), padding="same")(relu)

to

pool = AveragePooling2D(pool_size=(16, 16), strides=(1, 1), padding="same")(relu)

galoiscch commented 7 years ago

The Theano weights work well.

galoiscch commented 7 years ago

After running the command python train.py --input data/imdb_db.mat --depth 10 --width 4 --batch_size 32, I can run the training program with TensorFlow on the GPU. However, when I test the new weight file, the following error appears:

Using TensorFlow backend.
Traceback (most recent call last):
  File "demo.py", line 97, in <module>
    main()
  File "demo.py", line 25, in main
    model.load_weights(os.path.join("pretrained_models", "weights.15-4.02.hdf5"))
  File "/usr/local/lib/python2.7/dist-packages/Keras-2.0.5-py2.7.egg/keras/engine/topology.py", line 2572, in load_weights
    load_weights_from_hdf5_group(f, self.layers)
  File "/usr/local/lib/python2.7/dist-packages/Keras-2.0.5-py2.7.egg/keras/engine/topology.py", line 2981, in load_weights_from_hdf5_group
    str(len(filtered_layers)) + ' layers.')
ValueError: You are trying to load a weight file containing 19 layers into a model with 31 layers.

I wonder whether this is due to the change I made on the command line. I didn't change the number of parameters. Thanks.

galoiscch commented 7 years ago

The size of the weight file is really smaller. Your weight file is 195.8 MB in size, while my weight file is just 63.7 MB.

yu4u commented 7 years ago

demo.py is just a demo script that assumes the pre-trained model, as you can see:

model = WideResNet(img_size, depth=16, k=8)()
model.load_weights(os.path.join("pretrained_models", "weights.18-4.06.hdf5"))

But I have now added options to demo.py to specify the weight file, depth, and width. Please refer to the latest version of demo.py.
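A minimal sketch of what such options could look like, so that the model is built with the same depth and width the weight file was trained with. The flag names, defaults, and function name below are assumptions for illustration; check the current demo.py for the actual ones.

```python
import argparse


def parse_demo_args(argv=None):
    """Parse hypothetical demo options for the weight file and model size."""
    parser = argparse.ArgumentParser(
        description="Run the demo with a custom weight file.")
    parser.add_argument("--weight_file", type=str, default=None,
                        help="path to the weight file (.hdf5)")
    parser.add_argument("--depth", type=int, default=16,
                        help="depth used when the weights were trained")
    parser.add_argument("--width", type=int, default=8,
                        help="width (k) used when the weights were trained")
    return parser.parse_args(argv)


# The parsed values would then replace the hard-coded ones, for example:
#   model = WideResNet(img_size, depth=args.depth, k=args.width)()
#   model.load_weights(args.weight_file)
```

For the weights trained above, the arguments would be `--depth 10 --width 4` together with the path to the new weight file.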

yu4u commented 7 years ago

The size of the weight file is really smaller. Your weight file is 195.8 MB in size, while my weight file is just 63.7 MB.

The options --depth 10 --width 4 control the number of parameters in the CNN, so it is natural that the size of the weight file changes.
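A rough back-of-the-envelope check of how these options affect the parameter count. This is a sketch under assumptions not stated in the thread: a standard Wide ResNet layout with (depth - 4) / 6 blocks per stage, stage widths 16, 16k, 32k, and 64k, 3x3 convolutions without biases, a 1x1 shortcut convolution in the first block of each stage, two stride-2 stages, and a flattened feature map feeding two dense heads with 2 and 101 outputs. Batch-normalization parameters are ignored, and the function name is hypothetical.

```python
def estimate_params(depth, k, img_size=64, n_outputs=2 + 101):
    """Return (conv_params, dense_params) for the assumed WideResNet layout."""
    if (depth - 4) % 6 != 0:
        raise ValueError("depth must satisfy depth = 6n + 4")
    n = (depth - 4) // 6                      # residual blocks per stage
    widths = [16, 16 * k, 32 * k, 64 * k]

    conv = 3 * 3 * 3 * widths[0]              # stem convolution on RGB input
    for stage in range(3):
        c_in, c_out = widths[stage], widths[stage + 1]
        # First block: two 3x3 convolutions plus a 1x1 shortcut projection.
        conv += 9 * c_in * c_out + 9 * c_out * c_out + c_in * c_out
        # Remaining blocks: two 3x3 convolutions each, identity shortcut.
        conv += (n - 1) * 2 * 9 * c_out * c_out

    side = img_size // 4                      # two stride-2 stages
    # Dense heads on the flattened map: weights plus one bias per output.
    dense = side * side * widths[3] * n_outputs + n_outputs
    return conv, dense
```

Comparing `estimate_params(16, 8)` with `estimate_params(10, 4)` gives a sense of how much smaller the reduced model is. The file on disk can be larger than the raw weights if optimizer state is saved alongside them.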

galoiscch commented 7 years ago

Much obliged. I can run demo.py with my weight file now.

sbharadwajj commented 6 years ago

Hi, did you run it with the TensorFlow backend on a GPU?

galoiscch commented 6 years ago

I think I tried running the program with the TensorFlow backend on a GPU, but it failed. That was a long time ago, and my memory of this project has become quite rusty. I am sorry about that.

sbharadwajj commented 6 years ago

Thank you. @yu4u, do you run it on a GPU? Do you have any suggestions on how to fix it for GPU?

yu4u commented 6 years ago

I did not run demo.py on a machine with GPUs but I think it works. Is there any problem?

sbharadwajj commented 6 years ago

Works perfectly with tensorflow-gpu 1.10.

nyck33 commented 5 years ago

@galoiscch

I'm running into memory issues when running this in a conda env with PyTorch GPU and TensorFlow GPU (detection is done by Tencent DSFD, not dlib): https://github.com/TencentYoutuResearch/FaceDetection-DSFD

So I want to use a shallower and narrower model. Can you provide the smaller weights?