cutybug closed this issue 7 years ago
Hi, I was not able to find much about it either... :/ But a few questions. You can check the installed libjpeg with
dpkg --get-selections | grep libjpeg
if you are linking against the system one; otherwise you can see which one is being used with
ldd ./matlab/mex/vl_imreadjpeg.mexa64 | grep libjpeg
This may help to track the issue... I hope :P
Thanks for providing some directions.
It seems that I have multiple instances of libjpeg.so in my system. I think I'll try other files. Do you know how I can point to a specific lib file in vl_compilenn?
Any news about this issue? I have also been trying to train alexnet on imagenet on a GPU (Tesla K40). I got the same errors as cutybug right after the first training iteration (around train: epoch 01: 250/5005) using the current version of matconvnet (beta 22). As I am having problems with the compilation/installation, I compiled the library with GPU support in the simplest way: vl_compilenn('enableGpu', true). Some images for which it does not work are:
/scratch/imagenet/images/train/n01797886/n01797886_9854.JPEG: error 'libjpeg: Improper call to JPEG library in state 202'
/scratch/imagenet/images/train/n02125311/n02125311_16634.JPEG: error 'libjpeg: Improper call to JPEG library in state 202'
/scratch/imagenet/images/train/n02133161/n02133161_3988.JPEG: error 'libjpeg: Improper call to JPEG library in state 202'
/scratch/imagenet/images/train/n09246464/n09246464_23298.JPEG: error 'libjpeg: Improper call to JPEG library in state 202'
/scratch/imagenet/images/train/n02105641/n02105641_12238.JPEG: error 'libjpeg: Improper call to JPEG library in state 202'
/scratch/imagenet/images/train/n02027492/n02027492_2535.JPEG: error 'libjpeg: Improper call to JPEG library in state 202'
/scratch/imagenet/images/train/n01667778/n01667778_21162.JPEG: error 'libjpeg: Improper call to JPEG library in state 202'
/scratch/imagenet/images/train/n01608432/n01608432_11728.JPEG: error 'libjpeg: Improper call to JPEG library in state 202'
It is worth mentioning that the error message is not shown for all images. Also, when training alexnet on imagenet with older versions of matconvnet everything went fine. I am using:
Linux System: CentOS Linux release 7.2.1511
GCC: version 4.8.5 20150623
Matlab: R2015b
Cuda: 7.5
cuDNN: v4
$ ldconfig -p | grep libjpeg
libjpeg.so.62 (libc6,x86-64) => /lib64/libjpeg.so.62
libjpeg.so.62 (libc6) => /lib/libjpeg.so.62
libjpeg.so (libc6,x86-64) => /lib64/libjpeg.so
libjpeg.so (libc6) => /lib/libjpeg.so
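To follow up on the failing images, it can help to collect their paths from the training output so they can be inspected or decoded separately. A minimal sketch, assuming the console output was saved to a text file and that each error line has the form shown in the list above; the function name failing_jpegs and the file name train.log are hypothetical:

```shell
#!/bin/sh
# Print the unique paths of images that triggered a libjpeg error.
# Assumes log lines of the form:
#   <path>: error 'libjpeg: <message>'
failing_jpegs() {
    # $1: saved copy of the training output (hypothetical file)
    sed -n "s/^\(.*\): error 'libjpeg: .*/\1/p" "$1" | sort -u
}

# Example (hypothetical file names):
#   failing_jpegs train.log > failing.txt
```

If the same paths show up on every run, the files themselves are the likely cause; if the list changes between runs, the reader state is the more likely suspect.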
I decided to train the Alexnet on Imagenet using the example provided with matconvnet, being aware of the libjpeg errors mentioned above. For information, below is the graph I obtained after 20 iterations (objective: 6.910 top1err: 0.999 top5err: 0.995). One can clearly see that the training is not working as it should, mainly because the images could not be read. Any ideas on how to solve the libjpeg problem are very welcome! I hope this is not a major issue and that other models can be trained properly.
Below are the training results of Alexnet on Imagenet using 20 iterations and a previous version of matconvnet.
We really cannot reproduce this issue, even though we run it with almost exactly the same configuration.
Just one question: is the mount /scratch/ local storage, or some fancier file system? Maybe the new implementation has issues with that...
Also, does the same happen when you pre-process the images to a constant size (with utils/preprocess-imagenet.sh)?
@lenck Thanks for your reply. '/scratch' is a simple local storage, nothing fancy. I downloaded the toolbox again and reinstalled/compiled everything, but it didn't work:
make ARCH=glnxa64 MATLABROOT=/usr/local/MATLAB/R2015b/ ENABLE_GPU=yes CUDAROOT=/usr/local/cuda-7.5/ CUDAMETHOD=nvcc ENABLE_CUDNN=yes CUDNNROOT=/opt/cuDNN-v5.1/ ENABLE_IMREADJPEG=yes LIBJPEG_INCLUDE=/usr/include/ LIBJPEG_LIB=/usr/lib64/
When I run the cnn_imagenet example without the GPU, it works. It therefore seems that there is a problem with the compilation of vl_imreadjpeg. Any ideas/suggestions?
@cutybug were you able to solve the problem?
Unfortunately, no :(
Hmm, maybe a hacky workaround would be to pre-process the images with utils/preprocess-imagenet.sh (thanks Giorgos).
It is probable that there are some bad JPEG files in the original dataset which break the state of libjpeg, so it crashes on the next image. I will try to do some tests, but thanks to the impending CVPR deadline it may take some time...
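The bad-file hypothesis can be partly checked outside MatConvNet by looking for files whose contents do not start with the JPEG start-of-image marker (bytes FF D8). A minimal sketch, with hypothetical function names; it only catches files that are not JPEG at all (for example, another image format saved with a .JPEG extension) and will not flag valid but unusual JPEGs:

```shell
#!/bin/sh
# Succeed if the file starts with the JPEG start-of-image marker (FF D8).
is_jpeg() {
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "ffd8" ]
}

# Print every *.JPEG file under directory $1 that fails the check.
list_non_jpegs() {
    find "$1" -type f -name '*.JPEG' | while IFS= read -r f; do
        is_jpeg "$f" || printf '%s\n' "$f"
    done
}

# Example, using the dataset path from this thread:
#   list_non_jpegs /scratch/imagenet/images/train
```

An empty result does not rule out problematic files; it only means every file at least begins like a JPEG.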
Is there any update on this error? I am also encountering the same error.
Hello. I'm testing the example code of cnn_imagenet.m; however, I keep getting the following error message:
" ...(some image file name)...: error 'libjpeg: Improper call to JPEG library in state 202' "
It appears constantly during training. (The training procedure itself is not terminated, but the above error message is repeated over and over in the console window.) The only things I have changed in the example code are the path to the imagenet data and "opts.train.gpus = [1 2 3 4];" (The machine I have has four Titan X GPUs.)
I have compiled Matconvnet as follows:
vl_compilenn('enableGpu', true, 'cudaRoot', '/usr/local/cuda', 'cudaMethod', 'nvcc', 'enableCudnn', true, 'cudnnRoot', '/usr/local/cuda');
My system is: Xubuntu 14.04, cuda/cudnn 7.5, MATLAB R2016a, and the latest Matconvnet. I guess that this error has something to do with the libjpeg library in the system. I have been trying for more than a week to find a cure, with no success. (I couldn't find much information about this error on the web.)
If anybody can help me, I will really appreciate it. Thank you in advance.