liuliu / ccv

C-based/Cached/Core Computer Vision Library, A Modern Computer Vision Library
http://libccv.org

Image Classifier Training #114

Open Zackory opened 10 years ago

Zackory commented 10 years ago

I've been attempting to train a new classifier using the steps provided in http://libccv.org/doc/doc-convnet/ but I seem to be running into a problem. Every time I run ./image-net to train a new model, I get an error:

image-net: ccv_convnet.c:647: void ccv_convnet_encode(ccv_convnet_t *, ccv_dense_matrix_t **,
    ccv_dense_matrix_t **, int): Assertion `(((*a)->type) & 0xFFF) == convnet->channels' failed.
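
For readers unfamiliar with this assertion, here is a minimal sketch of what it appears to check: the low 12 bits of a matrix's type field are compared against the number of input channels the network expects. The constant values and the `sketch_` names are assumptions made for illustration, not copied from ccv's headers.

```c
#include <stdbool.h>

/* Illustrative constants (assumed for this sketch): the data type lives in
 * the high bits of `type`, the channel count in the low 12 bits. */
enum {
    SKETCH_8U = 0x01000, /* hypothetical "8-bit unsigned" data-type flag */
    SKETCH_C1 = 0x001,   /* one channel (grayscale) */
    SKETCH_C3 = 0x003    /* three channels (color) */
};

/* Extract the channel count, mirroring `type & 0xFFF` in the assertion. */
static int sketch_channels(int type)
{
    return type & 0xFFF;
}

/* True when an image of this type would pass the check for a network that
 * expects `expected_channels` input channels. */
static bool sketch_channels_match(int type, int expected_channels)
{
    return sketch_channels(type) == expected_channels;
}
```

Under these assumptions, a one-channel image passed to a three-channel network would fail the check, which is one way such an assertion could fire.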

I've tried training with both JPG and PNG images, and partially replacing image-net.c with https://gist.github.com/liuliu/8906523, but every attempt ends in the same error.

What would be causing this, and how can it be fixed or worked around?

liuliu commented 10 years ago

I won't be able to fix the CPU counterpart in the upcoming v0.7 release, unfortunately.

Zackory commented 10 years ago

CPU counterpart? At the moment I am attempting to train using GPUs.

liuliu commented 10 years ago

It is not running on the GPU if it is calling ccv_convnet_encode. Do you have the HAVE_CUDA compilation flag in your config.mk file?
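
As a rough illustration of why a missing flag ends up on the CPU path, a compile-time switch of this kind is commonly written as below. This is a sketch only: `sketch_training_backend` is a hypothetical name, and the real dispatch inside ccv may be structured differently.

```c
/* Reports which code path this translation unit was compiled for.
 * Build with -DHAVE_CUDA to select the GPU branch. */
static const char* sketch_training_backend(void)
{
#ifdef HAVE_CUDA
    return "gpu"; /* flag defined: the GPU implementation is compiled in */
#else
    return "cpu"; /* flag missing: only the CPU fallback exists in the binary */
#endif
}
```

Because the choice is made by the preprocessor, a binary built without the flag cannot reach the GPU code at run time, regardless of the hardware present.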

Zackory commented 10 years ago

Thanks for that, I just needed to correct the path to CUDA in the configure file.

I'm currently using the stable version, but I now reach this error when training:

./ccv-stable/bin/image-net --train-list train257PNGLegendLibccv.txt --test-list cv257PNGLegendLibccv.txt --working-dir image-net.sqlite3
 - compute mean activity 22500 / 22500
 - compute covariance matrix for data augmentation (color gain) 22500 / 22500
image-net: cuda/cwc_convnet.cu:764: int _cwc_convnet_convolutional_forward_propagate_vary(ccv_convnet_layer_t*, int, int, int, float*, float*, CUstream_st* const&, int, int, int): Assertion `cudaGetLastError() == cudaSuccess' failed.
Aborted (core dumped)

liuliu commented 10 years ago

Which GPU card are you using? From the log, it seems it crashed on the first forward pass, which is interesting to me.

Zackory commented 10 years ago

I'm running with an NVIDIA Tesla C2075 GPU.

liuliu commented 10 years ago

Thanks! Some explanations:

The forward_vary function is an auto-tuning function that finds the optimal kernels to perform convolution. The parameters are checked against the limits of the Kepler architecture, and I haven't verified them on the Fermi architecture.

There are two possible fixes: 1) I check against the Fermi architecture to see what's going on there, but I don't have the machine or card; 2) I handle the kernel "cannot launch" error and ignore the kernels that fail.
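
The second option can be sketched in plain C as a loop that tries every candidate configuration, skips the ones that fail to launch, and keeps the fastest of the rest. This is only an illustration of the idea under stated assumptions: the `sketch_` types and functions are hypothetical, and the real auto-tuner in cwc_convnet.cu launches CUDA kernels rather than reading from a table.

```c
/* Hypothetical result of trying one kernel configuration. */
typedef struct {
    int launched;      /* 0 if the kernel could not be launched */
    double elapsed_ms; /* measured run time, valid only when launched != 0 */
} sketch_trial_t;

/* Hypothetical callback that runs configuration `index` once and reports
 * whether it launched and how long it took. */
typedef sketch_trial_t (*sketch_try_f)(int index, void* context);

/* Try every candidate, skip the ones that fail to launch, and return the
 * index of the fastest one. Returns -1 when no candidate could be launched. */
static int sketch_pick_fastest(int count, sketch_try_f try_one, void* context)
{
    int best = -1;
    double best_ms = 0;
    for (int i = 0; i < count; i++) {
        sketch_trial_t trial = try_one(i, context);
        if (!trial.launched)
            continue; /* ignore "cannot launch" instead of asserting */
        if (best < 0 || trial.elapsed_ms < best_ms) {
            best = i;
            best_ms = trial.elapsed_ms;
        }
    }
    return best;
}

/* Demonstration callback: reads precomputed trials from an array. */
static sketch_trial_t sketch_table_trial(int index, void* context)
{
    const sketch_trial_t* table = (const sketch_trial_t*)context;
    return table[index];
}
```

With this shape, a caller still has to handle the -1 case, since on an unsupported card every candidate might fail to launch.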

Zackory commented 10 years ago

Is there any more information I can provide that would be helpful?

liuliu commented 10 years ago

I will put up a fix based on option 2 to see if that is the issue (unfortunately, that will be in unstable though).

liuliu commented 10 years ago

Can you backport this change to see if it fixes this?

https://github.com/liuliu/ccv/commit/677af3d458ca4c012bf80a9ec1dc1657ac2385d8#diff-dd2e3dbab25deec8032b1d0cbdd0e14fR109

Zackory commented 10 years ago

Same error with the changes.

image-net: cuda/cwc_convnet.cu:768: int _cwc_convnet_convolutional_forward_propagate_vary(ccv_convnet_layer_t*, int, int, int, float*, float*, CUstream_st* const&, int, int, int): Assertion `error == cudaSuccess' failed.

liuliu commented 10 years ago

Hey @Zackory, sorry, the fact that I don't have a Fermi card makes every fix very speculative. Can you add:

printf("cuda error %s\n", cudaGetErrorString(error));

before the assertion so that I know the exact failure that I should avoid? Thanks! It will be pretty hard for me to get a Fermi board just to fix this issue :(

Zackory commented 10 years ago

Here is the error:

cuda error invalid device function

liuliu commented 10 years ago

Thanks. I am not sure which device function is unavailable on the Fermi architecture; I need to check.