CL_OUT_OF_RESOURCES error when training a classifier

wanfade commented 4 years ago

Hello @sowson, thank you for your great work. I used this project to train YOLOv3 with OpenCL, and it worked very well. But when training a classifier, I get an OpenCL backward_scale_kernel error: CL_OUT_OF_RESOURCES. Is there a problem with how I am doing this?

I used a custom dataset with 2 classes and modified the filters of the last convolutional layer in cfg/darknet19.cfg.

Here is the output, including my device specs:

$ ./darknet classifier train dogs/dogs.data cfg/darknet19.cfg
Device IDs: 2
Device ID: 0
Device name: Tesla V100-PCIE-16GB
Device vendor: NVIDIA Corporation
Device opencl availability: OpenCL 1.2 CUDA
Device opencl used: 440.33.01
Device double precision: YES
Device max group size: 1024
Device address bits: 64
darknet19
1
layer filters size input output
0 conv 32 3 x 3 / 1 256 x 256 x 3 -> 256 x 256 x 32 0.113 BFLOPs
1 max 2 x 2 / 2 256 x 256 x 32 -> 128 x 128 x 32
2 conv 64 3 x 3 / 1 128 x 128 x 32 -> 128 x 128 x 64 0.604 BFLOPs
3 max 2 x 2 / 2 128 x 128 x 64 -> 64 x 64 x 64
4 conv 128 3 x 3 / 1 64 x 64 x 64 -> 64 x 64 x 128 0.604 BFLOPs
5 conv 64 1 x 1 / 1 64 x 64 x 128 -> 64 x 64 x 64 0.067 BFLOPs
6 conv 128 3 x 3 / 1 64 x 64 x 64 -> 64 x 64 x 128 0.604 BFLOPs
7 max 2 x 2 / 2 64 x 64 x 128 -> 32 x 32 x 128
8 conv 256 3 x 3 / 1 32 x 32 x 128 -> 32 x 32 x 256 0.604 BFLOPs
9 conv 128 1 x 1 / 1 32 x 32 x 256 -> 32 x 32 x 128 0.067 BFLOPs
10 conv 256 3 x 3 / 1 32 x 32 x 128 -> 32 x 32 x 256 0.604 BFLOPs
11 max 2 x 2 / 2 32 x 32 x 256 -> 16 x 16 x 256
12 conv 512 3 x 3 / 1 16 x 16 x 256 -> 16 x 16 x 512 0.604 BFLOPs
13 conv 256 1 x 1 / 1 16 x 16 x 512 -> 16 x 16 x 256 0.067 BFLOPs
14 conv 512 3 x 3 / 1 16 x 16 x 256 -> 16 x 16 x 512 0.604 BFLOPs
15 conv 256 1 x 1 / 1 16 x 16 x 512 -> 16 x 16 x 256 0.067 BFLOPs
16 conv 512 3 x 3 / 1 16 x 16 x 256 -> 16 x 16 x 512 0.604 BFLOPs
17 max 2 x 2 / 2 16 x 16 x 512 -> 8 x 8 x 512
18 conv 1024 3 x 3 / 1 8 x 8 x 512 -> 8 x 8 x1024 0.604 BFLOPs
19 conv 512 1 x 1 / 1 8 x 8 x1024 -> 8 x 8 x 512 0.067 BFLOPs
20 conv 1024 3 x 3 / 1 8 x 8 x 512 -> 8 x 8 x1024 0.604 BFLOPs
21 conv 512 1 x 1 / 1 8 x 8 x1024 -> 8 x 8 x 512 0.067 BFLOPs
22 conv 1024 3 x 3 / 1 8 x 8 x 512 -> 8 x 8 x1024 0.604 BFLOPs
23 conv 2 1 x 1 / 1 8 x 8 x1024 -> 8 x 8 x 2 0.000 BFLOPs
24 avg 8 x 8 x 2 -> 2
25 softmax 2
Learning Rate: 0.1, Momentum: 0.9, Decay: 0.0005
384
128 448
Saving weights to dogs/backup/darknet19.start.conv.weights
Loaded: 0.000083 seconds
opencl backward_scale_kernel error: CL_OUT_OF_RESOURCES

I set a breakpoint in opencl.c, and the failure traces back to line 837:

clErr = clEnqueueNDRangeKernel(opencl_queues[opencl_device_id_t], kernel, 2, globalOffser, globalItems, localItems, 0, NULL, NULL);

This call is reached from the backward_gpu function of a convolutional layer.

Thanks a lot

sowson commented 4 years ago

@wanfade, can you please compile and use https://github.com/clMathLibraries/clBLAS? Also, in blas_kernels.c you can change int tuning = 16 to 8 or 4; it appears in 7 places. On my macOS machine with two Radeon VII cards there is no issue, but I am using the clBLAS linked above. Thanks!

wanfade commented 4 years ago

Thanks for your reply. I compiled darknet with that clBLAS and changed 'int tuning = 16' in blas_kernels.c to 8, 4 and 2, but the 'opencl backward_scale_kernel error: CL_OUT_OF_RESOURCES' still occurs.

I tried the same code on a MacBook without that clBLAS, and it works well.

Could this be a problem with NVIDIA's 'OpenCL 1.2 CUDA' implementation?

sowson commented 4 years ago

@wanfade, please try this patch. If the problem really is in this kernel, I have an older, slightly slower version that I tested on a Titan RTX.

nvcl.patch.txt

git apply nvcl.patch.txt

Let me know if that works. Thanks!

wanfade commented 4 years ago

Great, it works, and training is still fast. Thanks for your help, @sowson!

sowson commented 4 years ago

@wanfade, can we close this now? The fix has been committed to the repository. Thanks!

sowson/darknet, issue #33