Closed: wanfade closed this issue 4 years ago
@wanfade Can you please compile and use https://github.com/clMathLibraries/clBLAS? Also, in blas_kernels.c you may change `int tuning = 16` to 8 or 4 in the 7 places it appears. On my macOS with two Radeon VII cards there is no issue, but I am using "my" clBLAS fork mentioned above. Thanks!
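To illustrate why a smaller tuning value can help, here is a minimal sketch, assuming the constant caps the local work-group size used for kernel launches; `pick_local_size` and the surrounding code are illustrative only, not the repository's actual code:

```c
/* Hypothetical illustration of why lowering `tuning` can avoid
 * CL_OUT_OF_RESOURCES: a smaller value asks the device for a smaller
 * local work-group, i.e. fewer resources per kernel launch.
 * Names below are illustrative; they are not code from blas_kernels.c. */
#include <stdio.h>

/* Pick a 2-D local work size no larger than tuning x tuning and no
 * larger than the device's max work-group size (1024 on the V100 below). */
static void pick_local_size(int tuning, size_t max_group, size_t local[2])
{
    size_t t = (size_t)tuning;
    while (t * t > max_group) t /= 2;   /* keep t*t within the device limit */
    local[0] = t;
    local[1] = t;
}

int main(void)
{
    const size_t max_group = 1024;      /* CL_DEVICE_MAX_WORK_GROUP_SIZE */
    int candidates[] = { 16, 8, 4 };
    for (int i = 0; i < 3; i++) {
        size_t local[2];
        pick_local_size(candidates[i], max_group, local);
        printf("tuning=%2d -> local work size %zux%zu (%zu work-items)\n",
               candidates[i], local[0], local[1], local[0] * local[1]);
    }
    return 0;
}
```

Whether 16, 8, or 4 is safe depends on the kernel's register and local-memory footprint on a given driver, which is why the suggestion is to try progressively smaller values.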
Thanks for your reply. I compiled darknet with the clBLAS project and changed `int tuning = 16` in blas_kernels.c to 8, 4, and 2, but I still get the 'opencl backward_scale_kernel error: CL_OUT_OF_RESOURCES'.
I tried the same code on a MacBook without the clBLAS mentioned above, and it works well.
Maybe it is an error specific to the 'OpenCL 1.2 CUDA' platform?
@wanfade please try this patch. If the problem really is in this kernel, I have an older, slightly slower version of it that I tested on a Titan RTX.
git apply nvcl.patch.txt
Let me know if that works. Thanks!
Fine. It works! The speed is still fast. Thanks for your help! @sowson
@wanfade can we close it now? The solution commit is in the repository, thanks!
Hello @sowson, thank you for your great work. I used this project to train YOLOv3 with OpenCL, and it worked very well. But when training a classifier, I get an 'opencl backward_scale_kernel error: CL_OUT_OF_RESOURCES'. I want to know if there is some problem with my setup.
I used a custom dataset with 2 classes, and modified the filters of the last convolutional layer in cfg/darknet19.cfg accordingly.
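The edit was of this kind (a sketch of the tail of cfg/darknet19.cfg; the stock values shown may differ slightly from the exact file):

```
[convolutional]
# was filters=1000 in the stock cfg; set to the number of classes (2 here)
filters=2
size=1
stride=1
pad=1
activation=linear

[avgpool]

[softmax]
groups=1
```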
Here are my specs and the full output:

```
$ ./darknet classifier train dogs/dogs.data cfg/darknet19.cfg
Device IDs: 2
Device ID: 0
Device name: Tesla V100-PCIE-16GB
Device vendor: NVIDIA Corporation
Device opencl availability: OpenCL 1.2 CUDA
Device opencl used: 440.33.01
Device double precision: YES
Device max group size: 1024
Device address bits: 64
darknet19
1
layer     filters    size              input                output
    0 conv     32  3 x 3 / 1   256 x 256 x   3   ->   256 x 256 x  32  0.113 BFLOPs
    1 max          2 x 2 / 2   256 x 256 x  32   ->   128 x 128 x  32
    2 conv     64  3 x 3 / 1   128 x 128 x  32   ->   128 x 128 x  64  0.604 BFLOPs
    3 max          2 x 2 / 2   128 x 128 x  64   ->    64 x  64 x  64
    4 conv    128  3 x 3 / 1    64 x  64 x  64   ->    64 x  64 x 128  0.604 BFLOPs
    5 conv     64  1 x 1 / 1    64 x  64 x 128   ->    64 x  64 x  64  0.067 BFLOPs
    6 conv    128  3 x 3 / 1    64 x  64 x  64   ->    64 x  64 x 128  0.604 BFLOPs
    7 max          2 x 2 / 2    64 x  64 x 128   ->    32 x  32 x 128
    8 conv    256  3 x 3 / 1    32 x  32 x 128   ->    32 x  32 x 256  0.604 BFLOPs
    9 conv    128  1 x 1 / 1    32 x  32 x 256   ->    32 x  32 x 128  0.067 BFLOPs
   10 conv    256  3 x 3 / 1    32 x  32 x 128   ->    32 x  32 x 256  0.604 BFLOPs
   11 max          2 x 2 / 2    32 x  32 x 256   ->    16 x  16 x 256
   12 conv    512  3 x 3 / 1    16 x  16 x 256   ->    16 x  16 x 512  0.604 BFLOPs
   13 conv    256  1 x 1 / 1    16 x  16 x 512   ->    16 x  16 x 256  0.067 BFLOPs
   14 conv    512  3 x 3 / 1    16 x  16 x 256   ->    16 x  16 x 512  0.604 BFLOPs
   15 conv    256  1 x 1 / 1    16 x  16 x 512   ->    16 x  16 x 256  0.067 BFLOPs
   16 conv    512  3 x 3 / 1    16 x  16 x 256   ->    16 x  16 x 512  0.604 BFLOPs
   17 max          2 x 2 / 2    16 x  16 x 512   ->     8 x   8 x 512
   18 conv   1024  3 x 3 / 1     8 x   8 x 512   ->     8 x   8 x1024  0.604 BFLOPs
   19 conv    512  1 x 1 / 1     8 x   8 x1024   ->     8 x   8 x 512  0.067 BFLOPs
   20 conv   1024  3 x 3 / 1     8 x   8 x 512   ->     8 x   8 x1024  0.604 BFLOPs
   21 conv    512  1 x 1 / 1     8 x   8 x1024   ->     8 x   8 x 512  0.067 BFLOPs
   22 conv   1024  3 x 3 / 1     8 x   8 x 512   ->     8 x   8 x1024  0.604 BFLOPs
   23 conv      2  1 x 1 / 1     8 x   8 x1024   ->     8 x   8 x   2  0.000 BFLOPs
   24 avg                        8 x   8 x   2   ->     2
   25 softmax                                          2
Learning Rate: 0.1, Momentum: 0.9, Decay: 0.0005
384 128 448
Saving weights to dogs/backup/darknet19.start.conv.weights
Loaded: 0.000083 seconds
opencl backward_scale_kernel error: CL_OUT_OF_RESOURCES
```
I set a breakpoint in opencl.c, and it finally stopped at line 837, at: clErr = clEnqueueNDRangeKernel(opencl_queues[opencl_device_id_t], kernel, 2, globalOffser, globalItems, localItems, 0, NULL, NULL); This is inside the backward_gpu function of a convolutional layer.
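For anyone debugging the same thing, a diagnostic sketch like the one below could be called just before that clEnqueueNDRangeKernel; it is not code from opencl.c, and `check_work_group_limits` (together with its assumption that the surrounding cl_kernel and cl_device_id are available) is my own illustration. It compares the requested local work size against the kernel and device limits, which is the usual first check when the call returns CL_OUT_OF_RESOURCES:

```c
/* Diagnostic sketch (not the repository's code): check the requested
 * local work size against what the kernel and device can support.
 * CL_OUT_OF_RESOURCES is often raised when localItems[0]*localItems[1]
 * exceeds CL_KERNEL_WORK_GROUP_SIZE for that kernel on that device. */
#include <stdio.h>
#ifdef __APPLE__
#include <OpenCL/opencl.h>
#else
#include <CL/cl.h>
#endif

void check_work_group_limits(cl_kernel kernel, cl_device_id device,
                             const size_t localItems[2])
{
    size_t kernel_max = 0, device_max = 0;

    /* Maximum work-group size this particular kernel can be launched with. */
    clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_WORK_GROUP_SIZE,
                             sizeof(kernel_max), &kernel_max, NULL);

    /* Device-wide upper bound (reported as 1024 on the Tesla V100 above). */
    clGetDeviceInfo(device, CL_DEVICE_MAX_WORK_GROUP_SIZE,
                    sizeof(device_max), &device_max, NULL);

    size_t requested = localItems[0] * localItems[1];
    printf("requested local %zux%zu = %zu, kernel max %zu, device max %zu\n",
           localItems[0], localItems[1], requested, kernel_max, device_max);

    if (requested > kernel_max || requested > device_max)
        printf("-> local work size too large; try a smaller tuning value\n");
}
```

If the requested size is within both limits and the error persists, another common cause of CL_OUT_OF_RESOURCES on NVIDIA's OpenCL is an out-of-bounds memory access inside the kernel itself.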
Thanks a lot