naibaf7 / caffe

Caffe: a fast open framework for deep learning. With OpenCL and CUDA support.
http://caffe.berkeleyvision.org/

Very fast OpenCL kernel version of BackPropWeights #33

Closed fsword73 closed 8 years ago

fsword73 commented 8 years ago

Please find it at https://github.com/fsword73/CNNKernelPerfTest/blob/master/CL/test_kernel_backpropweights_fast.cl. It is 500x faster than the original one on my test platform, an AMD FirePro™ S9000. The code is very simple. I will have a version with more memory-locality optimization next week. Please help to review the kernel code.

I cannot find your email, so I leave mine here: JIAN D0T YANG AT AMD D0T COM. A quick introduction of myself: 10 years of team leading on GPU performance analysis and tuning. I will rewrite the following kernels in the next several weeks: filter 3x3, 5x5, 7x7 up to 11x11 backward.

I will delete the issue after you have emailed me.
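For context on how a per-kernel speedup figure like the 500x above can be measured: the sketch below uses OpenCL profiling events on the host side. It assumes a single GPU device and omits error checking; the `noop` kernel and buffer size are placeholders rather than code from the linked repository.

```c
// Minimal host-side timing harness using OpenCL profiling events.
// Placeholder kernel and sizes; error checking omitted for brevity.
#include <stdio.h>
#include <CL/cl.h>

int main(void) {
  cl_platform_id platform; cl_device_id device;
  clGetPlatformIDs(1, &platform, NULL);
  clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
  cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);
  // Profiling must be enabled on the queue to read event timestamps.
  cl_command_queue q = clCreateCommandQueue(ctx, device,
                                            CL_QUEUE_PROFILING_ENABLE, NULL);

  const char* src =
      "__kernel void noop(__global float* buf) {"  /* placeholder kernel */
      "  buf[get_global_id(0)] *= 2.0f; }";
  cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
  clBuildProgram(prog, 1, &device, "", NULL, NULL);
  cl_kernel k = clCreateKernel(prog, "noop", NULL);

  size_t n = 1 << 20;
  cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE, n * sizeof(float),
                              NULL, NULL);
  clSetKernelArg(k, 0, sizeof(cl_mem), &buf);

  cl_event ev;
  clEnqueueNDRangeKernel(q, k, 1, NULL, &n, NULL, 0, NULL, &ev);
  clWaitForEvents(1, &ev);

  cl_ulong t0, t1;  // device-side timestamps in nanoseconds
  clGetEventProfilingInfo(ev, CL_PROFILING_COMMAND_START, sizeof(t0), &t0, NULL);
  clGetEventProfilingInfo(ev, CL_PROFILING_COMMAND_END, sizeof(t1), &t1, NULL);
  printf("kernel time: %.3f ms\n", (t1 - t0) * 1e-6);
  return 0;
}
```

CL_PROFILING_COMMAND_START/END return device-side nanosecond timestamps, so the measurement excludes host enqueue overhead.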

psyhtest commented 8 years ago

@fsword73 Sounds great! Why don't you open a pull request?

naibaf7 commented 8 years ago

@fsword73 OK, I'll have a look and email you.

@psyhtest It seems more of a tech demo than a DNN-framework-ready kernel. For example, the input data transformations don't seem to account for pad, stride, or dilation. The weight updates also work differently from what the framework expects: the kernel is supposed to output the raw weight_diff and bias_diff to the CNN framework; the learning rate and the actual weight/bias update are applied by the gradient solver and should not be handled inside the convolution weight kernel.
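For illustration, a minimal sketch of the kernel contract described above, assuming one input channel, one output channel, and batch size 1; all names and the layout are illustrative assumptions, not the code under review. Pad, stride, and dilation enter the index mapping, and only the raw gradient is written:

```c
// Hypothetical backprop-weights kernel honoring pad/stride/dilation.
__kernel void conv_backprop_weights(
    __global const float* src,       // input feature map, H x W
    __global const float* top_diff,  // gradient w.r.t. the output, OH x OW
    __global float* weight_diff,     // raw filter gradient, KH x KW
    const int H, const int W,
    const int OH, const int OW,
    const int KH, const int KW,
    const int pad, const int stride, const int dilation) {
  // One work-item per filter tap.
  const int kx = get_global_id(0);
  const int ky = get_global_id(1);
  if (kx >= KW || ky >= KH) return;

  float acc = 0.0f;
  for (int oy = 0; oy < OH; ++oy) {
    for (int ox = 0; ox < OW; ++ox) {
      const int iy = oy * stride - pad + ky * dilation;
      const int ix = ox * stride - pad + kx * dilation;
      // Skip positions that fall into the zero padding.
      if (iy >= 0 && iy < H && ix >= 0 && ix < W) {
        acc += src[iy * W + ix] * top_diff[oy * OW + ox];
      }
    }
  }
  // Raw gradient only: no learning rate, no in-place weight update.
  // (bias_diff, omitted here, would analogously be a plain sum of top_diff.)
  weight_diff[ky * KW + kx] = acc;
}
```

The gradient solver then consumes weight_diff and applies its own update rule (plain SGD, momentum, etc.).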

naibaf7 commented 8 years ago

@fsword73 I really like the atomic update methods for batch-wide weight diff accumulation. If we could integrate these methods into Greentea-libDNN (which already conforms to all possible DNN convolution configurations), that would be great. Let me know if you are interested in collaborating.
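For reference, OpenCL 1.x has no native atomic float add, so batch-wide accumulation in global memory is commonly built on an atomic_cmpxchg retry loop over the float's bit pattern. The sketch below is that standard idiom under this assumption, not an excerpt from fsword73's kernel:

```c
// Compare-and-swap idiom for atomic float accumulation in global memory;
// requires cl_khr_global_int32_base_atomics (core since OpenCL 1.1).
inline void atomic_add_float(volatile __global float* addr, float val) {
  union { unsigned int u; float f; } old_val, new_val;
  do {
    old_val.f = *addr;
    new_val.f = old_val.f + val;
    // Retry until no other work-item changed *addr between read and swap.
  } while (atomic_cmpxchg((volatile __global unsigned int*)addr,
                          old_val.u, new_val.u) != old_val.u);
}
```

Each work-item (for example, one per batch item and filter tap) can then fold its partial gradient into the shared buffer with `atomic_add_float(&weight_diff[w_idx], partial_diff);`, accumulating across the whole batch without a separate reduction pass.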

fsword73 commented 8 years ago

Greentea-libDNN is a good idea. I will verify this kernel on DeepCL first.