Non-square padding issues

romix commented 6 years ago

Quick question: Was libdnn tested with convolutions which use non-square paddings?

It works fine with square padding for me, but has some issues with a non-square one.

It could be, of course, that I do something wrong. But before I dive deep into a debug session, I thought I'd check if it is known to work at all.

naibaf7 commented 6 years ago

Yes I have unit tests in place that do test non-square padding. However of course, there could always be some strange bug I overlooked... so more details on your issue would be appreciated.

romix commented 6 years ago

@naibaf7

Yes I have unit tests in place that do test non-square padding

Nice! Could you provide a reference to these tests? I'd like to check that I setup the parameters properly on my side.

naibaf7 commented 6 years ago

Here: https://github.com/naibaf7/caffe/blob/master/src/caffe/test/test_libdnn_conv.cpp

Note however left and right padding have to be identical. Top and bottom padding as well, but the two are independent, so one padding value per tensor dimension.

Note also that this version of LibDNN can have bugs present that have been fixed inside the Caffe LibDNN. I cannot offer the new LibDNN as a standalone yet, since it depends on Caffe's device abstraction and reduced precision implementation (INT8/INT16/FP16).

romix commented 6 years ago

Note however left and right padding have to be identical. Top and bottom padding as well, but the two are independent, so one padding value per tensor dimension.

Yes, sure. There is no misunderstanding about this.

Could you have a quick look at the following file to see if the way how it handles padding is correct (essentially v_p_0 and v_p_1 are the only padding-related parameters): https://github.com/pytorch/glow/blob/master/lib/Backends/OpenCL/kernels_fwd_conv.cl

The convolution kernel in this file is very heavily based on the convolution kernels produced by LibDNN, but it is:

Self-contained and parameterized only by means of macro-definitions.
And the body of the function is always the same for any combination of parameters. This way it is very easy to embed it into a 3rd party application.

Off-topic: It would be actually cool if more of LibDNN generated kernels could be expressed in a similar way, thus making their embedding into other apps/frameworks much easier. But this is a different story that is unrelated to current issue ;-)

naibaf7 commented 6 years ago

@romix I will have a look. Unfortunately, LibDNN is moving more and more away from self-contained macro-defined code. It's just insufficient to express the kind of code the latest LibDNN needs to generate for all the different data types, rounding modes, GPUs and programming languages (HIP, CUDA, OpenCL), so this will not happen. Note that integration of future LibDNN kernels pretty much depends on a large chunk of Caffe code to be present, and is not and was never intended as a kernel-only library, but a kernel generator.

romix commented 6 years ago

@naibaf7 The issue is solved. The kernel itself is fine. But its launch parameters were set up in a wrong way in a special case of running on the CPU-backed OpenCL device on a MacBook Pro. The reported max global/local work sizes are weird in this case, i.e. (1024,1,1) IIRC, and have to be handled in a special way. Sorry for the noise!

romix commented 6 years ago

@naibaf7

Unfortunately, LibDNN is moving more and more away from self-contained macro-defined code. It's just insufficient to express the kind of code the latest LibDNN needs to generate for all the different data types, rounding modes, GPUs and programming languages (HIP, CUDA, OpenCL), so this will not happen. Note that integration of future LibDNN kernels pretty much depends on a large chunk of Caffe code to be present, and is not and was never intended as a kernel-only library, but a kernel generator.

That's fine. Auto-generating kernels on the fly is OK as long as the LibDNN library itself is rather self-contained.

Having a requirement for a large chunk of Caffe code to be present is more of an issue. I guess it is required for proving the HW abstraction and the like? Ideally, it would be nice if these abstractions could be expressed in generic configurable form, so that you can provide a different underlying HW abstraction layer (e.g. not based on Caffe) when you integrate LibDNN into your framework/application.

naibaf7 commented 6 years ago

@romix No problem. Yes the OpenCL implementations on Mac are abysmal, unfortunately. But there are still worse ones out there, from vendors other than AMD, Intel and nVIdia.

Sure, LibDNN will probably eventually be extracted from Caffe so that the HW abstraction, LibDNN and Caffe can operate as three stacked layers, and people looking to use LibDNN could drop the Caffe layer, only use the HW abstraction plus LibDNN, and use their own framework on top instead. But that will take me a lot of development time in the future.

naibaf7 / libdnn

Non-square padding issues #29