oneapi-src / oneDNN

oneAPI Deep Neural Network Library (oneDNN)
https://uxlfoundation.org
Apache License 2.0

Additional eltwise primitives #1296

Closed · edubart closed this 2 years ago

edubart commented 2 years ago

I am using the library to build a tensor library via the C API only (no C++ API), and I am missing the following element-wise primitives: sin, cos, tan, floor, and ceil.

At the moment I see no other way to perform these operations than through dnnl_memory_map_data; it would be better if the operations could be streamed to the engine.
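For context, a minimal sketch of the map-based fallback I mean, assuming an f32 dnnl_memory_t holding n elements (error handling omitted):

```c
#include <stddef.h>
#include <math.h>
#include "dnnl.h"

/* Apply sinf() elementwise by mapping the memory to the host, mutating it,
 * and unmapping. This round-trips through the host, which is exactly the
 * overhead a native eltwise primitive would avoid. */
static void sin_inplace(dnnl_memory_t mem, size_t n) {
    void *ptr = NULL;
    dnnl_memory_map_data(mem, &ptr); /* host-visible pointer */
    float *data = (float *)ptr;
    for (size_t i = 0; i < n; ++i)
        data[i] = sinf(data[i]);
    dnnl_memory_unmap_data(mem, ptr); /* publish the changes back */
}
```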

The following operations would also be useful, although the library already has enough APIs to compose them, so they are less important: rsqrt, reciprocal, and neg.

Other libraries, such as cuDNN, support all of these primitives in their pointwise API.

An element-wise operation to assign all data to a constant value would also be useful; this was discussed in https://github.com/oneapi-src/oneDNN/issues/1294.

dzarukin commented 2 years ago

Hi @edubart, thank you for mentioning this.

When it comes to sin, cos, and tan, indeed, there is no support for such activations in oneDNN. This is mostly because they are not used in any major topologies today, so the priority of adding such functions is very low.

As for floor or ceil, oneDNN lacks a direct call for those. However, if the application wants to use a specific rounding mode consistently, oneDNN provides a round activation that relies on the MXCSR register value; the user may set the preferred policy through C++ calls.
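For example, a floor-like rounding could be set up along these lines (a sketch assuming x86 SSE and the oneDNN v2.x C API; note that the MXCSR mode must be in effect on the threads that execute the primitive):

```c
#include <xmmintrin.h>
#include "dnnl.h"

/* Prepare a round eltwise that behaves like floor() by switching MXCSR to
 * round-toward-negative-infinity; _MM_ROUND_UP would give ceil instead.
 * `src_md` is an existing memory descriptor for the data. */
static void init_floor_like_round(const dnnl_memory_desc_t *src_md,
        dnnl_eltwise_desc_t *ed) {
    _MM_SET_ROUNDING_MODE(_MM_ROUND_DOWN);
    /* alpha and beta are unused by dnnl_eltwise_round */
    dnnl_eltwise_forward_desc_init(ed, dnnl_forward_inference,
            dnnl_eltwise_round, src_md, 0.f, 0.f);
}
```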

rsqrt and reciprocal are supported in oneDNN through the eltwise_pow API with alpha = 1 and beta = -1/2 for rsqrt, and alpha = 1 and beta = -1 for reciprocal. neg is supported through eltwise_linear with alpha = -1 and beta = 0. Thank you.
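In the C API, the mapping above looks roughly like this (a sketch against the v2.x dnnl_eltwise_forward_desc_init call; `src_md` is assumed to exist and error handling is omitted):

```c
#include "dnnl.h"

/* eltwise_pow computes alpha * x^beta and eltwise_linear computes
 * alpha * x + beta, so rsqrt, reciprocal, and neg fall out of the
 * existing algorithms with suitable alpha/beta values. */
static void init_composed_eltwise(const dnnl_memory_desc_t *src_md,
        dnnl_eltwise_desc_t *rsqrt_d, dnnl_eltwise_desc_t *recip_d,
        dnnl_eltwise_desc_t *neg_d) {
    /* rsqrt(x) = 1 * x^(-0.5) */
    dnnl_eltwise_forward_desc_init(rsqrt_d, dnnl_forward_inference,
            dnnl_eltwise_pow, src_md, 1.f, -0.5f);
    /* reciprocal(x) = 1 * x^(-1) */
    dnnl_eltwise_forward_desc_init(recip_d, dnnl_forward_inference,
            dnnl_eltwise_pow, src_md, 1.f, -1.f);
    /* neg(x) = -1 * x + 0 */
    dnnl_eltwise_forward_desc_init(neg_d, dnnl_forward_inference,
            dnnl_eltwise_linear, src_md, -1.f, 0.f);
}
```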

edubart commented 2 years ago

> When it comes to sin, cos, and tan, indeed, there is no support for such activations in oneDNN. This is mostly because they are not used in any major topologies today, so the priority of adding such functions is very low.

Unfortunate, but understandable; I am trying to use the library not just for deep learning but as a backend for numeric scientific computations in general.

> As for floor or ceil, oneDNN lacks a direct call for those. However, if the application wants to use a specific rounding mode consistently, oneDNN provides a round activation that relies on the MXCSR register value; the user may set the preferred policy through C++ calls.

I suppose that will only work for the CPU backend, and not for the GPU backend, right?

> rsqrt and reciprocal are supported in oneDNN through the eltwise_pow API with alpha = 1 and beta = -1/2 for rsqrt, and alpha = 1 and beta = -1 for reciprocal. neg is supported through eltwise_linear with alpha = -1 and beta = 0. Thank you.

I see, though it seems like that path will cost some extra float operations. Using pow for reciprocal seems like it will have much more overhead than just doing 1/x; the current way I do reciprocal is by performing a binary division against a tensor of ones with broadcasting. Wouldn't that way be more efficient?
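For reference, the binary-division approach I describe looks roughly like this (a sketch with the v2.x C API; a full-size ones tensor is shown for simplicity, since src0 carries the full dims and broadcasting applies to src1; error handling omitted):

```c
#include "dnnl.h"

/* reciprocal via the binary primitive: dst = ones / x. `ones_md`, `x_md`,
 * and `dst_md` are existing memory descriptors with matching dims. */
static void init_recip_via_div(dnnl_engine_t eng,
        const dnnl_memory_desc_t *ones_md, const dnnl_memory_desc_t *x_md,
        const dnnl_memory_desc_t *dst_md, dnnl_primitive_t *prim) {
    dnnl_binary_desc_t bd;
    dnnl_binary_desc_init(&bd, dnnl_binary_div, ones_md, x_md, dst_md);
    dnnl_primitive_desc_t pd;
    dnnl_primitive_desc_create(&pd, &bd, NULL, eng, NULL);
    dnnl_primitive_create(prim, pd);
    dnnl_primitive_desc_destroy(pd);
}
```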

Update: I made a minimal benchmark of this, and it seems that hypothesis is not true: using pow for reciprocal is more efficient than the binary operation with broadcasting. Although if there were a dedicated reciprocal eltwise primitive, it would probably still be more efficient than pow with alpha = 1 and beta = -1.

dzarukin commented 2 years ago

> I am trying to use the library not just for deep learning but as a backend for numeric scientific computations in general

No judgment or advice here, but you may want to consider using several libraries. If you are looking for transcendental math function support and are targeting Intel GPUs, you may consider the math libraries integrated into the Intel compiler; if you are targeting Nvidia GPUs, a dispatcher between CUDA libraries is probably one way to go. In any case, I would not recommend relying on oneDNN for general-purpose solutions. The library has quite limited functionality with a narrow DL focus.

> I suppose that will only work for the CPU backend, and not for the GPU backend, right?

Yeah, I guess so. Sorry, I forgot to mention that rather big detail. AFAIK, GPU rounding may be defined either by the implementation or even by the hardware, so the only reliable solution is to let the user state exactly what they want...

> Using pow for reciprocal seems like it will have much more overhead than just doing 1/x...

We have some shortcuts for specific cases like this. There is no support for beta = -0.5f there yet, but it would not be a big deal to add it. That is for the CPU. For the GPU, a standalone eltwise implementation will use the natively provided powf function, which may or may not branch internally; the GPU eltwise post-op will use the same implementation on forward. If there is a need for the -0.5f case on CPU, feel free to contribute it. Thanks.
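To illustrate the kind of shortcut I mean, a hypothetical scalar reference path could dispatch common beta values to cheaper operations instead of calling powf (purely illustrative, not oneDNN's actual implementation):

```c
#include <math.h>

/* Hypothetical fast path for alpha * x^beta: special-case the common
 * exponents and fall back to the generic powf() otherwise. */
static float pow_fwd(float x, float alpha, float beta) {
    if (beta == -1.f) return alpha / x;         /* reciprocal */
    if (beta == -0.5f) return alpha / sqrtf(x); /* rsqrt */
    if (beta == 0.5f) return alpha * sqrtf(x);  /* sqrt */
    if (beta == 1.f) return alpha * x;          /* linear */
    return alpha * powf(x, beta);               /* generic case */
}
```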

dzarukin commented 2 years ago

Closing the issue since the question seems to be addressed. Feel free to open another issue if you have any questions.