pytorch / extension-cpp

C++ extensions in PyTorch

Half Tensor Dispatch compatibility ? #15

Closed: ClementPinard closed this issue 6 years ago

ClementPinard commented 6 years ago

Hi, I've been tweaking the repo a bit and wanted to try Half Tensor compatibility.

So in the CUDA code, instead of AT_DISPATCH_FLOATING_TYPES (here and here),

I just changed the dispatch macro to AT_DISPATCH_FLOATING_TYPES_AND_HALF, naively hoping that everything would work without changing anything else.
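i.e., something like this at the dispatch site (just a sketch; the tensor and kernel names below are illustrative, not copied from the repo):

```cpp
// Dispatch over float, double and at::Half instead of only float/double.
// Inside the lambda, scalar_t is the concrete type for this invocation.
AT_DISPATCH_FLOATING_TYPES_AND_HALF(input.type(), "lltm_forward_cuda", ([&] {
  lltm_cuda_forward_kernel<scalar_t><<<blocks, threads>>>(
      input.data<scalar_t>(),
      output.data<scalar_t>(),
      state_size);
}));
```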

Unfortunately, I got this error (while dispatching only floating types works):

lltm_cuda_kernel.cu(123): error: identifier "Half" is undefined

lltm_cuda_kernel.cu(157): error: identifier "Half" is undefined

Is there something I forgot to do? Apparently Half is not recognized by the compiler the way float or double are, so maybe I need to include a header? I tried #include <cuda_fp16.h>, #include <ATen/Half.h> and #include <ATen/Type.h>, but it didn't work.

Thanks !

Clément

colesbury commented 6 years ago
  1. Add using namespace at; at the top. We need to qualify Half as at::Half in Dispatch.h to avoid this.
  2. Replace fmax(0.0, z) with fmax(scalar_t(0.0), z) (see the sketch below).
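A minimal sketch of both changes, assuming the elu helper in lltm_cuda_kernel.cu looks roughly like this (the repo's exact code may differ):

```cpp
#include <ATen/ATen.h>

// 1. Lets the unqualified "Half" emitted by AT_DISPATCH_FLOATING_TYPES_AND_HALF
//    resolve to at::Half.
using namespace at;

template <typename scalar_t>
__device__ __forceinline__ scalar_t elu(scalar_t z, scalar_t alpha = 1.0) {
  // 2. Cast the literal so fmax sees (scalar_t, scalar_t) instead of
  //    (double, at::Half).
  return fmax(scalar_t(0.0), z) + fmin(0.0, alpha * (exp(z) - 1.0));
}
```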
ClementPinard commented 6 years ago

Thanks for the rapid answer! I commented on your PR. In the meantime, I applied your recommendations, and it works, thanks!

However, I initially thought that your "2." comment was a general rule, like "replace all 0 occurrences with scalar_t(0.0)". But when doing that, the compiler stops at fmin(scalar_t(0.0), ...)! Any idea why this modification works on fmax and not fmin? Is there some overloading on fmin and not fmax?

goldsborough commented 6 years ago

@ClementPinard someone else once reported issues with fmax being interpreted as std::fmax instead of the CUDA function: https://github.com/pytorch/extension-cpp/issues/14

colesbury commented 6 years ago

It's just a matter of operator overloading and implicit conversions. C++ allows an argument to undergo one implicit conversion to satisfy an overload, but not two. Half has an implicit conversion to float. float has an implicit conversion to double. But there's no implicit conversion from Half to double directly.

You can see the list of overloads for fmax here: https://en.cppreference.com/w/cpp/numeric/math/fmax

fmax(0.0, z) has the argument types (double, at::Half), which does not satisfy any overload. Changing it to fmax(scalar_t(0.0), z) gives the types (at::Half, at::Half). Both arguments are implicitly convertible to float, so it uses the overload fmax(float, float).
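Spelled out with the argument types (here scalar_t is at::Half):

```cpp
fmax(0.0, z);            // (double, at::Half)   -> fails to compile
fmax(scalar_t(0.0), z);  // (at::Half, at::Half) -> both convert to float,
                         //    so fmax(float, float) is selected
```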

fmin follows the same rules as fmax, but the argument is different: alpha * (exp(z) - 1.0) is a double when z is Half, because of the - 1.0.

There might be a performance penalty of using doubles here, I'm not sure. You could probably change all the 0.0 and 1.0 to 0.0f and 1.0f.
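If you'd rather not reason about float vs. double literals at each call site, another option (an untested sketch, not the repo's code) is to keep everything in scalar_t explicitly:

```cpp
template <typename scalar_t>
__device__ __forceinline__ scalar_t elu(scalar_t z, scalar_t alpha = 1.0) {
  // No bare 0.0/1.0 doubles: for Half, the intermediates stay in float
  // (via at::Half's implicit conversion) and the result is cast back.
  return fmax(scalar_t(0.0), z)
       + fmin(scalar_t(0.0), scalar_t(alpha * (exp(z) - scalar_t(1.0))));
}
```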