Open benjha opened 4 years ago
@kkraus14 could you please add relevant folks to look into this? It's the same issue that was mentioned on GoAi slack
cc @devavret @trxcllnt who were involved in the slack conversation
Just to clarify: the suggested flag, -mno-float128, was not recognized by NVRTC/jitify, so a different flag is needed.
$ python ../test.py
Compiler options: -std=c++14 -remove-unused-globals -mno-float128 -w -D__CUDACC_RTC__ -D__CHAR_BIT__=8 -D_LIBCUDACXX_HAS_NO_CTIME -D_LIBCUDACXX_HAS_NO_WCHAR -D_LIBCUDACXX_HAS_NO_CFLOAT -D_LIBCUDACXX_HAS_NO_STDINT -D_LIBCUDACXX_HAS_NO_CSTDDEF -D_LIBCUDACXX_HAS_NO_CLIMITS -D_LIBCPP_DISABLE_VISIBILITY_ANNOTATIONS -arch=compute_70
Traceback (most recent call last):
File "../test.py", line 8, in <module>
s1 = s - 1
File "/gpfs/alpine/world-shared/stf011/nvrapids_0.14_gcc_7.4.0/lib/python3.7/site-packages/cudf/core/series.py", line 1060, in __sub__
return self._binaryop(other, "sub")
File "/gpfs/alpine/world-shared/stf011/nvrapids_0.14_gcc_7.4.0/lib/python3.7/contextlib.py", line 74, in inner
return func(*args, **kwds)
File "/gpfs/alpine/world-shared/stf011/nvrapids_0.14_gcc_7.4.0/lib/python3.7/site-packages/cudf/core/series.py", line 1002, in _binaryop
outcol = lhs._column.binary_operator(fn, rhs, reflect=reflect)
File "/gpfs/alpine/world-shared/stf011/nvrapids_0.14_gcc_7.4.0/lib/python3.7/site-packages/cudf/core/column/numerical.py", line 95, in binary_operator
lhs=self, rhs=rhs, op=binop, out_dtype=out_dtype, reflect=reflect
File "/gpfs/alpine/world-shared/stf011/nvrapids_0.14_gcc_7.4.0/lib/python3.7/contextlib.py", line 74, in inner
return func(*args, **kwds)
File "/gpfs/alpine/world-shared/stf011/nvrapids_0.14_gcc_7.4.0/lib/python3.7/site-packages/cudf/core/column/numerical.py", line 434, in _numeric_column_binop
out = libcudf.binaryop.binaryop(lhs, rhs, op, out_dtype)
File "cudf/_lib/binaryop.pyx", line 193, in cudf._lib.binaryop.binaryop
File "cudf/_lib/binaryop.pyx", line 126, in cudf._lib.binaryop.binaryop_v_s
RuntimeError: NVRTC error: NVRTC_ERROR_INVALID_OPTION
It seems ppc64le support is enabled by __IBMCPP__, as suggested by line 113 of the following file:
https://github.com/ogiroux/libcxx/blob/36156f0962149ef19ce778110642150b456a182b/include/limits#L113
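For context, the dispatch around that line looks roughly like this (my paraphrase of the linked header, not an exact quote); defining __IBMCPP__ is what makes <limits> pull in the IBM-specific header:

// Platform dispatch in libcxx's <limits> (approximate):
#if defined(__IBMCPP__)
#include "support/ibm/limits.h"   // ppc64le-specific numeric limits
#endif // __IBMCPP__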
I added that define to the Jitify compiler flags (.../cudf/cpp/src/jit/common_headers.hpp):
const std::vector<std::string> compiler_flags{
"-std=c++14",
// Have jitify prune unused global variables
"-remove-unused-globals",
// define __IBMCPP__ so libcxx's <limits> takes the ppc64le path
"-D__IBMCPP__",
// suppress all NVRTC warnings
"-w",
// force libcudacxx to not include system headers
"-D__CUDACC_RTC__",
// __CHAR_BIT__ is from GCC, but libcxx uses it
"-D__CHAR_BIT__=" + std::to_string(__CHAR_BIT__),
// enable temporary workarounds to compile libcudacxx with nvrtc
"-D_LIBCUDACXX_HAS_NO_CTIME",
"-D_LIBCUDACXX_HAS_NO_WCHAR",
"-D_LIBCUDACXX_HAS_NO_CFLOAT",
"-D_LIBCUDACXX_HAS_NO_STDINT",
"-D_LIBCUDACXX_HAS_NO_CSTDDEF",
"-D_LIBCUDACXX_HAS_NO_CLIMITS",
"-D_LIBCPP_DISABLE_VISIBILITY_ANNOTATIONS",
};
However, jitify/NVRTC does not find the ppc64le header file:
$ python test.py
../../libcxx/include/limits(114): warning: support/ibm/limits.h: [jitify] File not found
---------------------------------------------------
--- JIT compile log for ---
---------------------------------------------------
../../libcxx/include/limits(408): error: floating constant is out of range
../../libcxx/include/limits(409): error: floating constant is out of range
../../libcxx/include/limits(431): error: floating constant is out of range
3 errors detected in this compilation.
---------------------------------------------------
Traceback (most recent call last):
File "test.py", line 8, in <module>
s1 = s - 1
File "/gpfs/alpine/world-shared/stf011/nvrapids_0.14_gcc_7.4.0/lib/python3.7/site-packages/cudf/core/series.py", line 1060, in __sub__
return self._binaryop(other, "sub")
File "/gpfs/alpine/world-shared/stf011/nvrapids_0.14_gcc_7.4.0/lib/python3.7/contextlib.py", line 74, in inner
return func(*args, **kwds)
File "/gpfs/alpine/world-shared/stf011/nvrapids_0.14_gcc_7.4.0/lib/python3.7/site-packages/cudf/core/series.py", line 1002, in _binaryop
outcol = lhs._column.binary_operator(fn, rhs, reflect=reflect)
File "/gpfs/alpine/world-shared/stf011/nvrapids_0.14_gcc_7.4.0/lib/python3.7/site-packages/cudf/core/column/numerical.py", line 95, in binary_operator
lhs=self, rhs=rhs, op=binop, out_dtype=out_dtype, reflect=reflect
File "/gpfs/alpine/world-shared/stf011/nvrapids_0.14_gcc_7.4.0/lib/python3.7/contextlib.py", line 74, in inner
return func(*args, **kwds)
File "/gpfs/alpine/world-shared/stf011/nvrapids_0.14_gcc_7.4.0/lib/python3.7/site-packages/cudf/core/column/numerical.py", line 434, in _numeric_column_binop
out = libcudf.binaryop.binaryop(lhs, rhs, op, out_dtype)
File "cudf/_lib/binaryop.pyx", line 193, in cudf._lib.binaryop.binaryop
File "cudf/_lib/binaryop.pyx", line 126, in cudf._lib.binaryop.binaryop_v_s
RuntimeError: Runtime compilation failed
even though support/ibm/limits.h exists in my conda environment:
$ pwd
/gpfs/alpine/world-shared/stf011/nvrapids_0.14_gcc_7.4.0/include
$ find . -name limits.h
./cuda/std/detail/libcxx/include/support/ibm/limits.h
./cuda/std/detail/libcxx/include/limits.h
./thrust/limits.h
./libcudf/libcxx/include/support/ibm/limits.h
./libcudf/libcxx/include/limits.h
@benjha All the headers used by jitify are converted to strings and compiled into the binary; jitify does not pick files up from the system, only from this list of strings. We could do the same for the IBM limits header, but I think a better fix would be to somehow disable long double entirely, since we do not support it in cuDF.
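For anyone following along, this is roughly how that mechanism works: jitify hands NVRTC the header contents as in-memory strings keyed by include name (in cuDF those strings are generated from the real headers at build time), so nothing is read from the filesystem at JIT time. A minimal sketch using the NVRTC API directly, illustrative only and with error checking omitted, not cuDF's actual code:

#include <nvrtc.h>

// Header content baked into the binary as a string.
const char* ibm_limits_src  = "/* contents of support/ibm/limits.h ... */";
const char* ibm_limits_name = "support/ibm/limits.h";

// Kernel source that includes the header by its virtual name.
const char* kernel_src =
    "#include \"support/ibm/limits.h\"\n"
    "__global__ void kernel() {}\n";

void compile_example() {
  const char* header_srcs[]  = { ibm_limits_src };
  const char* header_names[] = { ibm_limits_name };

  nvrtcProgram prog;
  // NVRTC resolves #include "support/ibm/limits.h" against header_names and
  // uses the matching entry of header_srcs -- no filesystem lookup happens.
  nvrtcCreateProgram(&prog, kernel_src, "example.cu",
                     1, header_srcs, header_names);

  const char* opts[] = { "-std=c++14", "-D__IBMCPP__" };
  nvrtcCompileProgram(prog, 2, opts);
  nvrtcDestroyProgram(&prog);
}

So a header that is never added to that string list is simply invisible to the JIT compiler, which is why the file being present in the conda environment does not help.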
Hi folks,
Can you try the libcu++ that's included in CUDA 11.0 to see if it works? That repo isn't maintained; we'll try to push the libcu++ repo to GitHub in the near future.
@griwes Building cuDF from source on CUDA 11 is currently quite difficult: we have CUDA-based dependencies that don't yet have CUDA 11 builds published, since the conda packaging ecosystem was CentOS 6 based 😅. We're working on getting things going with CUDA 11, but it will likely take a few weeks.
One other option could be to define a different __LDBL_MAX__ and __LDBL_MIN__ using #ifdef __IBMCPP__ in RAPIDS' fork here: https://github.com/rapidsai/thirdparty-freestanding/blob/cdcda484d0c7db114ea29c3b33429de5756ecfd8/include/simt/cfloat#L113-L114
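A rough sketch of what that could look like; the structure and the ppc64le values are my assumption, not the fork's actual code, and should be verified against the target toolchain's predefined macros:

// Hypothetical #ifdef __IBMCPP__ approach for simt/cfloat: pick long double
// limits that match the target ABI, since the x86 80-bit extended values
// overflow ppc64le's long double.
#ifdef __IBMCPP__
  // ppc64le: long double is 128-bit IBM double-double, whose range is
  // essentially that of a 64-bit double.
  #define __LDBL_MAX__ 1.79769313486231580793728971405301e+308L
  #define __LDBL_MIN__ 2.00416836000897277799610805135016e-292L
#else
  // x86-64: 80-bit extended-precision values, as currently hard-coded.
  #define __LDBL_MAX__ 1.18973149535723176502e+4932L
  #define __LDBL_MIN__ 3.36210314311209350626e-4932L
#endif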
@benjha since you're building from source already, you can add the support/ibm/limits.h header to the list of stringified headers in cuDF's CMakeLists.txt and see if it works.
It appears that this would be fixed by https://github.com/rapidsai/cudf/pull/5674
Hi,
I am trying to compile RAPIDS 0.14 on ppc64le, but I get the following errors after doing some operations with cuDF:
...
--- JIT compile log for ---
../../libcxx/include/limits(408): error: floating constant is out of range
../../libcxx/include/limits(409): error: floating constant is out of range
../../libcxx/include/limits(431): error: floating constant is out of range
I noted RAPIDS defines the x86 limits, while those limits are different on ppc64le (a small probe to print them is sketched below). In particular, the x86 limits are
while ppc64le's GNU toolchain (gcc 7.4) defines
I noted there is support for ppc64le here:
So the question is what flag should be used to enable ppc64le support.
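For readers hitting the same errors, here is a minimal probe (my addition, not part of the original report) that prints the long double parameters the host compiler defines; running it on x86-64 and on ppc64le shows why the x86 constants are out of range on POWER:

// Minimal probe of the host compiler's long double limits.
#include <cfloat>
#include <cstdio>

int main() {
  std::printf("sizeof(long double) = %zu\n", sizeof(long double));
  std::printf("LDBL_MAX_10_EXP     = %d\n", LDBL_MAX_10_EXP);
  std::printf("LDBL_MAX            = %Lg\n", LDBL_MAX);
  std::printf("LDBL_MIN            = %Lg\n", LDBL_MIN);
  // On x86-64 (80-bit extended) LDBL_MAX_10_EXP is 4932; on ppc64le with
  // IBM double-double long double it is 308, so the x86 constants overflow.
  return 0;
}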
Thanks,
Benjamin