Segfault and error on Mac with OpenCL and different array types

ankane commented 3 years ago

Describe the bug

Hi, I'm seeing a segfault and error on Mac with OpenCL and different array data types.

f32 - success
f64 - segfault
s32 - error (stack trace below)

  File "/usr/local/lib/python3.9/site-packages/khiva/matrix.py", line 231, in stomp_self_join
    raise Exception(str(error_message.value.decode()))
Exception: stomp_self_join: ArrayFire Exception (Internal error:998):
In function cl::Program opencl::buildProgram(const vector<std::__1::string> &, const vector<std::__1::string> &)
In file src/backend/opencl/compile_module.cpp:128
OpenCL Device: Intel(R) Iris(TM)

To Reproduce

It's easiest to reproduce with the Python library (but I think it's probably related to the C++ code).

from khiva.array import Array, dtype
from khiva.library import get_backend_info, set_backend, KHIVABackend
from khiva.matrix import stomp_self_join

# everything works with the CPU backend
# set_backend(KHIVABackend.KHIVA_BACKEND_CPU)

print(get_backend_info())

# success
a = Array.from_list([1, 2, 3, 4, 5], dtype.f32)
stomp_self_join(a, 3)

# segfault
a = Array.from_list([1, 2, 3, 4, 5], dtype.f64)
stomp_self_join(a, 3)

# error
a = Array.from_list([1, 2, 3, 4, 5], dtype.s32)
stomp_self_join(a, 3)

Expected behavior

No segfault or error, like with the CPU backend.

Environment information:

OS: Mac OS 11.1
Khiva Version: 0.5.0
Khiva dependencies versions: ArrayFire 3.7.3, Boost 1.74.0

Here's the output of get_backend_info():

ArrayFire v3.7.3 (OpenCL, 64-bit Mac OSX, build default)
[0] APPLE: Intel(R) Iris(TM) Plus Graphics, 1536 MB

Additional context

Let me know if there's anything I can do to help debug.

avilchess commented 3 years ago

Hi Andrew, thanks for reposting this issue.

This issue seems to be related to ArrayFire, which is the abstraction layer we use to decouple our implementation from vendor APIs such as CUDA or OpenCL. As far as I know, Intel GPUs do not support f64 data, which may be the reason for the segfault. However, it should not crash; it should produce a controlled error message. If I can reproduce the issue, I will work on a fix. Just one question: does it break during Array creation or during the method execution? Regarding the s32 data type, this method has not been tested against it. Could you provide some more info about the error you are getting?

So, I would suggest you cast your data to f32 before calling this method. In the meantime, we will try to reproduce this issue.
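A minimal sketch of the f32 workaround suggested above. The helper names (to_f32, lossless_in_f32, stomp_self_join_f32) are hypothetical and not part of Khiva; only Array.from_list, dtype.f32 and stomp_self_join come from the reproduction script in this issue. The check is there because a cast to single precision can silently change values, for example integers above 2**24.

```python
import struct


def to_f32(values):
    """Round each value to the nearest IEEE 754 single-precision float."""
    return [struct.unpack("f", struct.pack("f", float(v)))[0] for v in values]


def lossless_in_f32(values):
    """Return True if every value survives the f32 round trip unchanged."""
    return all(float(v) == r for v, r in zip(values, to_f32(values)))


def stomp_self_join_f32(values, m):
    """Hypothetical wrapper: build an f32 Khiva array and run stomp_self_join.

    Khiva is imported here so the helpers above work without it installed.
    """
    from khiva.array import Array, dtype
    from khiva.matrix import stomp_self_join

    if not lossless_in_f32(values):
        print("warning: some values change when cast to f32")
    return stomp_self_join(Array.from_list(to_f32(values), dtype.f32), m)
```

With Khiva installed, stomp_self_join_f32([1, 2, 3, 4, 5], 3) would correspond to the f32 case that succeeds in the report.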

Thanks in advance.

ankane commented 3 years ago

Great, thanks. Both the error and segfault occur in stomp_self_join (array creation works). The output for the error is above, but it doesn't seem super helpful (Internal error in opencl::buildProgram).
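One way to confirm which step fails for each data type is to run every case in its own interpreter, so a segfault shows up as a signal instead of taking down the whole session. This is a sketch under POSIX assumptions (a negative return code means the child was killed by a signal). The helper names and snippet templates are hypothetical; the Khiva calls inside the templates are copied from the reproduction script.

```python
import signal
import subprocess
import sys


def run_isolated(snippet, timeout=120):
    """Run a code snippet in a fresh interpreter and return its exit code."""
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode


def describe(returncode):
    """Translate an exit code into a short label (POSIX semantics)."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        try:
            return "killed by " + signal.Signals(-returncode).name
        except ValueError:
            return "killed by signal %d" % -returncode
    return "python error (exit code %d)" % returncode


# Step 1: array creation only. Step 2: creation plus stomp_self_join.
CREATE = (
    "from khiva.array import Array, dtype\n"
    "a = Array.from_list([1, 2, 3, 4, 5], dtype.{t})\n"
)
JOIN = CREATE + (
    "from khiva.matrix import stomp_self_join\n"
    "stomp_self_join(a, 3)\n"
)


def probe(types=("f32", "f64", "s32")):
    """Print one line per data type and step; requires Khiva to be installed."""
    for t in types:
        for step, template in (("create", CREATE), ("join", JOIN)):
            print(t, step, describe(run_isolated(template.format(t=t))))
```

Calling probe() prints one line per data type and step, so a crash in stomp_self_join is reported separately from a failure during array creation.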

avilchess commented 3 years ago

Andrew, could you try the following function instead? https://github.com/shapelets/khiva-python/blob/891622ce7384605a825b32c2ae70a36872762678/khiva/matrix.py#L268

ankane commented 3 years ago

With matrix_profile_self_join, it errors with f32 and s32 (same error for both):

  File "/usr/local/lib/python3.9/site-packages/khiva/matrix.py", line 294, in matrix_profile_self_join
    raise Exception(str(error_message.value.decode()))
Exception: matrix_profile_self_join: ArrayFire Exception (Double precision not supported for this device:401):
In function void opencl::(anonymous namespace)::verifyTypeSupport()
In file src/backend/opencl/Array.cpp:59
Double precision not supported
 0# void opencl:

And segfaults with f64.
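The "Double precision not supported" message can be cross-checked outside Khiva and ArrayFire. The sketch below assumes the third-party pyopencl package is installed (it is not a Khiva dependency) and treats a device as double-capable when its extension string lists cl_khr_fp64. The function names are hypothetical.

```python
def has_fp64(extensions):
    """Return True if an OpenCL extension string advertises cl_khr_fp64."""
    return "cl_khr_fp64" in extensions.split()


def list_devices():
    """Return [(device name, fp64 supported)] for every OpenCL device.

    pyopencl is imported here so has_fp64 can be used without it.
    """
    import pyopencl as cl

    return [
        (dev.name, has_fp64(dev.extensions))
        for platform in cl.get_platforms()
        for dev in platform.get_devices()
    ]
```

If list_devices() reports False for the Intel Iris device, that would be consistent with the error above and would point to f64 intermediates being created somewhere inside matrix_profile_self_join even for f32 input, though that is an inference rather than something confirmed in this thread.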

avilchess commented 3 years ago

I was expecting something like that. I will work on a fix for this issue next week. Thanks for the feedback, Andrew.

ankane commented 3 years ago

Sounds good, thanks @avilchess.

ankane commented 2 years ago

Cleaning up stale issues

shapelets/khiva issue #161