wanderine / BROCCOLI

BROCCOLI: Software for Fast fMRI Analysis on Many-Core CPUs and GPUs
GNU General Public License v3.0

Error building kernelBayesian.cpp for GPU on macOS #48

Open a-hurst opened 6 years ago

a-hurst commented 6 years ago

I'm testing out first-level analysis using BROCCOLI with some data I'd previously analyzed with FSL. When I run FirstLevelAnalysis using my CPU as the OpenCL device, everything builds perfectly fine and the analysis seems to run as intended. When I try running the same analysis with my GPU, however, I get the following error:

Source build error for kernelBayesian.cpp is CL_BUILD_PROGRAM_FAILURE 
One or several kernels were not created correctly, check buildInfo* !

Looking at the build log for that file for my GPU, it contains just this single line:

ptxas error   : Program using constant pointers passed as entry function parameter cannot use cvta.const
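For anyone triaging a similar failure, here is a minimal sketch of pulling the error lines out of a build log's text. It assumes the log is plain text and that error lines follow the `<tool> error : <message>` shape seen in the line above. The name `extract_build_errors` is hypothetical and not part of BROCCOLI.

```python
import re

# Assumed line shape, taken from the single log line quoted above:
#   "<tool> error   : <message>"
ERROR_LINE = re.compile(r"^\s*(?P<tool>\S+)\s+error\s*:\s*(?P<message>.+?)\s*$")


def extract_build_errors(log_text):
    """Return (tool, message) pairs for every error line in a build log.

    Hypothetical helper for illustration; not part of BROCCOLI.
    """
    errors = []
    for line in log_text.splitlines():
        match = ERROR_LINE.match(line)
        if match:
            errors.append((match.group("tool"), match.group("message")))
    return errors


# The log line from this report, used as sample input.
sample_log = (
    "ptxas error   : Program using constant pointers passed as entry "
    "function parameter cannot use cvta.const"
)

for tool, message in extract_build_errors(sample_log):
    print(f"{tool}: {message}")
```

Because the tool name is captured separately, a failure reported by the device compiler backend (ptxas here) can be told apart from errors reported by other tools.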

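The only reference I could turn up to this error on Google was another OpenCL developer having the same issue on a Mac (https://devtalk.nvidia.com/default/topic/1015746/possible-compiler-bug-in-mac-nvidia-web-driver-346-03-15f08-on-osx-10-11-6/), but he never posted a solution. Here's my GetOpenCLInfo output: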

Device info 

---------------------------------------------
Platform number: 0
---------------------------------------------
Platform vendor: Apple
Platform name: Apple
Platform extentions: cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event
Platform profile: FULL_PROFILE
---------------------------------------------

---------------------------------------------
Device number: 0
---------------------------------------------
Device vendor: Intel
Device name: Intel(R) Core(TM) i7-4771 CPU @ 3.50GHz
Hardware version: OpenCL 1.2 
Software version: 1.1
OpenCL C version: OpenCL C 1.2 
Device extensions: cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_APPLE_fp64_basic_ops cl_APPLE_fixed_alpha_channel_orders cl_APPLE_biased_fixed_point_image_formats cl_APPLE_command_queue_priority
Global memory size in MB: 8192
Size of largest memory object in MB: 2048
Global memory cache size in KB: 0
Local memory size in KB: 32
Constant memory size in KB: 64
Parallel compute units: 8
Clock frequency in MHz: 3500
Max number of threads per block: 1024
Max number of threads in each dimension: 1024 1 1

---------------------------------------------
Device number: 1
---------------------------------------------
Device vendor: NVIDIA
Device name: GeForce GTX 780M
Hardware version: OpenCL 1.2 
Software version: 10.30.25 355.11.10.10.30.120
OpenCL C version: OpenCL C 1.2 
Device extensions: cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_APPLE_fp64_basic_ops cl_khr_fp64 cl_khr_3d_image_writes cl_khr_depth_images cl_khr_gl_depth_images cl_khr_gl_msaa_sharing cl_khr_image2d_from_buffer cl_APPLE_ycbcr_422 cl_APPLE_rgb_422 
Global memory size in MB: 4096
Size of largest memory object in MB: 1024
Global memory cache size in KB: 0
Local memory size in KB: 48
Constant memory size in KB: 64
Parallel compute units: 8
Clock frequency in MHz: 648
Max number of threads per block: 1024
Max number of threads in each dimension: 1024 1024 64

I've tried installing the NVIDIA Web Drivers and CUDA drivers for my GPU to see if that would make a difference, but without any luck. I'm running macOS 10.13.4 (High Sierra). Any idea what's going on?

Thanks in advance!

wanderine commented 6 years ago

Hi Austin,

I have not seen this error before. In general, OpenCL on Mac is not as stable as on Windows and Linux for some reason; kernels that run fine on Windows and Linux suddenly break on Mac.

-- Anders Eklund, PhD

a-hurst commented 6 years ago

Hmm, that's as I suspected. Perhaps Apple's implementation (or Nvidia's macOS implementation, since it worked fine on my CPU and I haven't tested an AMD card yet) is extra-picky about things that the Linux and Windows implementations are more forgiving of. Since I'd prefer not to dual-boot for something like this, I'll try creating an account on the Nvidia developer forums where the guy reported the same issue and see if I get anywhere. I'll also try asking around some OpenCL IRC channels to see if anyone has thoughts on this, and I'll report back here once I have time.