hughperkins / cltorch

An OpenCL backend for torch.

cltorch.test() fails at test_blas #30

Closed jjgoings closed 8 years ago

jjgoings commented 8 years ago

Running luajit -l cltorch -e 'cltorch.test()' returns Segmentation fault: 11 after test_blas. Prior tests pass.

I'm not sure how to dig deeper into the tests to see why it fails. It could be an environment variable that isn't set, considering some of the other issues raised for Mac OS.

Built from commit 29d0891

I went back several commits, e.g. 424b01e and 5954aeb, and those just fail at the assertions unit tests, FWIW.

ProductName: Mac OS X
ProductVersion: 10.11.1
BuildVersion: 15B42

Darwin 15.0.0 Darwin Kernel Version 15.0.0: Sat Sep 19 15:53:46 PDT 2015; root:xnu-3247.10.11~1/RELEASE_X86_64 x86_64

Using Apple , OpenCL platform: Apple
Using OpenCL device: Radeon HD 4850

Output of clinfo:

Number of platforms                               1
  Platform Name                                   Apple
  Platform Vendor                                 Apple
  Platform Version                                OpenCL 1.2 (Sep 21 2015 19:24:11)
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event

  Platform Name                                   Apple
Number of devices                                 2
  Device Name                                     Intel(R) Core(TM) i5 CPU 750 @ 2.67GHz
  Device Vendor                                   Intel
  Device Vendor ID                                0xffffffff
  Device Version                                  OpenCL 1.2
  Driver Version                                  1.1
  Device OpenCL C Version                         OpenCL C 1.2
  Device Type                                     CPU
  Device Profile                                  FULL_PROFILE
  Max compute units                               4
  Max clock frequency                             2660MHz
  Device Partition                                (core)
    Max number of sub-devices                     0
    Supported partition types                     None
  Max work item dimensions                        3
  Max work item sizes                             1024x1x1
  Max work group size                             1024
  Preferred work group size multiple              1
  Preferred / native vector sizes
    char                                          16 / 16
    short                                         8 / 8
    int                                           4 / 4
    long                                          2 / 2
    half                                          0 / 0 (n/a)
    float                                         4 / 4
    double                                        2 / 2 (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Address bits                                    64, Little-Endian
  Global memory size                              17179869184 (16GiB)
  Error Correction support                        No
  Max memory allocation                           4294967296 (4GiB)
  Unified memory for Host and Device              Yes
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        64
  Global Memory cache line                        8388608 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            65536 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   1 bytes
    Pitch alignment for 2D image buffers          1 bytes
    Max 2D image size                             8192x8192 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 128
    Max number of write image args                8
  Local memory type                               Global
  Local memory size                               32768 (32KiB)
  Max constant buffer size                        65536 (64KiB)
  Max number of constant args                     8
  Max size of kernel argument                     4096 (4KiB)
  Queue properties
    Out-of-order execution                        No
    Profiling                                     Yes
  Profiling timer resolution                      1ns
  Execution capabilities
    Run OpenCL kernels                            Yes
    Run native kernels                            Yes
  Prefer user sync for interop                    Yes
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Device Extensions                               cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_APPLE_fp64_basic_ops cl_APPLE_fixed_alpha_channel_orders cl_APPLE_biased_fixed_point_image_formats cl_APPLE_command_queue_priority

  Device Name                                     Radeon HD 4850
  Device Vendor                                   AMD
  Device Vendor ID                                0x1021a00
  Device Version                                  OpenCL 1.0
  Driver Version                                  1.0
  Device OpenCL C Version                         OpenCL C 1.0
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Max compute units                               10
  Max clock frequency                             503MHz
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x1024
  Max work group size                             1024
  Preferred work group size multiple              64
  Preferred / native vector sizes
    char                                          16 / 16
    short                                         8 / 8
    int                                           4 / 4
    long                                          2 / 2
    half                                          0 / 0 (n/a)
    float                                         4 / 4
    double                                        0 / 0 (n/a)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (n/a)
  Address bits                                    32, Little-Endian
  Global memory size                              402653184 (384MiB)
  Error Correction support                        No
  Max memory allocation                           134217728 (128MiB)
  Unified memory for Host and Device              No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       32768 bits (4096 bytes)
  Global Memory cache type                        None
  Image support                                   No
  Local memory type                               Local
  Local memory size                               16384 (16KiB)
  Max constant buffer size                        65536 (64KiB)
  Max number of constant args                     8
  Max size of kernel argument                     1024
  Queue properties
    Out-of-order execution                        No
    Profiling                                     Yes
  Profiling timer resolution                      40ns
  Execution capabilities
    Run OpenCL kernels                            Yes
    Run native kernels                            No
  Device Available                                Yes
  Compiler Available                              Yes
  Device Extensions                               cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  Apple
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [P0]
  clCreateContext(NULL, ...) [default]            Success [P0]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  Success (1)
    Platform Name                                 Apple
    Device Name                                   Intel(R) Core(TM) i5 CPU 750 @ 2.67GHz
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 Apple
    Device Name                                   Radeon HD 4850
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  <checkNullCtxFromType:2073: create context from type CL_DEVICE_TYPE_CUSTOM : error -30>
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (2)
    Platform Name                                 Apple
    Device Name                                   Radeon HD 4850
    Device Name                                   Intel(R) Core(TM) i5 CPU 750 @ 2.67GHz

hughperkins commented 8 years ago

Hi Joshua, it looks like your GPU is OpenCL 1.0, but cltorch only supports OpenCL 1.1 or higher.
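
For anyone wanting to check this up front, here is a minimal sketch of the version comparison, assuming you copy the "Device Version" strings out of clinfo by hand. The helper names (parse_opencl_version, meets_minimum) are hypothetical and are not part of cltorch; the 1.1 minimum is the one stated in this comment.

```python
import re

# Minimum OpenCL version stated in this thread for cltorch.
MIN_VERSION = (1, 1)


def parse_opencl_version(device_version):
    """Return (major, minor) from a string such as 'OpenCL 1.0'.

    Hypothetical helper: it only relies on the 'OpenCL <major>.<minor>'
    prefix and ignores any vendor-specific text that follows.
    """
    match = re.match(r"OpenCL (\d+)\.(\d+)", device_version)
    if match is None:
        raise ValueError(f"unrecognised version string: {device_version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(device_version, minimum=MIN_VERSION):
    """True if the device version is at least the given minimum."""
    return parse_opencl_version(device_version) >= minimum


# "Device Version" values copied from the clinfo output in this issue.
devices = {
    "Intel(R) Core(TM) i5 CPU 750 @ 2.67GHz": "OpenCL 1.2",
    "Radeon HD 4850": "OpenCL 1.0",
}

for name, version in devices.items():
    status = "ok" if meets_minimum(version) else "too old"
    print(f"{name}: {version} -> {status}")
```

With the two devices listed above, only the CPU passes the check; the Radeon HD 4850 reports OpenCL 1.0 and falls below the minimum.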

jjgoings commented 8 years ago

Okay, thanks. Seeing as I cannot figure out how to update OpenCL on my Mac, I'll consider this closed.