ontologiae closed 9 years ago
If I change line 77 of Spoc_kernel.cl from "__global int count)" to "int count)", the kernel compiles, but Vec_add reports that the numbers are wrong:
./VecAdd_MultiGPU.byte
Wow 4 compatible devices found
Will use devices : GeForce GT 650M and Intel(R) Core(TM) i7-3615QM CPU @ 2.30GHz
Size of vectors : 1024
Allocating Vectors (on CPU memory)
Set auto-transfers false
Loading Vectors with random floats
Transfering Vectors (on Device memory)
Computing
OpenCL Build Warning : Compiler build log:
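For readers who have not seen the file, the signature change described at the top can be sketched as follows. This is only an illustration: the kernel name vec_add_sketch, its body, the host shims and the helper run_vec_add_sketch are hypothetical and are not the actual contents of Spoc_kernel.cl. As far as I understand the OpenCL C rules, only pointer kernel arguments take an address-space qualifier such as __global, while scalars are passed by value, which would explain why a strict compiler refuses "__global int count". The change only addresses the build failure and says nothing about why the results are wrong.

```c
/* Hypothetical sketch: NOT the actual contents of Spoc_kernel.cl. */
#ifndef __OPENCL_VERSION__
/* Host-side shims (assumption, for a quick check only) so that the same
   source also builds as plain C outside an OpenCL compiler. */
#include <stddef.h>
#define __kernel
#define __global
static size_t sketch_gid = 0;
static size_t get_global_id(unsigned int dim) { (void)dim; return sketch_gid; }
#endif

/* Form that a strict OpenCL compiler is expected to reject, because a
   scalar argument carries an address-space qualifier:
   __kernel void vec_add_sketch(__global const float *a,
                                __global const float *b,
                                __global float *c,
                                __global int count)                    */

/* Accepted form: pointers keep __global, the scalar is passed by value. */
__kernel void vec_add_sketch(__global const float *a,
                             __global const float *b,
                             __global float *c,
                             int count)
{
    int i = (int)get_global_id(0);
    if (i < count) {
        c[i] = a[i] + b[i];
    }
}

#ifndef __OPENCL_VERSION__
/* Host-only emulation: run the kernel body once per work-item index. */
void run_vec_add_sketch(const float *a, const float *b, float *c, int count)
{
    for (sketch_gid = 0; sketch_gid < (size_t)count; ++sketch_gid) {
        vec_add_sketch(a, b, c, count);
    }
}
#endif
```

On the host, calling run_vec_add_sketch(a, b, c, n) fills c with the element-wise sums, which gives a reference to compare device results against.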
Mac OS X 10.9 (Mavericks), MacBook Pro 2012
Several problems :
1) Machine configuration
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2014 NVIDIA Corporation
Built on Thu_Jul_17_19:13:24_CDT_2014
Cuda compilation tools, release 6.5, V6.5.12
$ ./DeviceQuery.byte
DeviceQuery
This application prints informations about every device compatible with Spoc found on your computer.
Found 4 devices:
* 1 Cuda devices
* 3 OpenCL devices
Devices Info:

Device : 0
Name : GeForce GT 650M
Total Global Memory : 536543232
Local Memory Size : 49152
Clock Rate : 405000
Total Constant Memory : 65536
Multi Processor Count : 2
ECC Enabled : false
Powered by Cuda
Driver Version 6
Cuda 3.0 compatible
Regs Per Block : 65536
Warp Size : 32
Memory Pitch : 2147483647
Max Threads Per Block : 1024
Max Threads Dim : 1024x1024x64
Max Grid Size : 2147483647x65535x65535
Texture Alignment : 512
Device Overlap : true
Kernel Exec Timeout Enabled : true
Integrated : false
Can Map Host Memory : true
Compute Mode : 0
Concurrent Kernels : true
PCI Bus ID : 1
PCI Device ID : 0

Device : 1
Name : Intel(R) Core(TM) i7-3615QM CPU @ 2.30GHz
Total Global Memory : 17179869184
Local Memory Size : 32768
Clock Rate : 2300
Total Constant Memory : 65536
Multi Processor Count : 8
ECC Enabled : false
Powered by OpenCL
OpenCL compatible (via Platform : Apple)
Platform Profile : FULL_PROFILE
Platform Version : OpenCL 1.2 (Jul 29 2014 21:24:39)
Platform Vendor : Apple
Platform Extensions : cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event
Platform Number of Devices : 3
Type : CPU
Profile : FULL_PROFILE
Version : OpenCL 1.2
Vendor : Intel
Driver : 1.1
Extensions : cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_APPLE_fp64_basic_ops cl_APPLE_fixed_alpha_channel_orders cl_APPLE_biased_fixed_point_image_formats cl_APPLE_command_queue_priority
Vendor ID : 4294967295
Max Work Iem Dimensions : 3
Max Work Group Size : 1024
Max Work Item Size : 1024x1x1
Address Bits : 64
Max Memory Alloc Size : 64
Image Support : true
Max Read Image Args : 128
Max Write Image Args : 8
Max Samplers : 16
Memory Base Addr Align : 1024
Min Data Type Align Size : 128
Global Mem Cacheline Size : 6291456
Global Mem Cache Size : 6291456
Max Constant Args : 8
Endian Little : true
Available : true
Compiler Available : true
CL Device Single FP Config : FP FMA
CL Device Double FP Config : FP FMA
CL Device Half FP Config : FP NONE
CL Device Global Mem Cache Type : READ WRITE CACHE
CL Device Queue Properties : PROFILING ENABLE
CL Local Mem Type : Global
Image2D DIM : 8192x8192
Image3D DIM : 2048x2048x2048
Preferred Vector Width Char : 16
Preferred Vector Width Short : 8
Preferred Vector Width Int : 4
Preferred Vector Width Long : 2
Preferred Vector Width Float : 4
Preferred Vector Width Double : 2
Profiling Timer Resolution : 1

Device : 2
Name : HD Graphics 4000
Total Global Memory : 1073741824
Local Memory Size : 65536
Clock Rate : 1200
Total Constant Memory : 65536
Multi Processor Count : 16
ECC Enabled : false
Powered by OpenCL
OpenCL compatible (via Platform : Apple)
Platform Profile : FULL_PROFILE
Platform Version : OpenCL 1.2 (Jul 29 2014 21:24:39)
Platform Vendor : Apple
Platform Extensions : cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event
Platform Number of Devices : 3
Type : GPU
Profile : FULL_PROFILE
Version : OpenCL 1.2
Vendor : Intel
Driver : 1.2(Aug 17 2014 20:29:07)
Extensions : cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_image2d_from_buffer cl_khr_gl_depth_images cl_khr_depth_images
Vendor ID : 16925696
Max Work Iem Dimensions : 3
Max Work Group Size : 512
Max Work Item Size : 512x512x512
Address Bits : 64
Max Memory Alloc Size : 64
Image Support : true
Max Read Image Args : 128
Max Write Image Args : 8
Max Samplers : 16
Memory Base Addr Align : 1024
Min Data Type Align Size : 128
Global Mem Cacheline Size : 0
Global Mem Cache Size : 0
Max Constant Args : 8
Endian Little : true
Available : true
Compiler Available : true
CL Device Single FP Config : FP ROUND TO INF
CL Device Double FP Config : FP NONE
CL Device Half FP Config : FP NONE
CL Device Global Mem Cache Type : NONE
CL Device Queue Properties : PROFILING ENABLE
CL Local Mem Type : Local
Image2D DIM : 16384x16384
Image3D DIM : 2048x2048x2048
Preferred Vector Width Char : 1
Preferred Vector Width Short : 1
Preferred Vector Width Int : 1
Preferred Vector Width Long : 1
Preferred Vector Width Float : 1
Preferred Vector Width Double : 0
Profiling Timer Resolution : 80

Device : 3
Name : GeForce GT 650M
Total Global Memory : 536870912
Local Memory Size : 49152
Clock Rate : 774
Total Constant Memory : 65536
Multi Processor Count : 2
ECC Enabled : false
Powered by OpenCL
OpenCL compatible (via Platform : Apple)
Platform Profile : FULL_PROFILE
Platform Version : OpenCL 1.2 (Jul 29 2014 21:24:39)
Platform Vendor : Apple
Platform Extensions : cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event
Platform Number of Devices : 3
Type : GPU
Profile : FULL_PROFILE
Version : OpenCL 1.2
Vendor : NVIDIA
Driver : 8.26.28 310.40.55b01
Extensions : cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_APPLE_fp64_basic_ops cl_khr_fp64 cl_khr_3d_image_writes cl_khr_depth_images cl_khr_gl_depth_images cl_khr_gl_msaa_sharing cl_khr_image2d_from_buffer
Vendor ID : 16918272
Max Work Iem Dimensions : 3
Max Work Group Size : 1024
Max Work Item Size : 1024x1024x64
Address Bits : 32
Max Memory Alloc Size : 32
Image Support : true
Max Read Image Args : 256
Max Write Image Args : 16
Max Samplers : 32
Memory Base Addr Align : 1024
Min Data Type Align Size : 128
Global Mem Cacheline Size : 0
Global Mem Cache Size : 0
Max Constant Args : 9
Endian Little : true
Available : true
Compiler Available : true
CL Device Single FP Config : FP ROUND TO INF
CL Device Double FP Config : FP FMA
CL Device Half FP Config : FP NONE
CL Device Global Mem Cache Type : NONE
CL Device Queue Properties : PROFILING ENABLE
CL Local Mem Type : Local
Image2D DIM : 16384x16384
Image3D DIM : 2048x2048x2048
Preferred Vector Width Char : 1
Preferred Vector Width Short : 1
Preferred Vector Width Int : 1
Preferred Vector Width Long : 1
Preferred Vector Width Float : 1
Preferred Vector Width Double : 1
Profiling Timer Resolution : 1000
!!Warning!! could be Device 2
2) Running VecAdd
|~/Documents/Projets/NEURAL/Etudes/GPU/SPOC/Samples/build/Bytecode master| $ ./VecAdd.byte
Will use device : GeForce GT 650M
Size of vectors : 1024
Will use double precision
Allocating Vectors (on CPU memory)
Loading Vectors with random floats
Transfering Vectors (on Device memory)
Computing
ptxas application ptx input, line 255; warning : Double is not supported. Demoting to float
IN: spoc_cuda_launch_grid@ 409
Fatal error: exception Cuda.ERROR_LAUNCH_INCOMPATIBLE_TEXTURING

|~/Documents/Projets/NEURAL/Etudes/GPU/SPOC/Samples/build/Bytecode master| $ ./VecAdd.byte -auto true
Will use device : GeForce GT 650M
Size of vectors : 1024
Will use double precision
Allocating Vectors (on CPU memory)
Loading Vectors with random floats
Computing
IN: spoc_cuda_launch_grid@ 409
Fatal error: exception Cuda.ERROR_LAUNCH_INCOMPATIBLE_TEXTURING

|~/Documents/Projets/NEURAL/Etudes/GPU/SPOC/Samples/build/Bytecode master| $ ./VecAdd.byte -device 1 -auto true
Will use device : Intel(R) Core(TM) i7-3615QM CPU @ 2.30GHz
Size of vectors : 1024
Will use double precision
Allocating Vectors (on CPU memory)
Loading Vectors with random floats
Computing
[CL_DEVICE_NOT_AVAILABLE] : OpenCL Error : Error: build program driver returned (-2)
OpenCL Warning : clBuildProgram failed: could not build program for 0xffffffff (Intel(R) Core(TM) i7-3615QM CPU @ 2.30GHz) (err:-2)
[CL_BUILD_ERROR] : OpenCL Build Error : Compiler build log:
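The last run prints both a symbolic tag and a raw code (err:-2). As a reading aid, here is a small sketch that maps a few raw OpenCL status codes to their names. The function name cl_error_name_sketch is hypothetical and not part of SPOC or OpenCL; the numeric values are the ones I know from the Khronos cl.h header, and the table is deliberately incomplete.

```c
/* Hypothetical helper (not part of SPOC or OpenCL): translate a few raw
   OpenCL status codes into the symbolic names defined in cl.h. */
const char *cl_error_name_sketch(int err)
{
    switch (err) {
    case 0:   return "CL_SUCCESS";
    case -1:  return "CL_DEVICE_NOT_FOUND";
    case -2:  return "CL_DEVICE_NOT_AVAILABLE";
    case -3:  return "CL_COMPILER_NOT_AVAILABLE";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -6:  return "CL_OUT_OF_HOST_MEMORY";
    case -11: return "CL_BUILD_PROGRAM_FAILURE";
    default:  return "unknown (see cl.h)";
    }
}
```

With this table, the -2 in the log reads as CL_DEVICE_NOT_AVAILABLE, which matches the tag printed on the same line, whereas an ordinary compile failure would normally surface as -11.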