Open smjohnson-bi opened 5 years ago
I don't know what type of hardware you are running on. OpenCL could very well be running on your CPU's integrated GPU, which doesn't always provide much greater computing performance than the CPU itself, depending on the task. Running out of CL resources hints at this. Did you try reducing it by less, say to 2048?
Nothing above 128 seemed to work, and I've tried 3 different machines. Are the platform ID and device ID parameters significant? Also, I compiled the code with OPENCL_FOUND and CL_VERSION_1_1 as preprocessor defines, but there are others used in the code, e.g. OCL_SOURCE_FROM_FILE, _OPENMP, USE_SSE. Which would you recommend?
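For anyone checking which of these defines actually reached the compiler, a small helper like the one below can be dropped into a test build. It is only a sketch: `activeBuildFlags` is a hypothetical name, not something from the V-HACD sources, and it says nothing about which flags are advisable. Note that `_OPENMP` is normally set by the compiler itself when OpenMP is enabled (for example with `-fopenmp` or `/openmp`), so it usually should not be defined by hand.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of V-HACD): returns the names of the
// build-time macros mentioned in this thread that are defined in this
// translation unit.
std::vector<std::string> activeBuildFlags() {
    std::vector<std::string> flags;
#ifdef OPENCL_FOUND
    flags.push_back("OPENCL_FOUND");
#endif
#ifdef CL_VERSION_1_1
    flags.push_back("CL_VERSION_1_1");
#endif
#ifdef OCL_SOURCE_FROM_FILE
    flags.push_back("OCL_SOURCE_FROM_FILE");
#endif
#ifdef _OPENMP
    flags.push_back("_OPENMP");
#endif
#ifdef USE_SSE
    flags.push_back("USE_SSE");
#endif
    return flags;
}
```

Printing the returned names at startup makes it easy to compare the builds on different machines.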
Hi smjohnson, I haven't looked into OpenCL with V-HACD myself yet, so I don't have any specific knowledge for you (unless I sit down and investigate more closely). But for completeness (in case someone else has a look at the issue), I think you should post a copy of the output from clinfo.
Windows: Open cmd > clinfo
OK, thanks. Here is my OpenCL configuration:
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.1 WINDOWS
Platform Name: Intel(R) CPU Runtime for OpenCL(TM) Applications
Platform Vendor: Intel(R) Corporation
Platform Extensions: cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_intel_exec_by_local_thread cl_khr_spir cl_khr_dx9_media_sharing cl_intel_dx9_media_sharing cl_khr_d3d11_sharing cl_khr_gl_sharing cl_khr_fp64 cl_khr_image2d_from_buffer cl_intel_vec_len_hint
Platform Name: Intel(R) CPU Runtime for OpenCL(TM) Applications
Number of devices: 1
Device Type: CL_DEVICE_TYPE_CPU
Device ID: 32902
Max compute units: 6
Max work items dimensions: 3
Max work items[0]: 8192
Max work items[1]: 8192
Max work items[2]: 8192
Max work group size: 8192
Preferred vector width char: 1
Preferred vector width short: 1
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 1
Max clock frequency: 3700Mhz
Address bits: 14757395255531667488
Max memory allocation: 536838144
Image support: Yes
Max number of images read arguments: 480
Max number of images write arguments: 480
Max image 2D width: 16384
Max image 2D height: 16384
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 480
Max size of kernel argument: 3840
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: No
Round to +ve and infinity: No
IEEE754-2008 fused multiply-add: No
Cache type: Read/Write
Cache line size: 64
Cache size: 262144
Global memory size: 536838144
Constant buffer size: 131072
Max number of constant args: 480
Local memory type: Global
Local memory size: 32768
Error correction support: 0
Profiling timer resolution: 100
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: Yes
Queue properties:
Out-of-Order: Yes
Profiling : Yes
Platform ID: 00031F0C
Name: Intel(R) Core(TM) i5-9600K CPU @ 3.70GHz
Vendor: Intel(R) Corporation
Driver version: 18.1.0.0920
Profile: FULL_PROFILE
Version: OpenCL 2.1 (Build 0)
Extensions: cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_intel_exec_by_local_thread cl_khr_spir cl_khr_dx9_media_sharing cl_intel_dx9_media_sharing cl_khr_d3d11_sharing cl_khr_gl_sharing cl_khr_fp64 cl_khr_image2d_from_buffer cl_intel_vec_len_hint
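The listing above shows a single platform with a single device of type CL_DEVICE_TYPE_CPU, so on this machine any OpenCL work runs on the CPU. The sketch below models that with plain C++ structs. It assumes the platform and device ID parameters are zero-based indices into the enumerated lists, which is an assumption about V-HACD and not confirmed in this thread. `Platform`, `Device`, `reportedPlatforms` and `selectDevice` are hypothetical names; real code would fill the lists using clGetPlatformIDs and clGetDeviceIDs.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for what an OpenCL enumeration would return.
struct Device {
    std::string name;
    bool isGpu;  // false for CL_DEVICE_TYPE_CPU
};

struct Platform {
    std::string name;
    std::vector<Device> devices;
};

// The configuration reported by clinfo in this thread: one platform, one CPU device.
std::vector<Platform> reportedPlatforms() {
    Platform p;
    p.name = "Intel(R) CPU Runtime for OpenCL(TM) Applications";
    p.devices.push_back(Device{"Intel(R) Core(TM) i5-9600K CPU @ 3.70GHz", false});
    return std::vector<Platform>(1, p);
}

// Returns the device at the given indices, or nullptr when either index
// is out of range.
const Device* selectDevice(const std::vector<Platform>& platforms,
                           std::size_t platformIdx, std::size_t deviceIdx) {
    if (platformIdx >= platforms.size()) return nullptr;
    const std::vector<Device>& devs = platforms[platformIdx].devices;
    if (deviceIdx >= devs.size()) return nullptr;
    return &devs[deviceIdx];
}
```

Under that assumption only the indices (0, 0) resolve here, and the device they select is the CPU, which would be consistent with seeing no speed-up from the acceleration option.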
I built a project with the latest V-HACD code and was trying to speed up hull generation by using the oclAcceleration option. However, the call to clEnqueueNDRangeKernel in VHACD.cpp returns CL_OUT_OF_RESOURCES.
I googled this problem and a post suggested reducing the local work size parameter. I reduced it from 4096 to 128, and then the call succeeded (no larger value would work).
However, running with this value did not improve performance compared with running without OpenCL acceleration.
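Rather than finding a working local work size by trial and error, the limit for a given kernel can be queried with clGetKernelWorkGroupInfo and CL_KERNEL_WORK_GROUP_SIZE, and the requested size reduced to fit. The sketch below shows only the arithmetic, in plain C++ without any OpenCL calls. `clampLocalWorkSize` and `roundUpGlobal` are hypothetical helpers, not V-HACD functions, and whether the kernel limit is what triggers CL_OUT_OF_RESOURCES in this case is not established here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: halve the requested local work size until it fits
// the per-kernel limit (the value CL_KERNEL_WORK_GROUP_SIZE would report).
std::size_t clampLocalWorkSize(std::size_t requested, std::size_t kernelMax) {
    assert(requested > 0 && kernelMax > 0);
    std::size_t local = requested;
    while (local > kernelMax) {
        local /= 2;
    }
    return local;
}

// Hypothetical helper: round the global work size up to a multiple of the
// local size. OpenCL 1.x requires the global size to be evenly divisible
// by the local size, so the kernel must ignore the padding work items.
std::size_t roundUpGlobal(std::size_t workItems, std::size_t local) {
    assert(local > 0);
    return ((workItems + local - 1) / local) * local;
}
```

For example, a request of 4096 against a limit of 128 is reduced to 128, and 1000 work items are then padded to a global size of 1024.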