inducer / pyopencl

OpenCL integration for Python, plus shiny features
http://mathema.tician.de/software/pyopencl

Image not initialized on GPU #288

Open Nan2018 opened 5 years ago

Nan2018 commented 5 years ago

It seems that an image is sometimes not correctly initialized on an NVIDIA GPU on Windows.

import numpy as np
import pyopencl as cl
from pyopencl import cltypes

platform = cl.get_platforms()[0]
print(platform, platform.version)

device = platform.get_devices()[0]
print(
    device,
    "image support: {}".format(device.get_info(cl.device_info.IMAGE_SUPPORT)),
)

ctx = cl.Context([device])
print(ctx)

cmd_queue = cl.CommandQueue(ctx)

image_data = np.arange(0.0, 10.0)

image_device = cl.Image(
    ctx,
    cl.mem_flags.READ_ONLY | cl.mem_flags.COPY_HOST_PTR,
    cl.ImageFormat(cl.channel_order.RG, cl.channel_type.FLOAT),
    hostbuf=np.stack((image_data, image_data), axis=1).astype(cltypes.float),
)

output_array = np.empty((10,), dtype=cltypes.float)
output_device = cl.Buffer(
    ctx, cl.mem_flags.READ_WRITE, 10 * np.dtype(cltypes.float).itemsize
)

kernel_string = """
__kernel void test(__read_only image1d_t img, __global float * output){
    sampler_t img_sampler = CLK_NORMALIZED_COORDS_FALSE |
                            CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_LINEAR;
    int i = get_global_id(0);
    float x = 0.5f + i;
    output[i] = read_imagef(img, img_sampler, x).x;
}
"""
program = cl.Program(ctx, kernel_string).build(
    "-cl-single-precision-constant -I."
)
program.test(cmd_queue, (10,), None, image_device, output_device)
cl.enqueue_copy(cmd_queue, output_array, output_device)
print(output_array)

Output:

<pyopencl.Platform 'NVIDIA CUDA' at 0x2da1b7114c0> OpenCL 1.2 CUDA 10.2.120
<pyopencl.Device 'GeForce GTX 1080 Ti' on 'NVIDIA CUDA' at 0x2da1b7118d0> image support: 1
<pyopencl.Context at 0x2da18593670 on <pyopencl.Device 'GeForce GTX 1080 Ti' on 'NVIDIA CUDA' at 0x2da1b7118d0>>
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]

However, if I choose the Intel platform with the CPU as the device, the output is correct:

<pyopencl.Platform 'Intel(R) OpenCL' at 0x24f397a87d0> OpenCL 2.1
<pyopencl.Device 'Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz' on 'Intel(R) OpenCL' at 0x24f397e22e0> image support: 1
<pyopencl.Context at 0x24f39fadbc0 on <pyopencl.Device 'Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz' on 'Intel(R) OpenCL' at 0x24f397e22e0>>
C:\Users\nanqin\Miniconda3\envs\opencl\lib\site-packages\pyopencl\__init__.py:235: CompilerWarning: Non-empty compiler output encountered. Set the environment variable PYOPENCL_COMPILER_OUTPUT=1 to see more.
  "to see more.", CompilerWarning)
[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]

OS: Windows 10

I also tested with an AMD GPU on a Mac, and the output is correct:

<pyopencl.Platform 'Apple' at 0x7fad2a020890> OpenCL 1.2 (Feb 22 2019 20:16:07)
<pyopencl.Device 'AMD Radeon Pro 555X Compute Engine' on 'Apple' at 0x7fad2a1031b0> image support: 1
<pyopencl.Context at 0x7fad2a1016a0 on <pyopencl.Device 'AMD Radeon Pro 555X Compute Engine' on 'Apple' at 0x7fad2a1031b0>>
[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
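For reference, the expected result [0. 1. ... 9.] can be checked without any OpenCL device. The sketch below emulates, in plain Python, how I understand linear filtering to work on a 1D image with unnormalized coordinates and clamp-to-edge addressing. It is based on my reading of the OpenCL specification, and `read_image1d_linear` is a hypothetical helper name, not a pyopencl function. Sampling at x = i + 0.5 hits the centre of texel i, so the interpolation weight for the neighbouring texel is zero and texel i comes back unchanged.

```python
import math


def read_image1d_linear(texels, u):
    """Emulate read_imagef(...).x for a 1D image, assuming unnormalized
    coordinates, CLK_ADDRESS_CLAMP_TO_EDGE and CLK_FILTER_LINEAR."""
    width = len(texels)

    def clamp(i):
        # Clamp-to-edge: out-of-range indices map to the nearest valid texel.
        return max(0, min(i, width - 1))

    base = math.floor(u - 0.5)   # index of the left texel
    a = (u - 0.5) - base         # fractional weight of the right texel
    return (1.0 - a) * texels[clamp(base)] + a * texels[clamp(base + 1)]


# Same data as the R channel of the image in the report: 0.0 .. 9.0
texels = [float(v) for v in range(10)]
expected = [read_image1d_linear(texels, 0.5 + i) for i in range(10)]
print(expected)  # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
```

This matches the Intel CPU and AMD GPU outputs, so the all-zero result on the NVIDIA platform is not explained by the sampling coordinates.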

Nan2018 commented 5 years ago

Platform and device info:

================
  Platform # 1
================

Platform name           :   NVIDIA CUDA
OpenCL version          :   OpenCL 1.2 CUDA 10.2.120
Platform vendor         :   NVIDIA Corporation
OpenCL profile          :   FULL_PROFILE
Extensions              :
                        :   cl_khr_global_int32_base_atomics
                        :   cl_khr_global_int32_extended_atomics
                        :   cl_khr_local_int32_base_atomics
                        :   cl_khr_local_int32_extended_atomics
                        :   cl_khr_fp64
                        :   cl_khr_byte_addressable_store
                        :   cl_khr_icd
                        :   cl_khr_gl_sharing
                        :   cl_nv_compiler_options
                        :   cl_nv_device_attribute_query
                        :   cl_nv_pragma_unroll
                        :   cl_nv_d3d10_sharing
                        :   cl_khr_d3d10_sharing
                        :   cl_nv_d3d11_sharing
                        :   cl_nv_copy_opts
                        :   cl_nv_create_buffer
                        :   
Device(s)               :   1

----------------
  Device # 1
----------------

Device name                                         :   GeForce GTX 1080 Ti
OpenCL device type                                  :   GPU
Vendor name                                         :   NVIDIA Corporation
OpenCL version                                      :   OpenCL 1.2 CUDA
Device vendor identifier                            :   4318
OpenCL software driver version                      :   430.86

Maximum number of samplers                          :   32
Maximum number of work-items in a work-group        :   1024
Maximum dimensions that specify work-item IDs       :   3
Maximum number of work-items in each dimension      :   1024, 1024, 64
Address space size                                  :   32

Type of local memory                                :   Local memory storage
Size of local memory arena (in bytes)               :   49152
Type of global memory cache                         :   Read-Write cache
Size of global memory cache (in bytes)              :   458752
Size of global memory cache line (in bytes)         :   128
Size of global device memory (in bytes)             :   11811160064

Device is available                                 :   Yes
Compiler is available                               :   Yes
Little endian device                                :   Yes
Error correction support                            :   No
Images are supported                                :   Yes

Max width of 2D image (in pixels)                   :   16384
Max height of 2D image (in pixels)                  :   32768
Max width of 3D image (in pixels)                   :   16384
Max height of 3D image (in pixels)                  :   16384
Max depth of 3D image (in pixels)                   :   16384

Resolution of device timer (in nanoseconds)         :   1000
Maximum configured clock frequency (in MHz)         :   1582
The number of parallel compute cores                :   28
Max number of __constant arguments in a kernel      :   9
Max size of a constant buffer allocation (in bytes) :   65536
Max size of memory object allocation (in bytes)     :   2952790016
Max size of kernel arguments (in bytes)             :   4352
Max number of simultaneously read image objects     :   256
Max number of simultaneously written image objects  :   16
Alignment of the base address (in bits)             :   4096
Minimum alignment for any data type (in bytes)      :   128

Preferred native vector width size for char type    :   1
Preferred native vector width size for short type   :   1
Preferred native vector width size for int type     :   1
Preferred native vector width size for long type    :   1
Preferred native vector width size for float type   :   1
Preferred native vector width size for double type  :   1

Single precision floating-point capability          :
                                                    :   denorms are supported
                                                    :   INF and NaNs are supported
                                                    :   round to nearest even rounding mode supported
                                                    :   round to zero rounding mode supported
                                                    :   round to +ve and -ve infinity rounding modes supported
                                                    :   IEEE754-2008 fused multiply-add is supported
Double precision fp capability                      :
                                                    :   denorms are supported
                                                    :   INF and NaNs are supported
                                                    :   round to nearest even rounding mode supported
                                                    :   round to zero rounding mode supported
                                                    :   round to +ve and -ve infinity rounding modes supported
                                                    :   IEEE754-2008 fused multiply-add is supported
Half precision fp capability                        :
                                                    :   round to nearest even rounding mode supported
                                                    :   round to zero rounding mode supported
Execution capabilities                              :
                                                    :   The OpenCL device can execute OpenCL kernels
Supported command-queue properties                  :   Commands are executed out-of-order;The profiling of commands is enabled
Extensions                                          :
                                                    :   cl_khr_global_int32_base_atomics
                                                    :   cl_khr_global_int32_extended_atomics
                                                    :   cl_khr_local_int32_base_atomics
                                                    :   cl_khr_local_int32_extended_atomics
                                                    :   cl_khr_fp64
                                                    :   cl_khr_byte_addressable_store
                                                    :   cl_khr_icd
                                                    :   cl_khr_gl_sharing
                                                    :   cl_nv_compiler_options
                                                    :   cl_nv_device_attribute_query
                                                    :   cl_nv_pragma_unroll
                                                    :   cl_nv_d3d10_sharing
                                                    :   cl_khr_d3d10_sharing
                                                    :   cl_nv_d3d11_sharing
                                                    :   cl_nv_copy_opts
                                                    :   cl_nv_create_buffer

Nan2018 commented 5 years ago

According to the NVIDIA CUDA download page, CUDA 10 is the latest version.

subshall commented 5 years ago

~~Hello, my GPU is integrated graphics, Intel HD 515. My CPU is Intel(R) Core(TM) m3-6Y30 CPU @ 0.90GHz~~

I am using clinfo; the output is:

Platform Name                   Intel(R) OpenCL
Number of devices               2
  Device Name                   Intel(R) Gen9 HD Graphics NEO
  Device Vendor                 Intel(R) Corporation
  Device Vendor ID              0x8086
  Device Version                OpenCL 2.1 NEO
  Driver Version                18.28.11080
  Device OpenCL C Version       OpenCL C 2.0
  Device Type                   GPU
  Device Profile                FULL_PROFILE
  Max compute units             24
  Max clock frequency           850MHz
  Device Partition              (core)
    Max number of sub-devices   0
    Supported partition types   None
  Max work item dimensions      3
  Max work item sizes           256x256x256

~~But when I print the device with pyopencl, I get the following result <pyopencl.Device 'Intel(R) Core(TM) m3-6Y30 CPU @ 0.90GHz' on 'Intel(R) CPU Runtime for OpenCL(TM) Applications' at 0x1a4b0c8> image support: 1~~

It only shows the CPU.

I want to know how to select the GPU in pyopencl. Thanks.
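One possible approach, sketched under assumptions: the script at the top of this issue takes `cl.get_platforms()[0]` and then `get_devices()[0]`, which picks whichever platform and device happen to be listed first. If a CPU-only runtime is listed first, the GPU is never considered. The sketch below searches every platform and keeps the devices whose type has the GPU bit set. `pick_gpu_devices` and `make_gpu_context` are hypothetical helper names, not part of pyopencl; the pyopencl calls used are `cl.get_platforms()`, `Platform.get_devices()`, `Device.type` and `cl.Context`.

```python
# Sketch only: the two helpers below are illustrative, not a pyopencl API.
try:
    import pyopencl as cl
except ImportError:  # lets the filtering helper be tried without OpenCL installed
    cl = None

# Value of CL_DEVICE_TYPE_GPU in the OpenCL headers (cl.device_type.GPU).
CL_DEVICE_TYPE_GPU = 1 << 2


def pick_gpu_devices(platforms):
    """Return (platform, device) pairs for every GPU on the given platforms."""
    pairs = []
    for platform in platforms:
        for device in platform.get_devices():
            # The device type is a bit field, so test the GPU bit
            # instead of comparing for equality.
            if int(device.type) & CL_DEVICE_TYPE_GPU:
                pairs.append((platform, device))
    return pairs


def make_gpu_context():
    """Create a context on the first GPU found across all platforms."""
    if cl is None:
        raise RuntimeError("pyopencl is not installed")
    pairs = pick_gpu_devices(cl.get_platforms())
    if not pairs:
        raise RuntimeError("no OpenCL GPU device found")
    _, device = pairs[0]
    return cl.Context([device])
```

Calling `ctx = make_gpu_context()` would then replace the `platform`/`device`/`ctx` lines of the original script. If no GPU shows up on any platform, the GPU driver's OpenCL runtime is probably not visible to pyopencl, which is a separate installation question.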