intel / compute-runtime

Intel® Graphics Compute Runtime for oneAPI Level Zero and OpenCL™ Driver
MIT License

Documentation about simultaneous use of CPU and IGP #229

Closed: OliverScherf closed this issue 4 years ago

OliverScherf commented 4 years ago

I'm working with an Intel Core processor that has an integrated GPU (IGP). I wanted to use OpenCL to run work on both the CPU and the IGP, and I expected to be able to use them simultaneously through OpenCL.

The documentation about this is confusing. The first article I found was this one: https://software.intel.com/en-us/iocl-opg-using-shared-context-for-multiple-opencl-devices

It links to an example that is supposed to use both devices through a shared context. The page offers a Linux download of that project, yet it also says the sample is only supported on Windows: https://software.intel.com/en-us/articles/hdr-tone-mapping-multi-device

The Intel® Graphics Technology Runtimes driver page says: "Runtimes for Intel® Graphics Technology are often deployed in tandem with an Intel® CPU runtime." This also suggests that both devices can work cooperatively through OpenCL: https://software.intel.com/en-us/articles/opencl-drivers

The OpenCL SDK overview states: "The OpenCL™ platform is the open standard for general-purpose parallel programming of heterogeneous systems. It provides a uniform programming environment that's used to write portable code for client PCs, high-performance computing servers, and embedded systems that leverage a diverse mix of: Multicore CPUs, Graphic processors, FPGAs, parallel processors and coprocessors". This also suggests simultaneous use: https://software.intel.com/en-us/opencl-sdk

After spending a lot of time researching and checking my OpenCL installation, I finally found this in LIMITATIONS.md: "Creation of OpenCL context spanning both CPU and GPU devices is currently not supported"

I think the documentation on the Intel websites is misleading. This limitation should be stated more clearly, especially on the OpenCL landing pages.

PiotrRozenfeld commented 4 years ago

As of today, a shared context is possible on Windows (the OpenCL CPU and GPU devices are provided under one OpenCL platform) but not on Linux (the OpenCL CPU and GPU devices are provided as separate platforms).

The difference in documentation comes from the fact that software.intel.com covers more devices and drivers, while the scope of the GitHub project documentation is limited to the OpenCL GPU implementation.

Would you share more details about your use case? We may be able to recommend an alternative approach that does not require shared context.

OliverScherf commented 4 years ago

Thanks for the reply! That clears some confusion.

I want to write a program in which the CPU and the IGP work cooperatively and in parallel on the same task. Each device should compute part of the result, and the parts are eventually merged.

My current approach is to implement the CPU part in plain C++ and use OpenCL for the IGP. What matters to me is zero-copy behavior between the CPU and the IGP, which I have working by now.

I have both a CPU and an IGP OpenCL platform installed. Can these distinct platforms cooperate in the same program?

MichalMrozek commented 4 years ago

If you have 2 platforms, you cannot create a context that uses devices from both. A shared context is only possible for devices within the same platform.

You can, however, use both platforms in your program, but to share anything you would need to do it manually.

If you want to use C++ for your CPU part, I think you could do the following:

  1. Create the buffer with clCreateBuffer( ...CL_MEM_ALLOC_HOST_PTR...);

  2. Partition the buffer into CPU and GPU parts using clCreateSubBuffer. I suggest using an origin that is page aligned (4096 bytes).

  3. Use the GPU part of the buffer on the GPU.

  4. For CPU access, map the CPU part using clEnqueueMapBuffer and use it on the CPU side.

  5. Make sure the two devices do not touch the same memory, as it may not be coherent.

  6. If you want to upload data to the buffer, release the CPU sub-buffer ownership with clEnqueueUnmapMemObject and synchronize.

OliverScherf commented 4 years ago

Thanks for the suggestion. I will try that.

I just wonder why I shouldn't touch the same memory with both devices. Would problems occur if I finish a GPU operation, so that no further memory accesses happen, and then use the CPU to read from the exact same address?

The platform is the Compute Runtime (GPU only):

  1. clCreateBuffer( ...CL_MEM_ALLOC_HOST_PTR, mem_ptr);
  2. clEnqueueNDRangeKernel(...)
  3. synchronization (either of these):
     3.1. clFinish()
     3.2. synchronize such that all kernels that use mem_ptr have finished computing
  4. algorithmCPU(mem_ptr); (maybe go back to step 2)

MichalMrozek commented 4 years ago

The CPU and GPU caches are not coherent unless you use fine-grained SVM, which comes with its own cost. This means memory may be corrupted if the same memory locations are used from both devices at the same time.

In your example above it should be fine, as you synchronize the use of both devices, so there is no concurrent access to memory from multiple devices.

By the way, to get CPU access to the memory you need to call clEnqueueMapBuffer; it also has a blocking flag, which guarantees that the GPU is done with the memory.

AdamCetnerowski commented 4 years ago

It looks like the question has been answered. If you have follow-ups, please reopen this issue or create a new one.