jonysy / parenchyma

An extensible HPC framework for CUDA, OpenCL and native CPU.

Advanced configuration (e.g., multiple devices) #14

Status: open. Opened by jonysy 7 years ago.

jonysy commented 7 years ago

References: OpenCL reference card; Porting CUDA Applications to OpenCL

OpenCL

Contexts

The current implementation allows a single context to encapsulate only a single device.

What's possible:

All of the setups above require advanced scheduling, and cross-device execution is quite rare. Multiple platforms can exist on a single machine. Targeting multiple platforms is fine as long as contexts do not cross platforms: one context per platform is required. In other words, an OpenCL context can only encapsulate devices from a single platform.
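To illustrate the one-context-per-platform rule, here is a minimal Rust sketch that groups discovered devices by platform and builds one context per group. The `Device`, `Context`, `PlatformId`, and `contexts_per_platform` names are hypothetical stand-ins, not parenchyma's API or any OpenCL binding; a real implementation would discover platforms and devices through the OpenCL runtime (`clGetPlatformIDs` / `clGetDeviceIDs`) rather than take a hard-coded list.

```rust
use std::collections::BTreeMap;

/// Hypothetical identifier for an OpenCL platform (e.g. one vendor's driver).
type PlatformId = u32;

/// Hypothetical description of a discovered device (not a real binding's type).
#[derive(Debug, Clone, PartialEq)]
struct Device {
    platform: PlatformId,
    name: String,
}

/// Hypothetical context: every device in it belongs to the same platform.
#[derive(Debug)]
struct Context {
    platform: PlatformId,
    devices: Vec<Device>,
}

/// Groups devices so that each resulting context spans exactly one platform.
fn contexts_per_platform(devices: &[Device]) -> Vec<Context> {
    // BTreeMap keeps platforms sorted, so the output order is deterministic.
    let mut by_platform: BTreeMap<PlatformId, Vec<Device>> = BTreeMap::new();
    for device in devices {
        by_platform.entry(device.platform).or_default().push(device.clone());
    }
    by_platform
        .into_iter()
        .map(|(platform, devices)| Context { platform, devices })
        .collect()
}

fn main() {
    let devices = vec![
        Device { platform: 0, name: "GPU A".into() },
        Device { platform: 1, name: "CPU".into() },
        Device { platform: 0, name: "GPU B".into() },
    ];
    for ctx in contexts_per_platform(&devices) {
        let names: Vec<&str> = ctx.devices.iter().map(|d| d.name.as_str()).collect();
        println!("platform {}: {:?}", ctx.platform, names);
    }
}
```

Using a `BTreeMap` keeps the contexts in platform order, so indexing contexts by position stays stable across runs on the same machine.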

Queues

At least one command queue per device is required.
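In the OpenCL C API, `clCreateCommandQueue` takes one context and one device, so queues are inherently per device. The sketch below is hypothetical (the `CommandQueue` type and `create_queues` function are illustrative, not parenchyma's API): it builds a configurable number of queues for every device and rejects a configuration that would leave a device without one.

```rust
/// Hypothetical stand-in for an OpenCL command queue; each is bound to exactly one device.
#[derive(Debug)]
struct CommandQueue {
    device: usize,
    index: usize,
}

/// Creates `per_device` queues for each of `num_devices` devices.
/// Zero is rejected because every device needs at least one queue.
fn create_queues(num_devices: usize, per_device: usize) -> Result<Vec<CommandQueue>, String> {
    if per_device == 0 {
        return Err("each device requires at least one command queue".to_string());
    }
    let mut queues = Vec::with_capacity(num_devices * per_device);
    for device in 0..num_devices {
        for index in 0..per_device {
            queues.push(CommandQueue { device, index });
        }
    }
    Ok(queues)
}

fn main() {
    let queues = create_queues(2, 2).expect("valid configuration");
    for q in &queues {
        println!("device {} -> queue {}", q.device, q.index);
    }
    assert!(create_queues(2, 0).is_err());
}
```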

What's possible:

OpenCL objects such as memory, program, and kernel objects are created using a context. Operations on these objects are performed using a command-queue, which queues a set of operations (referred to as commands) in order. Having multiple command-queues allows an application to queue multiple independent commands without requiring synchronization, as long as the objects involved are not shared between queues. Sharing objects across multiple command-queues requires the application to perform appropriate synchronization, as described in Appendix A of the OpenCL specification.
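The in-order, independent-queue model can be approximated with standard Rust threads: each hypothetical `InOrderQueue` runs its commands one at a time, in submission order, on its own worker thread. Both queues below write into the same buffer, so the buffer is wrapped in a `Mutex`; this stands in for the synchronization (events, barriers, or `clFinish`) that OpenCL leaves to the application when objects are shared. It is an analogy under those assumptions, not how an OpenCL driver actually schedules work.

```rust
use std::sync::mpsc::{self, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for an enqueued kernel launch or copy.
type Command = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical in-order queue: one worker thread runs commands in submission order.
struct InOrderQueue {
    sender: Option<Sender<Command>>,
    worker: Option<JoinHandle<()>>,
}

impl InOrderQueue {
    fn new() -> Self {
        let (sender, receiver) = mpsc::channel::<Command>();
        let worker = thread::spawn(move || {
            // A single channel with a single consumer preserves submission order.
            for command in receiver {
                command();
            }
        });
        InOrderQueue { sender: Some(sender), worker: Some(worker) }
    }

    fn enqueue(&self, command: impl FnOnce() + Send + 'static) {
        let sender = self.sender.as_ref().expect("queue already released");
        sender.send(Box::new(command)).expect("worker thread has exited");
    }

    /// Rough analogue of clFinish followed by releasing the queue:
    /// closing the channel ends the worker loop, then we wait for it.
    fn finish_and_release(mut self) {
        drop(self.sender.take());
        if let Some(worker) = self.worker.take() {
            worker.join().expect("a command panicked");
        }
    }
}

/// Two queues append to one shared log; the Mutex is the required synchronization.
fn run_two_queues() -> Vec<(char, u32)> {
    let shared = Arc::new(Mutex::new(Vec::<(char, u32)>::new()));
    let q0 = InOrderQueue::new();
    let q1 = InOrderQueue::new();
    for i in 0..3u32 {
        let s = Arc::clone(&shared);
        q0.enqueue(move || s.lock().unwrap().push(('a', i)));
        let s = Arc::clone(&shared);
        q1.enqueue(move || s.lock().unwrap().push(('b', i)));
    }
    q0.finish_and_release();
    q1.finish_and_release();
    let log = shared.lock().unwrap().clone();
    log
}

fn main() {
    let log = run_two_queues();
    let order_a: Vec<u32> = log.iter().filter(|e| e.0 == 'a').map(|e| e.1).collect();
    println!("{} commands ran; queue 'a' order: {:?}", log.len(), order_a);
}
```

The interleaving between the two queues is unspecified; only each queue's own order is guaranteed, which mirrors how separate in-order command-queues behave.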

CUDA

The current implementation allows a single context to encapsulate only a single device.

...

jonysy commented 7 years ago

Multi-Device Execution

Once a system has multiple devices, there are two main complications: deciding which device to place the computation for each node in the graph, and then managing the required communication of data across device boundaries implied by these placement decisions.
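The second complication can be made concrete: once each graph node has been assigned a device, every edge whose endpoints sit on different devices implies a data transfer. The sketch below assumes the placement is already decided and that every node appears in it; `Edge`, `Transfer`, and `required_transfers` are hypothetical names. Replacing cross-device edges with explicit transfer (send/receive) operations is one common approach, used for example in TensorFlow's runtime; parenchyma is not confirmed to do this.

```rust
use std::collections::HashMap;

/// Hypothetical node and device identifiers.
type NodeId = usize;
type DeviceId = usize;

/// An edge from a producer node to a consumer node in the computation graph.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Edge {
    from: NodeId,
    to: NodeId,
}

/// A hypothetical data transfer the runtime must schedule for a cross-device edge.
#[derive(Debug, PartialEq)]
struct Transfer {
    edge: Edge,
    src_device: DeviceId,
    dst_device: DeviceId,
}

/// Returns one transfer per edge that crosses a device boundary.
/// Panics if a node is missing from `placement` (every node is assumed placed).
fn required_transfers(edges: &[Edge], placement: &HashMap<NodeId, DeviceId>) -> Vec<Transfer> {
    edges
        .iter()
        .filter_map(|&edge| {
            let src = placement[&edge.from];
            let dst = placement[&edge.to];
            (src != dst).then(|| Transfer { edge, src_device: src, dst_device: dst })
        })
        .collect()
}

fn main() {
    // Graph: 0 -> 1, 1 -> 2, 0 -> 2.
    let edges = [Edge { from: 0, to: 1 }, Edge { from: 1, to: 2 }, Edge { from: 0, to: 2 }];
    // Hypothetical placement: nodes 0 and 1 on device 0, node 2 on device 1.
    let placement: HashMap<NodeId, DeviceId> = vec![(0, 0), (1, 0), (2, 1)].into_iter().collect();
    for t in required_transfers(&edges, &placement) {
        println!(
            "copy output of node {} from device {} to device {}",
            t.edge.from, t.src_device, t.dst_device
        );
    }
}
```

A runtime might also merge transfers that carry the same node's output to the same destination device, so that data crosses the boundary only once.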