kpet / clvk

Implementation of OpenCL 3.0 on Vulkan
Apache License 2.0

clvk is a conformant implementation of OpenCL 3.0 on top of Vulkan using clspv as the compiler.

Supported applications

Full list

Getting dependencies

clvk depends on the following external projects:

clvk also depends on a Vulkan implementation. The build system supports several options for selecting one (see the Building section).

To fetch all the dependencies needed to build and run clvk, please run:

git submodule update --init --recursive
./external/clspv/utils/fetch_sources.py --deps llvm

Building

clvk uses CMake for its build system.

Getting started

To build with the default configuration options, run the following:

mkdir -p build
cd build
cmake ../
make -j$(nproc)

Build options

The build system allows a number of things to be configured.

Vulkan implementation

You can select the Vulkan implementation that clvk will target with the CLVK_VULKAN_IMPLEMENTATION build system option. Two options are currently supported:

Tests

It is possible to disable the build of the tests by passing -DCLVK_BUILD_TESTS=OFF.

It is also possible to disable only the build of the tests linking with the static OpenCL library by passing -DCLVK_BUILD_STATIC_TESTS=OFF.

By default, tests that need gtest are linked against the libraries coming from LLVM (through clspv). Other libraries can be used by passing -DCLVK_GTEST_LIBRARIES=<lib1>;<lib2> (semicolon-separated list).
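
Because the shell treats ';' as a command separator, the list passed to -DCLVK_GTEST_LIBRARIES has to be quoted on the command line. The following is a minimal sketch of that quoting; the library names gtest and gtest_main are placeholders chosen for illustration, not names required by clvk.

```shell
# Hypothetical library names, used for illustration only.
GTEST_LIBS="gtest;gtest_main"

# Keep the whole argument in double quotes so that ';' reaches CMake
# as a list separator instead of ending the shell command.
GTEST_ARG="-DCLVK_GTEST_LIBRARIES=${GTEST_LIBS}"

# Print the configure command that would be run from the build directory.
printf 'cmake "%s" ../\n' "$GTEST_ARG"
```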

Assertions

Assertions can be controlled with the CLVK_ENABLE_ASSERTIONS build option. They are enabled by default in Debug builds and disabled in other build types.

OpenCL conformance tests

Passing -DCLVK_BUILD_CONFORMANCE_TESTS=ON will instruct CMake to build the OpenCL conformance tests. This is not expected to work out of the box at the moment.

It is also possible to build the GL and GLES interoperability tests by passing -DCLVK_BUILD_CONFORMANCE_TESTS_GL_GLES_SUPPORTED=ON.

Clspv compilation

You can select the compilation style that clvk will use with Clspv via the CLVK_CLSPV_ONLINE_COMPILER option. By default, Clspv is run in a separate process.

You can build clvk using an external Clspv source tree by setting -DCLSPV_SOURCE_DIR=/path/to/clspv/source/.

SPIRV components

All needed SPIRV components are added to clvk using git submodules. It is possible to disable the build of those components or to reuse existing sources:

SPIRV-Headers

SPIRV_HEADERS_SOURCE_DIR can be overridden to use another SPIRV-Headers repository.

SPIRV-Tools

SPIRV_TOOLS_SOURCE_DIR can be overridden to use another SPIRV-Tools repository. You can also disable the build of SPIRV-Tools by setting -DCLVK_BUILD_SPIRV_TOOLS=OFF.

SPIRV-LLVM-Translator

LLVM_SPIRV_SOURCE can be overridden to use another SPIRV-LLVM-Translator repository. Note that it is not used when compiler support is disabled (compiler support is enabled by default).

Sanitizers

Support for sanitizers is integrated into the build system:

Cross-compiling

When cross-compiling clvk, libclc binaries need to be compiled separately:

  1. Build a host native clang compiler using the source pointed by clspv in <clvk>/external/clspv/third_party/llvm:
    cmake -B <clang_host> -S <clvk>/external/clspv/third_party/llvm \
    -DLLVM_ENABLE_PROJECTS="clang" \
    -DLLVM_NATIVE_TARGET=1 \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_PREFIX="<clang_host>/install"
    cmake --build <clang_host> --target install
  2. Build libclc using that compiler:
    cmake -B <libclc> -S <clvk>/external/clspv/third_party/llvm/libclc \
    -DLLVM_CMAKE_DIR="<clang_host>/install/lib/cmake" \
    -DLIBCLC_TARGETS_TO_BUILD="clspv--;clspv64--"
    cmake --build <libclc>
  3. Pass the following options to CMake when compiling clvk:
    • -DCLSPV_EXTERNAL_LIBCLC_DIR="<libclc>"

Building for Android

clvk can be built for Android using the Android NDK toolchain.

  1. Download and extract the NDK toolchain to a directory (/path/to/ndk)
  2. Build libclc binaries (cross-compiling)
  3. Pass the following options to CMake:
    • -DCMAKE_TOOLCHAIN_FILE=/path/to/ndk/build/cmake/android.toolchain.cmake
    • -DANDROID_ABI=<ABI_FOR_THE_TARGET_DEVICE>, most likely arm64-v8a
    • -DVulkan_LIBRARY=/path/to/ndk/**/<api-level>/libvulkan.so
  4. That should be it!
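
The options from the steps above can be gathered into a single configure command. The helper below is a sketch only: the function name configure_clvk_android and the DRY_RUN switch are made up for this example, and every path is supplied by the caller. It assumes it is invoked from a build directory located directly under the clvk source tree, and it passes the libclc directory produced by the cross-compiling steps via -DCLSPV_EXTERNAL_LIBCLC_DIR.

```shell
# Sketch of a configure helper for Android (hypothetical, not part of clvk).
#   $1: NDK root directory
#   $2: target ABI (e.g. arm64-v8a)
#   $3: path to the libvulkan.so matching the target API level
#   $4: libclc build directory from the cross-compiling steps
# Set DRY_RUN=1 to print the command instead of running it.
configure_clvk_android() {
    ndk=$1 abi=$2 vulkan_lib=$3 libclc=$4
    set -- cmake ../ \
        "-DCMAKE_TOOLCHAIN_FILE=${ndk}/build/cmake/android.toolchain.cmake" \
        "-DANDROID_ABI=${abi}" \
        "-DVulkan_LIBRARY=${vulkan_lib}" \
        "-DCLSPV_EXTERNAL_LIBCLC_DIR=${libclc}"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Placeholder paths: show the command without executing it.
DRY_RUN=1 configure_clvk_android /path/to/ndk arm64-v8a \
    /path/to/libvulkan.so /path/to/libclc
```

Printing the command first makes it easy to check the paths before starting a long build.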

Using

Via the OpenCL ICD Loader

clvk supports the cl_khr_icd OpenCL extension that makes it possible to use the OpenCL ICD Loader.

Directly

To use clvk to run an OpenCL application, you just need to make sure that the clvk shared library is picked up by the dynamic linker.

When clspv is not built into the shared library (which is currently the default), you also need to make sure that clvk has access to the clspv binary. If you wish to move the built library and clspv binary out of the build tree, you will need to make sure that you provide clvk with a path to the clspv binary via the CLVK_CLSPV_PATH environment variable (see Environment variables).

Unix-like systems (Linux, macOS)

The following ought to work on Unix-like systems:

$ LD_LIBRARY_PATH=/path/to/build /path/to/application

# Running the included simple test
$ LD_LIBRARY_PATH=./build ./build/simple_test
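
When the library and the clspv binary have been moved out of the build tree, LD_LIBRARY_PATH and CLVK_CLSPV_PATH need to be set together. A small wrapper can do that; this is a sketch, and the name run_with_clvk is an invention for this example rather than something shipped with clvk.

```shell
# Run a command with clvk's library directory on the dynamic linker path
# and CLVK_CLSPV_PATH pointing at the clspv binary.
#   $1: directory containing the clvk shared library
#   $2: path to the clspv binary
#   remaining arguments: the application and its arguments
run_with_clvk() {
    libdir=$1 clspv=$2
    shift 2
    # Prepend the library directory, keeping any existing search path.
    LD_LIBRARY_PATH="${libdir}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}" \
    CLVK_CLSPV_PATH="${clspv}" \
        "$@"
}
```

For example, run_with_clvk /opt/clvk/lib /opt/clvk/bin/clspv /path/to/application, where all three paths are placeholders.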

With perfetto traces

Perfetto is a production-grade open-source stack for performance instrumentation and trace analysis. It offers services and libraries and for recording system-level and app-level traces, native + java heap profiling, a library for analyzing traces using SQL and a web-based UI to visualize and explore multi-GB traces.

-- https://github.com/google/perfetto/tree/v46.0#perfetto---system-profiling-app-tracing-and-trace-analysis

Perfetto can be enabled by passing the following options to CMake:

The Perfetto SDK can be found in the Perfetto GitHub repository.

If a Perfetto library is already installed on your system, you still need to provide the path to the SDK directory so that the build system can find perfetto.h, and you should also pass the following option to CMake:

By default, clvk will use Perfetto's InProcess backend, which means that you just have to run your application to generate traces. Environment variables can be used to control the maximum size of traces and what file they are saved to.

If you'd rather use Perfetto's System backend, pass the following option to CMake:

Once traces have been generated, you can view them using the perfetto trace viewer.

Windows

Copy OpenCL.dll into a system location or alongside the application executable you want to run with clvk.

Raspberry Pi

Make sure you have an up-to-date Mesa installed on your system. At the time of writing (May 2023), Raspberry Pi OS (Debian 11) did not ship one, but Ubuntu 23.04 does have a compatible Vulkan driver.

Install the prerequisites:

$ sudo apt install mesa-vulkan-drivers vulkan-tools libvulkan-dev git cmake clang clinfo

Check whether your Vulkan implementation has the VK_KHR_storage_buffer_storage_class extension or supports Vulkan 1.1. It is not enough for llvmpipe to satisfy this: the v3dv driver must support it too. If it does not, your Mesa is too old.

To fetch the dependencies, run the script explicitly with python3 (python3 ./external/clspv/utils/fetch_sources.py), because the Python interpreter may not be found otherwise.

Building will take many hours on a Raspberry Pi 4. It may be possible to skip building some of the tools, but building with the default settings does work.

Once the libOpenCL.so library has been built, verify with:

LD_LIBRARY_PATH=/path/to/build clinfo

With global timing information

Global timing information about API functions as well as some internal functions can be logged at the end of the execution.

To enable it, pass the following option to CMake:

Here is an example of what to expect running simple_test:

[CLVK] 0.00 ms -> clReleaseContext (1 blocks, avg 0.001 ms)
[CLVK] 0.02 ms -> clReleaseProgram (1 blocks, avg 0.024 ms)
[CLVK] 0.01 ms -> clReleaseKernel (1 blocks, avg 0.008 ms)
[CLVK] 0.00 ms -> clReleaseCommandQueue (1 blocks, avg 0.005 ms)
[CLVK] 0.01 ms -> clReleaseMemObject (1 blocks, avg 0.007 ms)
[CLVK] 0.00 ms -> clEnqueueUnmapMemObject (1 blocks, avg 0.002 ms)
[CLVK] 0.03 ms -> clEnqueueMapBuffer (1 blocks, avg 0.034 ms)
[CLVK] 5.04 ms -> vkQueueWaitIdle (1 blocks, avg 5.043 ms)
[CLVK] 5.13 ms -> executor_wait (1 blocks, avg 5.126 ms)
[CLVK] 0.02 ms -> vkQueueSubmit (1 blocks, avg 0.022 ms)
[CLVK] 5.07 ms -> execute_cmd: CLVK_COMMAND_BATCH (3 blocks, avg 1.690 ms)
[CLVK] 5.09 ms -> execute_cmds (3 blocks, avg 1.698 ms)
[CLVK] 0.00 ms -> extract_cmds_required_by (3 blocks, avg 0.001 ms)
[CLVK] 0.00 ms -> enqueue_command (3 blocks, avg 0.001 ms)
[CLVK] 0.00 ms -> end_current_command_batch (1 blocks, avg 0.002 ms)
[CLVK] 0.12 ms -> flush_no_lock (4 blocks, avg 0.030 ms)
[CLVK] 5.20 ms -> clFinish (2 blocks, avg 2.602 ms)
[CLVK] 97.06 ms -> clEnqueueNDRangeKernel (1 blocks, avg 97.063 ms)
[CLVK] 0.00 ms -> clSetKernelArg (1 blocks, avg 0.002 ms)
[CLVK] 0.01 ms -> clCreateBuffer (1 blocks, avg 0.012 ms)
[CLVK] 0.01 ms -> clCreateCommandQueue (1 blocks, avg 0.009 ms)
[CLVK] 0.19 ms -> clCreateKernel (1 blocks, avg 0.187 ms)
[CLVK] 237.71 ms -> clBuildProgram (1 blocks, avg 237.713 ms)
[CLVK] 0.02 ms -> clCreateProgramWithSource (1 blocks, avg 0.016 ms)
[CLVK] 0.00 ms -> clCreateContext (1 blocks, avg 0.000 ms)
[CLVK] 0.00 ms -> clGetDeviceInfo (1 blocks, avg 0.001 ms)
[CLVK] 0.00 ms -> clGetDeviceIDs (1 blocks, avg 0.001 ms)
[CLVK] 0.00 ms -> clGetPlatformInfo (1 blocks, avg 0.002 ms)
[CLVK] 0.00 ms -> clGetPlatformIDs (1 blocks, avg 0.000 ms)
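
With longer runs the log grows quickly, so it can help to sort it by total time. The filter below is a sketch that assumes the timing lines have the exact shape shown above and have been captured to a file; the function name top_clvk_timings and the file name timings.log are hypothetical.

```shell
# Print the N most expensive entries (default 5) from a clvk timing log
# read on standard input, sorted by total time in descending order.
top_clvk_timings() {
    # Keep only timing lines, then sort numerically on the second field
    # (the total in ms). LC_ALL=C makes '.' the decimal separator.
    grep '^\[CLVK\] [0-9.]* ms -> ' | LC_ALL=C sort -k2,2nr | head -n "${1:-5}"
}
```

For example, top_clvk_timings 3 < timings.log applied to the output above would list clBuildProgram, clEnqueueNDRangeKernel and clFinish, in that order.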

Tuning clvk

clvk can be tuned to improve the performance of specific workloads or on specific platforms. While we try to have the default parameters set at their best values for each platform, they can be changed for specific applications. One of the best ways to find out whether something can be improved is to use traces to see what should be changed.

Group size

clvk groups commands and waits for a call to clFlush or any blocking call (clFinish, clWaitForEvents, etc.) before submitting those groups for execution.

clvk's default group flushing behaviour can be controlled using the following two variables to flush groups as soon as a given number of commands have been grouped:

Batch size

clvk relies on Vulkan to offload workloads to the GPU. As such, it is better to batch OpenCL commands (translated into Vulkan commands) into a Vulkan command buffer. However, doing so may increase the latency before commands start running.

The size of those batches can be controlled using the following two variables:

Configuration

Many aspects of clvk's behaviour can be configured using configuration files and/or environment variables. clvk attempts to get its configuration from the following sources (in the order documented here). Values obtained from each source take precedence over previously obtained values.

  1. System-wide configuration in /etc/clvk.conf
  2. Configuration file in /usr/local/etc/clvk.conf
  3. Per-user configuration in ~/.config/clvk.conf
  4. clvk.conf in the current directory
  5. An additional configuration file specified using the CLVK_CONFIG_FILE environment variable, if provided
  6. Environment variables for individual configuration options
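
To see which configuration files would be picked up on a given machine, the file-based sources in the list above can be checked in order. This is a sketch that mirrors the documented lookup order only; it is not clvk's own lookup code, and the function name clvk_config_files is hypothetical. Files printed later override values from files printed earlier, and environment variables for individual options override all of them.

```shell
# List the clvk configuration files that exist, lowest precedence first.
clvk_config_files() {
    for f in /etc/clvk.conf /usr/local/etc/clvk.conf \
             "$HOME/.config/clvk.conf" ./clvk.conf \
             ${CLVK_CONFIG_FILE:+"$CLVK_CONFIG_FILE"}; do
        # Only report candidates that are present on this system.
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}

clvk_config_files
```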

Configuration files use a key-value format and allow comments beginning with #:

# Here's a comment
option = value

other_option = 42

Option names are lowercase (e.g. myoption) in configuration files but uppercase and prefixed with CLVK_ in environment variables (e.g. CLVK_MYOPTION).
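
The naming rule can be expressed as a one-line conversion. The helper below is a sketch (clvk_env_name is a made-up name) that reuses the placeholder option myoption from the text above.

```shell
# Map a configuration-file option name to its environment variable name:
# uppercase the name and add the CLVK_ prefix.
clvk_env_name() {
    printf 'CLVK_%s\n' "$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')"
}

clvk_env_name myoption   # prints CLVK_MYOPTION
```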

Here is a list of all the configuration options that clvk supports, documented using the name of the corresponding environment variable.

Limitations