mattkretz / vir-simd

improve the usage experience of std::experimental::simd (Parallelism TS 2)
GNU Lesser General Public License v3.0

vir::stdx::simd


This project aims to provide a fallback std::experimental::simd (Parallelism TS 2) implementation with additional features. Not every user can rely on GCC 11+ and its standard library to be present on all target systems. Therefore, the header vir/simd.h provides a fallback implementation of the TS specification that only implements the scalar and fixed_size<N> ABI tags. Thus, your code can still compile and run correctly, even if it is missing the performance gains a proper implementation provides.


Installation

This is a header-only library. To install it, copy the headers to wherever you want them. By default, make install copies the headers into /usr/local/include/vir/.

Examples:

# installs to $HOME/.local/include/vir
make install prefix=~/.local

# installs to $HOME/src/myproject/3rdparty/vir
make install includedir=~/src/myproject/3rdparty

Usage

#include <vir/simd.h>

namespace stdx = vir::stdx;

using floatv = stdx::native_simd<float>;
// ...

The vir/simd.h header will include <experimental/simd> if it is available, so you don't have to add any buildsystem support. It should just work.
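
One way such build-system-free detection can work is the C++17 `__has_include` preprocessor check. The following is only a sketch of that general technique, not the contents of vir/simd.h; the macro `DEMO_HAS_STD_SIMD` and the names in namespace `demo` are made up for illustration.

```cpp
// Hypothetical sketch: detect <experimental/simd> at preprocessing time,
// without any help from the build system.
#if defined(__has_include)
#  if __has_include(<experimental/simd>)
#    define DEMO_HAS_STD_SIMD 1
#  endif
#endif
#ifndef DEMO_HAS_STD_SIMD
#  define DEMO_HAS_STD_SIMD 0
#endif

namespace demo {
// True if the standard library ships the Parallelism TS 2 header.
constexpr bool has_std_simd = DEMO_HAS_STD_SIMD != 0;

// Names the implementation a header could select based on the check above.
constexpr const char* backend_name() {
  return has_std_simd ? "std::experimental::simd" : "scalar fallback";
}
}  // namespace demo
```

The nested `#if` keeps the check well-formed on preprocessors that do not know `__has_include` at all.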

Options

Additional Features

The TS curiously forgot to add simd_cast and static_simd_cast overloads for simd_mask. With vir::stdx::(static_)simd_cast, casts will also work for simd_mask. This does not require any additional includes.

Simple iota simd constants

Requires Concepts (C++20).

#include <vir/simd_iota.h>

constexpr auto a = vir::iota_v<stdx::simd<float>> * 3; // 0, 3, 6, 9, ...

The variable template vir::iota_v<T> can be instantiated with arithmetic types, array types (std::array and C-arrays), and simd types. In all cases, the elements of the variable will be initialized to 0, 1, 2, 3, 4, ..., depending on the number of elements in T. For arithmetic types vir::iota_v<T> is always just 0.
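
To make the element pattern concrete, here is a scalar model on std::array using only the standard library. It is a sketch of the described semantics, not the library's implementation; `iota_array` and `scaled` are hypothetical helper names, and the real vir::iota_v is a variable template rather than a function.

```cpp
#include <array>
#include <cstddef>

namespace demo {
// Hypothetical stand-in for vir::iota_v on a std::array: element i holds i.
template <class T, std::size_t N>
constexpr std::array<T, N> iota_array() {
  std::array<T, N> result{};
  for (std::size_t i = 0; i < N; ++i)
    result[i] = static_cast<T>(i);
  return result;
}

// Element-wise scaling, mirroring `vir::iota_v<V> * 3` from the example above.
template <class T, std::size_t N>
constexpr std::array<T, N> scaled(std::array<T, N> values, T factor) {
  for (std::size_t i = 0; i < N; ++i)
    values[i] *= factor;
  return values;
}

// Evaluated at compile time: 0, 3, 6, 9
constexpr auto multiples_of_three = scaled(iota_array<float, 4>(), 3.0f);
static_assert(multiples_of_three[3] == 9.0f, "last element is 3 * 3");
}  // namespace demo
```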

Making simd conversions more convenient

Requires Concepts (C++20).

The TS is way too strict about conversions, requiring verbose std::experimental::static_simd_cast<T>(x) instead of a concise T(x) or static_cast<T>(x). (std::simd in C++26 will fix this.)

vir::cvt(x) makes x implicitly convertible into whatever type the surrounding expression requires in order to be well-formed. This only works if the required type is unambiguous.

#include <vir/simd_cvt.h>

using floatv = stdx::native_simd<float>;
using intv = stdx::rebind_simd_t<int, floatv>;

void f(intv x) {
  using vir::cvt;
  // the floatv constructor and intv assignment operator clearly determine the
  // destination type:
  x = cvt(10 * sin(floatv(cvt(x))));

  // without vir::cvt, one would have to write:
  x = stdx::static_simd_cast<intv>(10 * sin(stdx::static_simd_cast<floatv>(x)));

  // probably don't do this too often:
  auto y = cvt(x); // y is a const-ref to x, but so much more convertible
                   // y is of type cvt<intv>
}

Note that vir::cvt also works for simd_mask and non-simd types. Thus, cvt becomes an important building block for writing "simd-generic" code (i.e. well-formed for T and simd<T>).
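
The idea of "convertible into whatever the expression wants" can be modeled for plain arithmetic types with a proxy that has a templated conversion operator. This is a minimal sketch of that technique, assuming nothing about how vir::cvt is implemented; `cvt_proxy`, `demo::cvt` and `truncate_scaled` are hypothetical names.

```cpp
namespace demo {
// Hypothetical proxy: remembers a reference and converts on demand.
template <class From>
class cvt_proxy {
  const From& ref_;

public:
  constexpr explicit cvt_proxy(const From& x) : ref_(x) {}

  // The destination type To is deduced from the context that consumes the proxy.
  template <class To>
  constexpr operator To() const { return static_cast<To>(ref_); }
};

template <class From>
constexpr cvt_proxy<From> cvt(const From& x) { return cvt_proxy<From>(x); }

inline int truncate_scaled(float x) {
  // The int return type determines the destination type of the conversion.
  return cvt(10 * x);
}
}  // namespace demo
```

Because the proxy only holds a reference, it should be consumed within the same full expression, which matches the "probably don't do this too often" remark about `auto y = cvt(x)`.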

Permutations (paper)

Requires Concepts (C++20).

#include <vir/simd_permute.h>

// v = {0, 1, 2, 3} -> {1, 0, 3, 2}
vir::simd_permute(v, vir::simd_permutations::swap_neighbors);

// v = {1, 2, 3, 4} -> {2, 2, 2, 2}
vir::simd_permute(v, [](unsigned) { return 1; });

// v = {1, 2, 3, 4} -> {3, 3, 3, 3}
vir::simd_permute(v, [](unsigned) { return -2; });
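
The three examples above are consistent with the following scalar model, in which the callable maps a destination index to a source index and negative values count from the end. This is a sketch on std::array inferred from those examples, not the library code; `demo::permute` and `demo::swap_neighbors` are hypothetical stand-ins.

```cpp
#include <array>
#include <cstddef>

namespace demo {
// Scalar model: result[i] = v[idx(i)]; a negative index j selects v[N + j].
template <class T, std::size_t N, class IndexMap>
constexpr std::array<T, N> permute(const std::array<T, N>& v, IndexMap idx) {
  std::array<T, N> result{};
  for (std::size_t i = 0; i < N; ++i) {
    const long j = static_cast<long>(idx(static_cast<unsigned>(i)));
    const std::size_t src = j >= 0 ? static_cast<std::size_t>(j)
                                   : N - static_cast<std::size_t>(-j);
    result[i] = v[src];
  }
  return result;
}

// Hypothetical counterpart of swap_neighbors: exchanges elements 2k and 2k+1.
constexpr unsigned swap_neighbors(unsigned i) { return i ^ 1u; }
}  // namespace demo
```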

The following permutations are pre-defined:

A vir::simd_permute(x, idx_perm) overload, where x is of vectorizable type, is also included, facilitating generic code.

A special permutation vir::simd_shift_in<N>(x, ...) shifts x by N elements, shifting in elements from the additional simd objects passed via the parameter pack. Example:

// v = {1, 2, 3, 4}, w = {5, 6, 7, 8} -> {2, 3, 4, 5}
vir::simd_shift_in<1>(v, w);
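
A scalar model of this shift, restricted to a single additional operand, could look as follows. It is a sketch inferred from the example above; `demo::shift_in` is a hypothetical name, and the real function accepts a pack of additional simd objects.

```cpp
#include <array>
#include <cstddef>

namespace demo {
// Scalar model: drop the first Shift elements of v and refill from the front of w.
template <std::size_t Shift, class T, std::size_t N>
constexpr std::array<T, N> shift_in(const std::array<T, N>& v,
                                    const std::array<T, N>& w) {
  static_assert(Shift <= N, "this sketch supports only one additional operand");
  std::array<T, N> result{};
  for (std::size_t i = 0; i < N; ++i) {
    const std::size_t src = i + Shift;
    result[i] = src < N ? v[src] : w[src - N];
  }
  return result;
}
}  // namespace demo
```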

SIMD execution policy (P0350)

Requires Concepts (C++20).

Adds an execution policy vir::execution::simd. The execution policy can be used with the algorithms implemented in the vir namespace. These algorithms are additionally overloaded in the std namespace.

At this point, the implementation of the execution policy requires contiguous ranges / iterators.

Usable algorithms

Example

#include <vir/simd_execution.h>
#include <vector>

// takes the vector by reference so that the caller observes the change
void increment_all(std::vector<float>& data) {
  std::for_each(vir::execution::simd, data.begin(), data.end(),
    [](auto& v) {
      v += 1.f;
    });
}

// or

void increment_all(std::vector<float>& data) {
  vir::for_each(vir::execution::simd, data,
    [](auto& v) {
      v += 1.f;
    });
}
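
The generic lambda (`auto& v`) matters because the callable may be invoked with more than one argument type. The sketch below models one plausible strategy, full-width chunks followed by a scalar remainder, using only the standard library. The chunking strategy, the width of 4 and all names in namespace `demo` are assumptions for illustration, not a description of what vir::execution::simd actually does.

```cpp
#include <cstddef>
#include <vector>

namespace demo {
// Hypothetical fixed-width chunk standing in for a simd<T> object.
template <class T, std::size_t W>
struct chunk {
  T lanes[W];
  chunk& operator+=(T rhs) {
    for (T& lane : lanes) lane += rhs;
    return *this;
  }
};

// Calls f with W-element chunks first, then with single elements for the tail.
template <std::size_t W = 4, class T, class F>
void for_each_chunked(std::vector<T>& data, F f) {
  std::size_t i = 0;
  for (; i + W <= data.size(); i += W) {
    chunk<T, W> c;
    for (std::size_t k = 0; k < W; ++k) c.lanes[k] = data[i + k];  // load
    f(c);
    for (std::size_t k = 0; k < W; ++k) data[i + k] = c.lanes[k];  // store back
  }
  for (; i < data.size(); ++i) f(data[i]);  // scalar remainder
}

// Takes the vector by reference so that the caller observes the increment.
inline void increment_all(std::vector<float>& data) {
  for_each_chunked(data, [](auto& v) { v += 1.f; });
}
}  // namespace demo
```

The same lambda body compiles for both the chunk type and a plain float, which is the property the generic `auto&` parameter provides.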

Execution policy modifiers

The vir::execution::simd execution policy supports a few settings modifying its behavior:

Bitwise operators for floating-point simd

#include <vir/simd_float_ops.h>

using namespace vir::simd_float_ops;

Then the &, |, and ^ binary operators can be used with objects of type simd<floating-point, A>.
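
What a bitwise operator means for a floating-point value can be shown on a single float: the operation is applied to the object representation. The sketch below assumes a 32-bit IEEE 754 float; the helper names in namespace `demo` are hypothetical and are not part of vir/simd_float_ops.h.

```cpp
#include <cstdint>
#include <cstring>

namespace demo {
static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes a 32-bit float");

// Copies the object representation of a float into an integer.
inline std::uint32_t to_bits(float x) {
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  return bits;
}

// Copies an integer bit pattern back into a float.
inline float from_bits(std::uint32_t bits) {
  float x;
  std::memcpy(&x, &bits, sizeof x);
  return x;
}

inline float bit_and(float a, float b) { return from_bits(to_bits(a) & to_bits(b)); }
inline float bit_or(float a, float b) { return from_bits(to_bits(a) | to_bits(b)); }
inline float bit_xor(float a, float b) { return from_bits(to_bits(a) ^ to_bits(b)); }
}  // namespace demo
```

Typical uses are sign manipulation: under IEEE 754, XOR with -0.0f flips the sign bit and OR with -0.0f sets it.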

Conversion between std::bitset and simd_mask

#include <vir/simd_bitset.h>

vir::stdx::simd_mask<int> k;
std::bitset b = vir::to_bitset(k);
vir::stdx::simd_mask k2 = vir::to_simd_mask<float>(b);

There are two overloads of vir::to_simd_mask:

to_simd_mask<T, A>(bitset<simd_size_v<T, A>>)

and

to_simd_mask<T, N>(bitset<N>)
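
The round trip can be modeled with std::array<bool, N> in place of simd_mask. This sketch assumes that mask element i corresponds to bit i, which the text above does not state explicitly; `demo::to_bitset` and `demo::to_mask` are hypothetical names.

```cpp
#include <array>
#include <bitset>
#include <cstddef>

namespace demo {
// Mask -> bitset: element i becomes bit i (assumed ordering).
template <std::size_t N>
std::bitset<N> to_bitset(const std::array<bool, N>& mask) {
  std::bitset<N> bits;
  for (std::size_t i = 0; i < N; ++i) bits[i] = mask[i];
  return bits;
}

// Bitset -> mask: bit i becomes element i.
template <std::size_t N>
std::array<bool, N> to_mask(const std::bitset<N>& bits) {
  std::array<bool, N> mask{};
  for (std::size_t i = 0; i < N; ++i) mask[i] = bits[i];
  return mask;
}
}  // namespace demo
```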

vir::simd_resize and vir::simd_size_cast

The header

#include <vir/simd_resize.h>

declares the functions vir::simd_resize and vir::simd_size_cast.

These functions resize a given simd or simd_mask object. If the return type has more elements than the input parameter, the new elements are default-initialized and appended at the end. Neither function allows a change of the value_type. However, implicit conversions can happen on parameter passing to simd_size_cast.
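
The resizing rule (keep the leading elements, append new ones at the end) can be sketched on std::array. This is a model of the description only: `demo::resize` is a hypothetical name, and the sketch zero-fills the appended elements so that the result is deterministic, which is an assumption of the model.

```cpp
#include <array>
#include <cstddef>

namespace demo {
// Keeps the first min(N, NewSize) elements; any extra elements are appended at the end.
template <std::size_t NewSize, class T, std::size_t N>
constexpr std::array<T, NewSize> resize(const std::array<T, N>& v) {
  std::array<T, NewSize> result{};  // appended elements are zero in this model
  constexpr std::size_t copied = N < NewSize ? N : NewSize;
  for (std::size_t i = 0; i < copied; ++i) result[i] = v[i];
  return result;
}
}  // namespace demo
```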

vir::simd_bit_cast

The header

#include <vir/simd_bit.h>

declares the function vir::simd_bit_cast<To>(from). This function serves the same purpose as std::bit_cast but additionally works in cases where a simd type is not trivially copyable.
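
For comparison, the trivially copyable case that std::bit_cast covers can be written in C++17 with std::memcpy. The sketch below shows only that baseline, applied to arrays as a stand-in for simd objects; `demo::bit_cast` is a hypothetical helper and does not handle the non-trivially-copyable case that vir::simd_bit_cast addresses. The expected bit patterns assume IEEE 754 floats.

```cpp
#include <array>
#include <cstdint>
#include <cstring>
#include <type_traits>

namespace demo {
// C++17 baseline: reinterpret the object representation via memcpy.
template <class To, class From>
To bit_cast(const From& from) {
  static_assert(sizeof(To) == sizeof(From), "source and destination sizes must match");
  static_assert(std::is_trivially_copyable<To>::value &&
                    std::is_trivially_copyable<From>::value,
                "this sketch only covers trivially copyable types");
  To to;
  std::memcpy(&to, &from, sizeof(To));
  return to;
}
}  // namespace demo
```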

Concepts

Requires Concepts (C++20).

The header

#include <vir/simd_concepts.h>

defines the following concepts:

simdize type transformation

Requires Concepts (C++20).

Warning: consider this interface under construction.

The header

#include <vir/simdize.h>

defines the following types and constants:

Benchmark support functions

Requires Concepts (C++20) and GNU compatible inline-asm.

The header

#include <vir/simd_benchmarking.h>

defines the following functions:

constexpr_wrapper: function arguments as constant expressions

The header

#include <vir/constexpr_wrapper.h>

defines the following tools:

constexpr_wrapper may appear unrelated to simd. However, it is an important tool used in many places in the implementation and on interfaces of vir-simd tools. vir::constexpr_wrapper is very similar to std::integral_constant, which is used in the simd TS interface for generator constructors.

Example

#include <vir/constexpr_wrapper.h>

auto f(vir::constexpr_value auto N)
{
  std::array<int, N> x = {};
  return x;
}

std::array a = f(vir::cw<4>); // array<int, 4>

using namespace vir::literals;

std::array b = f(10_cw); // array<int, 10>

This example cannot work with a signature constexpr auto f(int n) (or consteval) because n will never be considered a constant expression in the body of the function.
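
Since the text compares vir::constexpr_wrapper to std::integral_constant, the same effect can be sketched in C++17 with the standard type alone. The value travels in the argument's type, so it stays usable as a template argument inside the function. `size_constant` and `make_zeroed_array` are hypothetical names for this illustration.

```cpp
#include <array>
#include <cstddef>
#include <type_traits>

namespace demo {
template <std::size_t N>
using size_constant = std::integral_constant<std::size_t, N>;

template <class Constant>
constexpr auto make_zeroed_array(Constant) {
  // The value is part of the argument's type, so it is a constant expression here.
  std::array<int, Constant::value> x = {};
  return x;
}
}  // namespace demo
```

A call such as `demo::make_zeroed_array(demo::size_constant<4>{})` yields a std::array<int, 4>, whereas a plain `int n` parameter could not be used as the array size.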

Testing for the version of the vir::stdx::simd (vir-simd) library

The header

#include <vir/simd_version.h>

(which is also included from <vir/simd.h>) defines the type and constant

namespace vir
{
  struct simd_version_t { int major, minor, patchlevel; };

  constexpr simd_version_t simd_version;
}

in addition to the macros VIR_SIMD_VERSION, VIR_SIMD_VERSION_MAJOR, VIR_SIMD_VERSION_MINOR, and VIR_SIMD_VERSION_PATCHLEVEL.

simd_version_t implements all comparison operators, allowing e.g.

static_assert(vir::simd_version >= vir::simd_version_t{0,4,0});
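
A version triple with lexicographic comparison can be modeled in a few lines. This sketch only illustrates the comparison semantics; `demo::version_t`, its member names and the version value 0.4.1 are made up and do not reflect the library's definition or its current version.

```cpp
namespace demo {
// Hypothetical model of a version triple.
struct version_t {
  int major_no, minor_no, patchlevel;
};

// Lexicographic order: major first, then minor, then patchlevel.
constexpr bool operator<(const version_t& a, const version_t& b) {
  if (a.major_no != b.major_no) return a.major_no < b.major_no;
  if (a.minor_no != b.minor_no) return a.minor_no < b.minor_no;
  return a.patchlevel < b.patchlevel;
}

constexpr bool operator>=(const version_t& a, const version_t& b) { return !(a < b); }

constexpr version_t current{0, 4, 1};  // made-up value for the example
static_assert(current >= version_t{0, 4, 0}, "usable in compile-time feature checks");
}  // namespace demo
```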

Semantics of version numbers

Debugging

Compile with -D _GLIBCXX_DEBUG_UB to get runtime checks for undefined behavior in the simd implementation(s). Alternatively, -fsanitize=undefined without the macro definition also finds these problems, but without an additional error message.

Preconditions in the vir::stdx::simd implementation and extensions are controlled via the -D VIR_CHECK_PRECONDITIONS=N macro, which defaults to 3. Compile-time diagnostics are only possible if the compiler's optimizer can detect the precondition failure. If you get a bogus compile-time failure, you need to introduce the necessary assumption into your calling function, which is typically a missing precondition check in your function.

Option                         at compile-time   at run-time
-DVIR_CHECK_PRECONDITIONS=0    warning           invoke UB/unreachable
-DVIR_CHECK_PRECONDITIONS=1    error             invoke UB/unreachable
-DVIR_CHECK_PRECONDITIONS=2    warning           trap
-DVIR_CHECK_PRECONDITIONS=3    error             trap
-DVIR_CHECK_PRECONDITIONS=4    warning           print error and abort
-DVIR_CHECK_PRECONDITIONS=5    error             print error and abort