mitsuba-renderer / enoki

Enoki: structured vectorization and differentiation on modern processor architectures

Masked access behaves differently when compiling with -mavx2 #62

Closed gergol closed 4 years ago

gergol commented 4 years ago

Consider the following code snippet. When compiled "out of the box", the program prints [10, 20, 30, 40, 5, 6, 7, 8] four times. However, as soon as I specify -mavx2, -march=native, or similar, the first three prints output [1, 2, 3, 4, 5, 6, 7, 8]. The fourth one works as expected, though.

#include <iostream>
#include <enoki/array.h>
using namespace enoki;
int main() {
  auto print = [](auto x) { std::cout << x << '\n'; };
  using Arr = Array<int, 8>;
  using M = mask_t<Arr>;
  M m{1,1,1,1,0,0,0,0};

  {
    Arr a = {1, 2, 3, 4, 5, 6, 7, 8};
    masked(a, m) *= 10;
    std::cout << a << std::endl;  // <- Wrong: should print [10, 20, 30, 40, 5, 6, 7, 8]
  }
  {
    Arr a = {1, 2, 3, 4, 5, 6, 7, 8};
    a = enoki::select(m, a * 10, a);
    std::cout << a << std::endl; // <- Wrong: should print [10, 20, 30, 40, 5, 6, 7, 8]
  }
  {
    Arr a = {1, 2, 3, 4, 5, 6, 7, 8};
    a[m] *= 10;
    std::cout << a << std::endl; // <- Wrong: should print [10, 20, 30, 40, 5, 6, 7, 8]
  }
  {
    Arr a = {1, 2, 3, 4, 5, 6, 7, 8};
    a[m > 0] *= 10;
    std::cout << a << std::endl; // <- OK: prints [10, 20, 30, 40, 5, 6, 7, 8]
  }
  return 0;
}

I've tested with gcc-7.4, clang-7, and clang-9 on Ubuntu 18.04.

Here's the CMakeLists.txt I'm using:

cmake_minimum_required(VERSION 3.15)
project(enoki_test)

set(CMAKE_CXX_STANDARD 17)

add_executable(enoki_test main.cpp)

target_include_directories(enoki_test PRIVATE ../enoki/include)

set(CMAKE_CXX_FLAGS "-mavx2")

Any idea how to fix this?

gergol commented 4 years ago

It turns out that it works with using M = mask_t<Array<bool, 8>>;. Maybe I do not fully understand the proper use of masks (unfortunately I have no experience with SIMD yet), so if this is not a bug, it could be helpful to add a short section to the docs on how to properly create masks manually.

wjakob commented 4 years ago

Hi Gergol,

The problem is the way in which you manually create the mask: the in-memory representation of a mask is an implementation detail of the underlying backend. A mask is a 1-bit value per lane on some targets (AVX512 K mask registers), a 1-byte boolean on other targets, and a full 32-bit value on AVX/AVX2. So for this specific target, you would have to use 0xffffffff instead of 1. Arguably the constructor for masks could be implemented in a nicer way so that it unifies these cases. On the other hand, creating a mask by hand in this way is probably a fairly obscure use case.
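To illustrate the point about 1 versus 0xffffffff, here is a minimal scalar sketch in plain C++ that does not use Enoki at all. It assumes a blend that inspects only the high bit of each 32-bit mask lane (AVX2 blend instructions such as `_mm256_blendv_epi8` look at the high bit of each byte); whether Enoki uses exactly this instruction is an assumption. The names `Lanes`, `compare_gt`, `blend_by_high_bit`, and `times_ten` are hypothetical helpers made up for this sketch.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 8;
using Lanes = std::array<std::uint32_t, kLanes>;  // hypothetical stand-in for one AVX2 register

// Models a vector comparison: every lane becomes all ones (true) or zero (false).
inline Lanes compare_gt(const Lanes& x, std::uint32_t threshold) {
    Lanes m{};
    for (std::size_t i = 0; i < kLanes; ++i)
        m[i] = (x[i] > threshold) ? 0xffffffffu : 0u;
    return m;
}

// Models a blend that looks only at the high bit of each mask lane
// (an assumption about the backend, see the lead-in).
inline Lanes blend_by_high_bit(const Lanes& mask, const Lanes& if_true,
                               const Lanes& if_false) {
    Lanes r{};
    for (std::size_t i = 0; i < kLanes; ++i)
        r[i] = (mask[i] & 0x80000000u) ? if_true[i] : if_false[i];
    return r;
}

// Multiplies every lane by 10, mirroring the `a * 10` in the report.
inline Lanes times_ten(const Lanes& a) {
    Lanes r{};
    for (std::size_t i = 0; i < kLanes; ++i)
        r[i] = a[i] * 10u;
    return r;
}
```

In this model, a hand-written mask of `{1,1,1,1,0,0,0,0}` has the high bit clear in every lane, so the blend returns the input unchanged. Passing the same values through `compare_gt(mask, 0)` first yields all-ones lanes and the blend then scales the first four entries, which would be consistent with the `a[m > 0]` case in the report working.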

Best, Wenzel

wjakob commented 4 years ago

(Normally masks are the result of some actual computation, in which case they will naturally have the right representation.)

gergol commented 4 years ago

Hi Wenzel!

Thanks for the quick reply! The confusing thing is that all masks look identical (containing 1s and 0s) when printed, even if they show different behaviour. Looking at them with the debugger revealed what you wrote in your first post. My use case was to try to write into the odd/even values of a packet to calculate the magnitudes of the interleaved complex STFT result from my other issue.
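Since the printed form hides the difference, a small helper that dumps the raw lane bits can stand in for the debugger. The following is a plain C++ sketch, not part of Enoki: `dump_lanes` and `even_lane_mask` are hypothetical names, and the mask is modelled as an array of 32-bit lanes as on AVX/AVX2. The even-lane mask is derived from a computation on the lane index rather than written out by hand, so each selected lane comes out as all ones.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Formats each 32-bit lane as 8 hex digits, so 1 and 0xffffffff
// no longer look alike when printed.
template <std::size_t N>
std::string dump_lanes(const std::array<std::uint32_t, N>& lanes) {
    std::string out;
    char buf[16];
    for (std::size_t i = 0; i < N; ++i) {
        std::snprintf(buf, sizeof buf, "%08x", static_cast<unsigned>(lanes[i]));
        if (i != 0) out += ' ';
        out += buf;
    }
    return out;
}

// Builds a mask selecting the even lanes by comparing the low bit of the
// lane index against zero; "true" lanes are all ones, as a comparison
// would produce in this model.
template <std::size_t N>
std::array<std::uint32_t, N> even_lane_mask() {
    std::array<std::uint32_t, N> m{};
    for (std::size_t i = 0; i < N; ++i)
        m[i] = ((i & 1u) == 0) ? 0xffffffffu : 0u;
    return m;
}
```

For example, `dump_lanes(even_lane_mask<4>())` gives `ffffffff 00000000 ffffffff 00000000`, whereas a hand-written `{1, 0, 1, 0}` dumps as `00000001 00000000 00000001 00000000`.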