shibatch / sleef

SIMD Library for Evaluating Elementary Functions, vectorized libm and DFT
https://sleef.org
Boost Software License 1.0
628 stars 126 forks source link

Feature detection is not thread-safe #560

Open Andreas-Krebbel opened 1 month ago

Andreas-Krebbel commented 1 month ago

On S/390 I ran into problem with the way the VXE2 hardware feature is currently being detected in multi-threaded code. It uses a SIGILL signal handler and sigjmp/longjmp in order to detect whether a certain instruction is available or not. Although I've looked into it for S/390 this should apply to other platforms using the same mechanism.

The problem is triggered by the way PyTorch is using Sleef: https://github.com/pytorch/pytorch/issues/128503

But it can also be reproduced with a small example like this:

#include <stdio.h>
#include "sleef.h"

#ifndef NUM_THREADS
#define NUM_THREADS 4
#endif

int main() {
  __vector float out[NUM_THREADS];

#pragma omp parallel for num_threads(NUM_THREADS)
  for (int i = 0; i < NUM_THREADS; i++)
    out[i] = Sleef_expf4_u10 ((__vector float){1.0f + i, 2.0f + i , 3.0f + i, 4.0f + i});

  for (int i = 0; i < NUM_THREADS;i++)
    {
      for (int j = 0; j < 4; j++)
    printf("%6.3f ", out[i][j]);
      printf("\n");
    }
  printf("\n");
}

Running the test like this: gcc -DNUM_THREADS=4 t.c -O3 -mzvector -march=z15 -lsleef -fopenmp -lgomp -o t && ./t results in either broken results or crashes

While the single threaded version works fine: gcc -DNUM_THREADS=1 t.c -O3 -mzvector -march=z15 -lsleef -fopenmp -lgomp -o t && ./t

The cpuSupportsExt function uses the file scope variable sigjmp to store the execution status what makes this function thread-unsafe.

I will send a PR to check HWCAPs instead of using the signal handler. This fixes the problem for S/390. I think other platforms might need similar adjustments.

blapie commented 1 month ago

Sorry for the late reply and thank you very much for the detailed explanation of the issue, providing a reproducer as well as suggesting a fix and PR. Will look into your PR shortly!

blapie commented 3 weeks ago

Re-opening the issue, as #560 only fixed the issue on a single architecture (s/390) and a single OS (linux).

563 attempts to fix other OS-es

Yet to implement a thread-safe detection for other architecture.