xtensor-stack / xtensor

C++ tensors with broadcasting and lazy computing
BSD 3-Clause "New" or "Revised" License

CPU affinity is overridden when calling xtensor code #2458

Open pylaterreur opened 3 years ago

pylaterreur commented 3 years ago

Hi,

tl;dr: the affinity mask goes from 4 (which I set myself) to 20000 (which xtensor/MKL sets down the line). I believe this is not desirable behavior :).

This is on GNU/Linux. The code runs in a thread that I pinned to one CPU with a taskset mask of 4. It then calls into some xtensor code (stack trace below, captured with gdb's catch syscall sched_setaffinity). After a few of these syscalls, my thread's affinity mask is 20000.

#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:39
#1  [...] in mkl_serv_get_ncorespercpu ()
#2  [...] in mkl_lapack_ilaenv ()
#3  [...] in mkl_lapack_dgeev ()
#4  [...] in mkl_lapack.dgeev_ ()
#5  [...] in cxxlapack::geev<int> ([...])
    at [...]/include/xflens/cxxlapack/interface/geev.tcc:101
#6  [...] in xt::lapack::geev<xt::xarray_container<xt::uvector<double, std::allocator<double> >, (xt::layout_type)2, xt::svector<unsigned long, 4ul, std::allocator<unsigned long>, true>, xt::xtensor_expression_tag>, xt::xtensor_container<xt::uvector<double, std::allocator<double> >, 1ul, (xt::layout_type)2, xt::xtensor_expression_tag>, xt::xtensor_container<xt::uvector<double, std::allocator<double> >, 2ul, (xt::layout_type)2, xt::xtensor_expression_tag> > ([...]) at [...]/include/xtensor-blas/xlapack.hpp:538
#7  [...] in xt::linalg::eig<xt::xarray_container<xt::uvector<double, std::allocator<double> >, (xt::layout_type)1, xt::svector<unsigned long, 4ul, std::allocator<unsigned long>, true>, xt::xtensor_expression_tag>, (void*)0> (A=...)
    at [...]/include/xtensor-blas/xlinalg.hpp:316
[...]
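
For reference, a minimal sketch of the kind of setup where this shows up (assumes xtensor-blas linked against MKL; pinning is done in-process with pthread_setaffinity_np rather than taskset, and the matrix is illustrative):

```cpp
#include <sched.h>
#include <pthread.h>
#include <cstdio>
#include <xtensor/xarray.hpp>
#include <xtensor-blas/xlinalg.hpp>

// Read back the calling thread's affinity mask as a bitmask.
static unsigned long current_mask()
{
    cpu_set_t set;
    CPU_ZERO(&set);
    sched_getaffinity(0, sizeof(set), &set);
    unsigned long mask = 0;
    for (int cpu = 0; cpu < 64; ++cpu)
        if (CPU_ISSET(cpu, &set))
            mask |= 1UL << cpu;
    return mask;
}

int main()
{
    // Pin the calling thread to CPU 2 (mask 4), as taskset would.
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(2, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

    std::printf("before eig: mask=%lx\n", current_mask());

    xt::xarray<double> a = {{0., 1.}, {-1., 0.}};
    auto eig_result = xt::linalg::eig(a);  // goes through MKL dgeev, per the trace

    std::printf("after eig:  mask=%lx\n", current_mask());
}
```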

Is there a way to tell xtensor to ensure that MKL (or another backend) does not change CPU affinities? If not, what can I do in my current setup to keep the affinity as it is?
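
One fallback I can think of, absent a proper MKL switch, would be to save the mask before the offending call and restore it afterwards. A sketch using the raw glibc sched_* calls (the wrapper name is mine; note this does not stop MKL from re-pinning worker threads *during* the call, it only undoes the change on return):

```cpp
#include <sched.h>
#include <utility>
#include <xtensor/xarray.hpp>
#include <xtensor-blas/xlinalg.hpp>

// Save the calling thread's affinity mask, run f, then restore the
// mask, undoing whatever the BLAS backend set in between.
template <class F>
auto with_affinity_preserved(F&& f)
{
    cpu_set_t saved;
    CPU_ZERO(&saved);
    sched_getaffinity(0, sizeof(saved), &saved);
    auto result = std::forward<F>(f)();
    sched_setaffinity(0, sizeof(saved), &saved);
    return result;
}

// Usage:
// xt::xarray<double> A = {{0., 1.}, {-1., 0.}};
// auto res = with_affinity_preserved([&] { return xt::linalg::eig(A); });
```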

Cheers!

JohanMabille commented 2 years ago

Hi,

According to the reply in this thread, setting the environment variable MKL_NUM_THREADS should prevent MKL from overwriting the thread affinity mask. Could you confirm that it works for you?
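
Either in the shell, e.g. `MKL_NUM_THREADS=1 ./my_program`, or programmatically (a sketch; I believe the variable must be set before the first MKL call, since MKL reads it during initialization):

```cpp
#include <cstdlib>

int main()
{
    // Must happen before any MKL routine runs.
    setenv("MKL_NUM_THREADS", "1", /*overwrite=*/1);
    // ... xtensor / xtensor-blas code ...
}
```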

pylaterreur commented 2 years ago

Hi, I've tried that, but it didn't work.

Using LD_PRELOAD to intercept calls to getenv() (a sketch of the shim follows the list) showed that these variables are read:

getenv:MKL_CBWR=nullptr
getenv:MKL_DEBUG_CPU_TYPE=nullptr
getenv:MKL_ENABLE_INSTRUCTIONS=nullptr
getenv:MKL_DISABLE_FAST_MM=nullptr
getenv:MKL_FAST_MEMORY_LIMIT=nullptr
getenv:MKL_NUM_THREADS=nullptr
getenv:MKL_NUM_STRIPES=nullptr
getenv:MKL_DOMAIN_NUM_THREADS=nullptr
getenv:MKL_DYNAMIC=nullptr
getenv:OMP_NUM_THREADS=nullptr
getenv:MKL_MPI_PPN=nullptr
getenv:I_MPI_NUMBER_OF_MPI_PROCESSES_PER_NODE=nullptr
getenv:I_MPI_PIN_MAPPING=nullptr
getenv:OMPI_COMM_WORLD_LOCAL_SIZE=nullptr
getenv:MPI_LOCALNRANKS=nullptr
getenv:I_MPI_THREAD_LEVEL=nullptr
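
The interception shim is roughly the following (my own throwaway code, not anything from xtensor or MKL), compiled into a shared object and loaded with LD_PRELOAD:

```cpp
// getenv_shim.cpp
// build: g++ -shared -fPIC -o getenv_shim.so getenv_shim.cpp -ldl
// run:   LD_PRELOAD=./getenv_shim.so ./my_program
#include <dlfcn.h>
#include <cstdio>

// Interpose getenv: log every lookup, then forward to the real libc one.
extern "C" char* getenv(const char* name) noexcept
{
    using getenv_fn = char* (*)(const char*);
    static getenv_fn real =
        reinterpret_cast<getenv_fn>(dlsym(RTLD_NEXT, "getenv"));
    char* value = real(name);
    std::fprintf(stderr, "getenv:%s=%s\n", name, value ? value : "nullptr");
    return value;
}
```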

I've tried a few variations of setting these env vars, but no luck.