parallel-runtimes / lomp

Little OpenMP Library
Apache License 2.0

Better Support for Thread Affinity #37

Open: mjklemm opened this issue 3 years ago

mjklemm commented 3 years ago

Is your feature request related to a problem? Please describe.
At the moment, LOMP performs only very limited thread pinning. It enumerates the cores to pin threads to by starting at core 0 and then assigning threads to cores in order of increasing core ID. This is wrong when only a subset of the cores is available to the process, e.g., when the affinity mask has been restricted by the taskset command or by MPI process affinity.

Describe the solution you'd like
LOMP should determine the process affinity mask when the library starts up and then pin threads only to the cores contained in that mask.
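One possible shape for this, sketched under the assumption of Linux/glibc: read the mask once with sched_getaffinity(), keep the allowed CPU IDs in ascending order, and map thread tid to the tid-th allowed CPU instead of to CPU tid. The helper names availableCpus, cpuForThread, and pinCurrentThread are hypothetical and not LOMP internals; wrapping round-robin when threads outnumber CPUs is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <pthread.h>
#include <sched.h>
#include <vector>

// Hypothetical helper (not LOMP's API): return the logical CPU IDs the process
// may run on, in ascending order. The kernel mask already reflects taskset,
// cpusets/cgroups, or an MPI launcher's binding.
// Assumes at most CPU_SETSIZE (usually 1024) CPUs; larger machines would need
// CPU_ALLOC/CPU_ALLOC_SIZE instead of a fixed-size cpu_set_t.
std::vector<int> availableCpus() {
    cpu_set_t mask;
    CPU_ZERO(&mask);
    std::vector<int> cpus;
    // pid 0 = the calling thread. Call this at library startup, before any
    // thread has been pinned, so it still reports the inherited process mask.
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0)
        return cpus;  // empty: the caller should fall back to not pinning
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu)
        if (CPU_ISSET(cpu, &mask))
            cpus.push_back(cpu);
    return cpus;
}

// Map OpenMP thread number 'tid' to the tid-th *available* CPU rather than to
// CPU 'tid'. Wrapping round-robin when threads outnumber CPUs is an assumption.
int cpuForThread(const std::vector<int>& cpus, int tid) {
    assert(!cpus.empty() && tid >= 0);
    return cpus[static_cast<std::size_t>(tid) % cpus.size()];
}

// Pin the calling thread to exactly one logical CPU; returns true on success.
bool pinCurrentThread(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set) == 0;
}
```

Under taskset -c 6,7, availableCpus() would return {6, 7}, so threads 0 and 1 would be pinned to CPUs 6 and 7 rather than 0 and 1.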

Describe alternatives you've considered
Another option would be to implement a subset of the OpenMP OMP_PLACES and OMP_PROC_BIND environment variables.
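If the OMP_PROC_BIND route were taken, a small subset could be layered on the same list of allowed CPUs. The sketch below treats each allowed CPU as one place (roughly what OMP_PLACES=cores gives on a machine without SMT; that equivalence is an assumption) and assumes the primary thread sits on the first place. It picks place floor(i*P/T) for T threads over P places, which yields contiguous subpartitions for spread and groups of consecutive threads when threads outnumber places. The OpenMP specification allows subpartitions of slightly different sizes, so this is one valid choice, not the required one; bindThreads is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch of a subset of OMP_PROC_BIND: each entry of 'places' is
// one place (here a single CPU ID), and the primary thread is assumed to sit
// on places[0]. Returns the place chosen for each thread 0..numThreads-1.
std::vector<int> bindThreads(const std::vector<int>& places, int numThreads,
                             const std::string& policy) {
    assert(!places.empty() && numThreads > 0);
    assert(policy == "primary" || policy == "close" || policy == "spread");
    const long P = static_cast<long>(places.size());
    const long T = numThreads;
    std::vector<int> result;
    result.reserve(static_cast<std::size_t>(T));
    for (long i = 0; i < T; ++i) {
        long p;
        if (policy == "primary") {
            p = 0;              // every thread shares the primary thread's place
        } else if (policy == "close" && T <= P) {
            p = i;              // consecutive threads on consecutive places
        } else {
            // spread (any T), or close with T > P: use place floor(i*P/T).
            // For spread with T <= P this is the first place of the i-th of T
            // contiguous subpartitions; for T > P it puts about T/P
            // consecutive threads on each place.
            p = (i * P) / T;
        }
        result.push_back(places[static_cast<std::size_t>(p)]);
    }
    return result;
}
```

With places {0, ..., 7} and two threads, spread would yield CPUs 0 and 4, while close would yield CPUs 0 and 1.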

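Additional context
Here is an example of what LOMP does wrong. The process is restricted to cores 6 and 7 with taskset, yet LOMP pins its two threads to logical CPUs 0 and 1: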

orcus ~/proj*/lo*/build-x86_64 [0:0]> LOMP_DEBUG=10 OMP_NUM_THREADS=2 taskset -c 6,7 examples/hello_world
Before parallel region
=======================================
LOMP:runtime version 0.1 (SO version 1) compiled at 08:03:10 on Aug 22 2021
LOMP:from Git commit e01700a for x86_64 by LLVM:12:0:0
LOMP:with configuration -mrtm;-mcx16;DEBUG=10;LOMP_GNU_SUPPORT=1;LOMP_WARN_API_STUBS=1;LOMP_WARN_ARCH_FEATURES=1;LOMP_HAVE_RTM=1;LOMP_HAVE_CMPXCHG16B=1;LOMP_HAVE_LIBNUMA=1
LOMP:NUMA: Initializing NUMA support.
LOMP:NUMA: Found 8 cores in 1 NUMA domain.
LOMP:NUMA: NUMA Domain 0:
LOMP:NUMA:    0 1 2 3 4 5 6 7
LOMP:Using barrier FT16FlagLBW4 [FixedTree(16)Flag;LBW4 broadcast]
LOMP:Thread 0 tightly affinitized to logicalCPU 0
LOMP:NUMA: Thread 0x987400 (thread ID: 0) on core 0, domain 0
LOMP:Thread 1 tightly affinitized to logicalCPU 1
LOMP:NUMA: Thread 0x7f9a34000900 (thread ID: 1) on core 1, domain 0
LOMP:Using 2 threads
Hello World: I am thread 0, and my secrets are 42.000000 and 21
Hello World: I am thread 1, and my secrets are 42.000000 and 21
=======================================
After parallel region
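A regression check for this bug could run inside each worker thread and verify that its affinity mask is a subset of the mask the process started with; in the run above it would fail, because threads landed on CPUs 0 and 1 while only 6 and 7 were allowed. This is a minimal sketch assuming Linux/glibc; threadMaskWithin is a hypothetical name.

```cpp
#include <pthread.h>
#include <sched.h>

// Hypothetical regression check: true if the calling thread's affinity mask is
// a subset of 'startupMask' (e.g. the process mask captured before pinning).
bool threadMaskWithin(const cpu_set_t& startupMask) {
    cpu_set_t mine;
    CPU_ZERO(&mine);
    if (pthread_getaffinity_np(pthread_self(), sizeof(mine), &mine) != 0)
        return false;  // cannot tell, so report failure
    cpu_set_t both;
    CPU_AND(&both, &mine, &startupMask);
    // 'mine' is a subset of 'startupMask' exactly when (mine & startupMask) == mine.
    return CPU_EQUAL(&both, &mine) != 0;
}
```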