kokkos / kokkos

Kokkos C++ Performance Portability Programming Ecosystem: The Programming Model - Parallel Execution and Memory Abstraction
https://kokkos.org

Default thread mapping behavior between P and Q threads. #91

Closed kyungjoo-kim closed 9 years ago

kyungjoo-kim commented 9 years ago

The current default Kokkos initialization for P and Q threads just takes the number of threads.

Kokkos::ExecSpace::initialize( nthreads );

One problem here is that the P and Q thread execution spaces have different default thread mapping strategies for the same device. For instance, on a machine with 2 NUMA domains x 8 cores x 1 thread per core:

Kokkos::Threads::initialize( 2 ); // this returns default team size = 1
Kokkos::Qthread::initialize( 2 ); // this returns default team size = 2, as a single shepherd is used per NUMA

Both backends should initialize the same way on the same device.

crtrott commented 9 years ago

The difference is whether or not Kokkos can discover the topology. I am pretty sure that Pthreads will give back a team size of 2 if you compile with HWLOC. Without HWLOC we don't know whether you have hyperthreads or not, except for QThreads, which I assume has its own discovery mechanism.

kyungjoo-kim commented 9 years ago

I compiled with hwloc and it detects threads[2] threads_per_numa[1] threads_per_core[1] when I request 2 threads. So it is correct that the default constructor of TaskPolicy finds team size = 1. As Qthread has its own mechanism, we need a way to force the same thread mapping at the Kokkos level.

crtrott commented 9 years ago

Why do we need to force the same behavior? Also, your hwloc numbers can't be quite right. Are you running with MPI?

kyungjoo-kim commented 9 years ago

This is not a bug fix request but a feature request. We have a device with different tasking backends. An algorithm is written using the portable (unified) Kokkos interface (task policy, league, team). It is then natural to expect that the default behavior of those backends, and the thread topology exposed through the Kokkos interface, should be similar. Otherwise, an application developer has to maintain a variant of the algorithm specialized for each tasking backend, even though the code is designed for the same device.

crtrott commented 9 years ago

If you want a specific behavior independent of the back-end, you need to request that and not use defaults. I see that you opened another issue, #92, which points out that QThreads does not have that initializer. That is the missing feature, not that Pthreads and Qthreads behave differently with default settings. I will close this issue, and we will resolve your problem by adding that initializer to Qthreads via issue #92.
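
To make the "request it explicitly" point concrete, here is a minimal sketch of a backend-independent mapping request. It is not Kokkos API: `ThreadMapping`, `total_threads`, `team_size`, and `league_size` are hypothetical names, and the rule that one team spans the threads placed in one NUMA domain is an assumption made only for illustration.

```cpp
#include <cassert>

// Hypothetical, backend-independent description of a requested mapping.
// None of these names are Kokkos API.
struct ThreadMapping {
  unsigned numa_count;        // NUMA domains to place threads in
  unsigned cores_per_numa;    // cores used inside each NUMA domain
  unsigned threads_per_core;  // hardware threads used on each core
};

// Total number of threads the mapping asks for.
unsigned total_threads(const ThreadMapping& m) {
  return m.numa_count * m.cores_per_numa * m.threads_per_core;
}

// Assumption for illustration: a team is the set of threads
// placed in one NUMA domain.
unsigned team_size(const ThreadMapping& m) {
  return m.cores_per_numa * m.threads_per_core;
}

// Under the same assumption, the league has one team per NUMA domain.
unsigned league_size(const ThreadMapping& m) {
  assert(team_size(m) > 0 && "mapping must place at least one thread per NUMA domain");
  return total_threads(m) / team_size(m);
}
```

With this model, `ThreadMapping{2, 1, 1}` (two threads spread over two NUMA domains) gives a team size of 1, while `ThreadMapping{1, 2, 1}` (two threads packed into one NUMA domain) gives a team size of 2. Both ask for two threads, which is why passing only a thread count leaves the outcome to each backend's default.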

nmhamster commented 9 years ago

Qthreads also relies on HWLOC.

We do have some other potential options here. One is to query the process mask, another is to query the environment like Qthreads does. I think we want to "do the right thing" here, i.e. go parallel if we can?
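
A rough sketch of those two options, assuming Linux for the process mask (`sched_getaffinity`) and falling back to `std::thread::hardware_concurrency()` elsewhere. The function names and the environment variable passed in are hypothetical; this is not how Kokkos or Qthreads actually choose their defaults.

```cpp
#include <cassert>
#include <climits>
#include <cstdlib>
#include <thread>
#ifdef __linux__
#include <sched.h>
#endif

// Number of CPUs this process is allowed to run on (its affinity mask).
// Falls back to hardware_concurrency() when the mask cannot be queried.
unsigned cpus_in_process_mask() {
#ifdef __linux__
  cpu_set_t mask;
  CPU_ZERO(&mask);
  if (sched_getaffinity(0, sizeof(mask), &mask) == 0) {
    return static_cast<unsigned>(CPU_COUNT(&mask));
  }
#endif
  unsigned n = std::thread::hardware_concurrency();
  return n == 0 ? 1u : n;  // hardware_concurrency() may return 0 if unknown
}

// Reads a positive integer from the environment variable `name`.
// Returns 0 if the variable is unset, empty, or not a valid positive integer.
unsigned threads_from_env(const char* name) {
  const char* value = std::getenv(name);
  if (value == nullptr || *value == '\0') return 0;
  char* end = nullptr;
  long parsed = std::strtol(value, &end, 10);
  if (*end != '\0' || parsed <= 0 || parsed > INT_MAX) return 0;
  return static_cast<unsigned>(parsed);
}

// Hypothetical default: honor the environment if it is set,
// otherwise use every CPU in the process mask ("go parallel if we can").
unsigned default_thread_count(const char* env_name) {
  unsigned requested = threads_from_env(env_name);
  return requested != 0 ? requested : cpus_in_process_mask();
}
```

Letting the environment override the mask is a design choice in this sketch; a real default might instead clamp the request to the number of CPUs in the mask.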

S

Si Hammond Scalable Computer Architectures Sandia National Laboratories, NM [Sent remotely, please excuse typing errors]

