Scalability of StarPU-MPI for LU decomposition

starpu-runtime / starpu

This is a mirror of https://gitlab.inria.fr/starpu/starpu where our development happens, but contributions are welcome here too!

GNU Lesser General Public License v2.1

63 stars, 12 forks

Hello Professor,

Here are the commands that we used:

    export OMP_NUM_THREADS=1
    export OPENBLAS_NUM_THREADS=1
    export STARPU_NCPU=32
    export STARPU_WORKERS_GETBIND=1
    mpirun --bind-to socket -n 2 ./plu_example_double 8 -size 16384 -nblocks 16 -p 2 -q 1

(The first three commands are meant to lower the number of cores used by each MPI process.)

However, the scalability still did not improve, either on a laptop or on a dual-node machine. The program run with 2 or 4 processes is still slower than with 1 process. Is there anything wrong with the commands? Looking forward to your reply.
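For reference, the 1-, 2- and 4-process runs being compared could be scripted as in the minimal sketch below. The environment variables and the mpirun arguments are copied from the commands in this post. `DRY_RUN` and `run_case` are hypothetical names introduced only for this sketch. The sketch also assumes that the `-p` x `-q` process grid should multiply out to the `-n` process count, which has not been confirmed here.

```shell
#!/bin/sh
# Sketch of a strong-scaling sweep over 1, 2 and 4 MPI processes.
# DRY_RUN and run_case are hypothetical helpers, not part of StarPU.
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = execute them

# Same environment as in the post.
export OMP_NUM_THREADS=1
export OPENBLAS_NUM_THREADS=1
export STARPU_NCPU=32
export STARPU_WORKERS_GETBIND=1

# run_case P Q: launch P*Q MPI processes on a P x Q grid
# (assumption: -n should equal -p times -q).
run_case() {
    p=$1
    q=$2
    n=$((p * q))
    cmd="mpirun --bind-to socket -n $n ./plu_example_double 8 -size 16384 -nblocks 16 -p $p -q $q"
    if [ "$DRY_RUN" = 1 ]; then
        echo "$cmd"
    else
        start=$(date +%s)
        $cmd
        end=$(date +%s)
        # Wall-clock time in whole seconds, including MPI start-up.
        echo "$n process(es): $((end - start)) s"
    fi
}

run_case 1 1
run_case 2 1
run_case 2 2
```

With `DRY_RUN=1` (the default) the script only prints the three command lines, so they can be checked before anything is launched. Setting `DRY_RUN=0` runs them one after another with the problem size held fixed, so the reported times are directly comparable.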


Scalability of StarPU-MPI for LU decomposition #19