Closed WwwwwYyyy closed 1 year ago
Is this related to issue #14? If so, please close this new issue and copy your comments into the previous one.
Also, have you read https://files.inria.fr/starpu/doc/html/OfflinePerformanceTools.html#Off-linePerformanceFeedback to find out how to analyze the performance of a program?
Hello Professor,

Here are the commands that we used:

export OMP_NUM_THREADS=1
export OPENBLAS_NUM_THREADS=1
export STARPU_NCPU=32
export STARPU_WORKERS_GETBIND=1
mpirun --bind-to socket -n 2 ./plu_example_double 8 -size 16384 -nblocks 16 -p 2 -q 1

(The first three commands are meant to lower the number of cores used by each MPI process.)
However, scalability still has not improved, either on a laptop or on a dual-node machine. Runs with 2 or 4 processes are still slower than a run with 1 process. Is there anything wrong with the commands? Looking forward to your reply.
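To make the comparison between runs concrete, the timings can be reduced to a speedup and a parallel efficiency. This is only a minimal sketch, assuming the wall-clock time of each run has already been measured separately; the `speedup` helper is hypothetical and is not part of StarPU or MPI.

```shell
# Hypothetical helper: compare a multi-process run against the 1-process baseline.
# Arguments: baseline time (s), multi-process time (s), number of MPI processes.
speedup() {
    awk -v t1="$1" -v tn="$2" -v n="$3" 'BEGIN {
        s = t1 / tn    # speedup relative to the 1-process run
        # efficiency = speedup divided by the number of processes
        printf "speedup=%.2f efficiency=%.2f\n", s, s / n
    }'
}
```

For example, `speedup 10 5 2` prints `speedup=2.00 efficiency=1.00`; these inputs are placeholder values, not measurements. A speedup below 1 corresponds to the situation described above, where the multi-process run is slower than the single-process one.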