Preliminary story

I am trying to improve the parallel performance of NNTile on a server with several GPUs. Unfortunately, the STARPU_REDUX access mode leads to much worse performance than STARPU_RW|STARPU_COMMUTE, even though the output is nearly the same. Today I tried to take advantage of StarPU data arbiters, and I got several different errors when running my application. I created a separate arbiter for each matrix -- I hope this is how it is meant to be used.
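For context, the two accumulation patterns I compared are used roughly as follows. This is a simplified sketch: the codelets and handles (cl_accumulate, cl_redux_add, cl_redux_init, dst, src) are placeholders, not the actual NNTile code.

```c
#include <starpu.h>

/* Commutative read-write access: tasks updating "dst" may be reordered by
 * the scheduler, but each one still gets exclusive access to "dst" while
 * it runs. */
static void submit_commute(struct starpu_codelet *cl_accumulate,
                           starpu_data_handle_t dst, starpu_data_handle_t src)
{
    starpu_task_insert(cl_accumulate,
                       STARPU_RW | STARPU_COMMUTE, dst,
                       STARPU_R, src,
                       0);
}

/* Reduction access: each worker accumulates into a private replicate of
 * "dst" (initialized by cl_redux_init); StarPU later combines the
 * replicates with cl_redux_add.  The reduction methods only need to be
 * set once per handle, before the first STARPU_REDUX access. */
static void submit_redux(struct starpu_codelet *cl_accumulate,
                         struct starpu_codelet *cl_redux_add,
                         struct starpu_codelet *cl_redux_init,
                         starpu_data_handle_t dst, starpu_data_handle_t src)
{
    starpu_data_set_reduction_methods(dst, cl_redux_add, cl_redux_init);
    starpu_task_insert(cl_accumulate,
                       STARPU_REDUX, dst,
                       STARPU_R, src,
                       0);
}
```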
Steps to reproduce
I am using StarPU at the starpu-1.3 tag of the GitLab repo (commit 1ace9c2ac6dccca341d4c4ce08f924581318c808). I enabled an arbiter for every matrix in my application, following the example tests/datawizard/test_arbiter.cpp (see the sketch at the end of this report). When I run my application on a server with GPUs, I get different errors. For example:
with the corresponding backtrace and config.log.
or another error:
with the corresponding backtrace; config.log is the same as above.
CUDA version is 12.2.
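For reference, the way I enable an arbiter for every matrix follows the pattern of tests/datawizard/test_arbiter.cpp. Below is a minimal, self-contained sketch of that pattern; the names (NX, NY, data, handle) are illustrative, and this is not the actual NNTile code.

```c
#include <stdint.h>
#include <stdlib.h>
#include <starpu.h>

int main(void)
{
    const uint32_t NX = 1024, NY = 1024;
    float *data = malloc(NX * NY * sizeof(*data));

    if (starpu_init(NULL) != 0)
        return 1;

    /* One dedicated arbiter for this matrix. */
    starpu_arbiter_t arbiter = starpu_arbiter_create();

    starpu_data_handle_t handle;
    starpu_matrix_data_register(&handle, STARPU_MAIN_RAM, (uintptr_t)data,
                                NX, NX, NY, sizeof(*data));

    /* Concurrent commuting accesses (STARPU_RW|STARPU_COMMUTE) to this
     * handle are now managed through the arbiter. */
    starpu_data_assign_arbiter(handle, arbiter);

    /* ... submit tasks accessing "handle" with STARPU_RW|STARPU_COMMUTE ... */

    starpu_task_wait_for_all();
    starpu_data_unregister(handle);
    starpu_arbiter_destroy(arbiter);
    starpu_shutdown();
    free(data);
    return 0;
}
```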