starpu-runtime / starpu

This is a mirror of https://gitlab.inria.fr/starpu/starpu where our development happens, but contributions are welcome here too!
https://starpu.gitlabpages.inria.fr/
GNU Lesser General Public License v2.1

starpu_bound_print_lp with STARPU_PERF_MODEL_HOMOGENEOUS_CPU=0 does not work #3

Closed: nindanaoto closed this issue 1 year ago

nindanaoto commented 1 year ago

I used starpu_bound_print_lp to find the theoretical lower bound on execution time (that is, the upper bound on achievable performance) in a heterogeneous environment, with task dependencies recorded.

I tried setting STARPU_PERF_MODEL_HOMOGENEOUS_CPU=0 with this function, but it fails because initialize_arch_duration does not correctly allocate the memory that stores the duration configurations for each task.

This happens because the current StarPU implementation forces topology->nhwdevices[STARPU_CPU_WORKER] to 1 (as written here), whereas STARPU_PERF_MODEL_HOMOGENEOUS_CPU=0 assumes that each CPU core is treated as a separate device.
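To make the mismatch concrete, here is a small self-contained model of it. This is only a sketch under my own assumptions: cpu_device_slots and devid_fits are hypothetical helpers, not StarPU functions, and the actual sizing logic inside initialize_arch_duration may look different.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not StarPU code: how many per-device slots a
 * duration table needs for the CPU workers. */
static int cpu_device_slots(bool homogeneous, int ncores)
{
    assert(ncores > 0);
    /* Homogeneous CPUs share one performance model, so one slot is enough;
     * otherwise every core is its own device and needs its own slot. */
    return homogeneous ? 1 : ncores;
}

/* True when a table sized for `nslots` devices can be indexed by `devid`. */
static bool devid_fits(int devid, int nslots)
{
    return devid >= 0 && devid < nslots;
}
```

With 8 cores, a table sized from a device count forced to 1 cannot hold an entry for core 3 (devid_fits(3, 1) is false), while a table sized with cpu_device_slots(false, 8) can.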

My current dirty workaround is to add if(!homogeneous && worker_devid) topology->nhwdevices[STARPU_CPU_WORKER] = nworker_per_device; to _starpu_topology_configure_workers, but I'm not sure whether this is the best way to solve the issue.
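The shape of that workaround can be sketched in isolation as follows. Everything here is a hypothetical stand-in (mock_topology, MOCK_CPU_WORKER, mock_configure_cpu_devices), not the real StarPU structures, and the sketch leaves out the worker_devid part of the condition for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for StarPU internals, for illustration only. */
enum { MOCK_CPU_WORKER = 0, MOCK_NARCH = 1 };

struct mock_topology {
    int nhwdevices[MOCK_NARCH]; /* number of devices known per worker type */
};

/* Mirrors the idea of the workaround: CPU workers count as a single device
 * unless the performance model is heterogeneous, in which case every
 * worker (core) is counted as its own device. */
static void mock_configure_cpu_devices(struct mock_topology *topology,
                                       bool homogeneous,
                                       int nworker_per_device)
{
    assert(topology != NULL && nworker_per_device > 0);
    topology->nhwdevices[MOCK_CPU_WORKER] = 1; /* current default */
    if (!homogeneous)
        topology->nhwdevices[MOCK_CPU_WORKER] = nworker_per_device;
}
```

Any table later sized from nhwdevices then gets one slot per core in the heterogeneous case, which is what the per-core device ids require.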

Do you have any suggestions for solving this issue?

sthibaul commented 1 year ago

Mmm, that seems a good way indeed, just moving it a bit. Could you try the attached patch? patch.txt