I used starpu_bound_print_lp to compute the theoretical lower bound on execution time in a heterogeneous environment, with task dependencies recorded.
I tried to use STARPU_PERF_MODEL_HOMOGENEOUS_CPU=0 with this function, but it fails because initialize_arch_duration does not correctly allocate the memory used to store the duration configurations for each task.
This happens because the current StarPU implementation forces topology->nhwdevices[STARPU_CPU_WORKER] to 1, as written here, whereas STARPU_PERF_MODEL_HOMOGENEOUS_CPU=0 assumes that each CPU core is treated as a separate device.
My current dirty workaround is to add if(!homogeneous && worker_devid) topology->nhwdevices[STARPU_CPU_WORKER] = nworker_per_device; to _starpu_topology_configure_workers, but I'm not sure whether this is the best way to solve the issue.
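To make the mismatch concrete, here is a minimal self-contained sketch in C. It is not StarPU code: the struct and every toy_ name are hypothetical, and it assumes, as described above, that a per-device table is sized from the CPU device count while each core gets its own device id when STARPU_PERF_MODEL_HOMOGENEOUS_CPU=0. For brevity it also leaves out the worker_devid condition from my workaround.

```c
#include <assert.h>

/* Toy stand-in for the relevant part of the topology (hypothetical, not StarPU's struct). */
struct toy_topology {
    unsigned nhwdevices_cpu; /* plays the role of topology->nhwdevices[STARPU_CPU_WORKER] */
};

/* Mimics the configuration step: the CPU device count is forced to 1,
 * unless the workaround is enabled and the perf model is heterogeneous. */
static void toy_configure_cpu_workers(struct toy_topology *t, int homogeneous,
                                      unsigned ncores, int apply_workaround)
{
    assert(t != 0 && ncores > 0);
    t->nhwdevices_cpu = 1;
    if (apply_workaround && !homogeneous)
        t->nhwdevices_cpu = ncores; /* one device slot per CPU core */
}

/* Device id assumed for a core: always 0 when homogeneous, the core index otherwise. */
static unsigned toy_cpu_devid(int homogeneous, unsigned core)
{
    return homogeneous ? 0u : core;
}

/* Counts the cores whose device id would index past a table
 * holding nhwdevices_cpu entries. */
static unsigned toy_count_out_of_range(const struct toy_topology *t,
                                       int homogeneous, unsigned ncores)
{
    unsigned bad = 0;
    for (unsigned core = 0; core < ncores; core++)
        if (toy_cpu_devid(homogeneous, core) >= t->nhwdevices_cpu)
            bad++;
    return bad;
}
```

In this toy model, with 4 cores and homogeneous set to 0, toy_count_out_of_range reports 3 cores outside the single slot when the workaround is off and 0 when it is on. The homogeneous case is unaffected either way.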
Are there any suggestions for solving this issue?