starpu_bound_print_lp with STARPU_PERF_MODEL_HOMOGENEOUS_CPU=0 does not work

I used starpu_bound_print_lp to find the theoretical upper bound of execution time in a heterogeneous environment, with task dependencies recorded.

I tried to use STARPU_PERF_MODEL_HOMOGENEOUS_CPU=0 with this function, but it fails because initialize_arch_duration does not allocate memory correctly for storing the duration configurations of each task.

This happens because the current StarPU implementation forces topology->nhwdevices[STARPU_CPU_WORKER] to 1, as written here, whereas STARPU_PERF_MODEL_HOMOGENEOUS_CPU=0 assumes that each CPU core is treated as a separate device.

My current dirty workaround is to add if(!homogeneous && worker_devid) topology->nhwdevices[STARPU_CPU_WORKER] = nworker_per_device; to _starpu_topology_configure_workers, but I'm not sure whether this is the best way to solve the issue.
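
To make the mismatch concrete, here is a minimal standalone sketch in C. It does not use StarPU itself: struct mock_topology, configure_current, configure_with_workaround and count_out_of_range are hypothetical names made up for illustration, and the sketch assumes that in non-homogeneous mode each CPU worker carries its own device id (0, 1, 2, ...). It also simplifies the condition of the workaround to !homogeneous only.

```c
/* Hypothetical stand-in for the CPU part of StarPU's topology.
 * nhwdevices_cpu plays the role of topology->nhwdevices[STARPU_CPU_WORKER]. */
struct mock_topology {
    unsigned nhwdevices_cpu;
};

/* Models the current behaviour described above: the CPU device count is always 1. */
static void configure_current(struct mock_topology *topo)
{
    topo->nhwdevices_cpu = 1;
}

/* Models the workaround (simplified): when CPUs are not treated as homogeneous,
 * count one device per CPU worker instead of a single one. */
static void configure_with_workaround(struct mock_topology *topo,
                                      int homogeneous,
                                      unsigned ncpu_workers)
{
    topo->nhwdevices_cpu = 1;
    if (!homogeneous)
        topo->nhwdevices_cpu = ncpu_workers;
}

/* Counts the workers whose device id would index past a per-device table
 * sized from nhwdevices_cpu (assumption: worker i has device id i). */
static unsigned count_out_of_range(const struct mock_topology *topo,
                                   unsigned ncpu_workers)
{
    unsigned bad = 0;
    for (unsigned devid = 0; devid < ncpu_workers; devid++) {
        if (devid >= topo->nhwdevices_cpu)
            bad++;
    }
    return bad;
}
```

Under these assumptions, with 4 CPU workers the current configuration leaves 3 device ids outside a table sized for a single device, while the workaround leaves none. Whether the real allocation in initialize_arch_duration is sized exactly this way is something I have not confirmed beyond the failure I observed.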

Are there any suggestions for a better way to solve this issue?
