What about cross-server data transmission overhead?

hliangzhao commented 3 years ago

Sorry to bother you again.

In my research area, each stage is scheduled onto some VM node. If its child stages are placed on different VM nodes, the cross-node data transmission overhead has to be considered. Thus, minimizing the makespan can be divided into two subgoals: the execution time and the cross-node communication overhead.

But I found that Decima does not consider the transmission time of intermediate data between the parent and child stages of each job. Is this because the scheduling environment is Spark? Or is it because all the jobs run on the same "VM node"?

hongzimao commented 3 years ago

I agree that data locality is an important aspect to optimize. Our simulator didn't capture it explicitly because the particular workload we ran on Spark did not show much difference (all VMs are in a single datacenter, where the high network throughput makes the locality issue minimal).

However, I would say it shouldn't be hard to add the transmission time to the simulator. You can apply a multiplier to the task run time based on which nodes the parent and child stages are placed on.
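A minimal sketch of that multiplier idea, not taken from the decima-sim code: the function names, the `remote_penalty` parameter, and the way node placement is represented as integer IDs are all hypothetical, chosen only for illustration. It inflates a task's run time in proportion to the share of parent stages placed on a different node than the child.

```python
def transfer_multiplier(parent_node_ids, child_node_id, remote_penalty=0.2):
    """Return a factor >= 1.0 that grows with the share of remote parents.

    parent_node_ids: node IDs where the parent stages ran (hypothetical layout).
    child_node_id:   node ID where the child task is placed.
    remote_penalty:  assumed slowdown when every parent is remote (0.2 = +20%).
    """
    if not parent_node_ids:
        # A root stage has no intermediate data to fetch.
        return 1.0
    remote = sum(1 for p in parent_node_ids if p != child_node_id)
    return 1.0 + remote_penalty * remote / len(parent_node_ids)


def effective_duration(base_duration, parent_node_ids, child_node_id,
                       remote_penalty=0.2):
    """Scale the task's local-only run time by the transfer multiplier."""
    return base_duration * transfer_multiplier(
        parent_node_ids, child_node_id, remote_penalty
    )


if __name__ == "__main__":
    # All parents on the same node as the child: run time is unchanged.
    print(effective_duration(10.0, [0, 0], 0))
    # One of two parents is remote: run time is inflated by half the penalty.
    print(effective_duration(10.0, [0, 1], 0))
```

In a real simulator the penalty would more likely depend on the size of the intermediate data and the link bandwidth between the two nodes; a constant is used here only to keep the sketch short.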

Also, for RL, you might still want to optimize directly for the end objective, as opposed to dividing the goal into sub-goals and optimizing them individually. It might be difficult to hand-tune the balance between execution time and cross-node communication overhead.
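To illustrate why the hand-tuned balance can be fragile, here is a small sketch with made-up numbers. It takes the makespan from the question as the end objective; the function names, the `weight` parameter, and the two candidate placements are hypothetical and are not part of decima-sim.

```python
def end_objective_reward(stage_finish_times):
    """Reward from the end objective only: the negative makespan.

    If the simulator already includes transfer time in each stage's
    finish time, no extra weighting term is needed.
    """
    return -max(stage_finish_times)


def hand_weighted_reward(exec_time, comm_time, weight):
    """Reward built from two sub-goals combined with a hand-picked weight."""
    return -(exec_time + weight * comm_time)


if __name__ == "__main__":
    # Two hypothetical placements, given as (execution time, communication time).
    placement_a = (10.0, 4.0)
    placement_b = (12.0, 1.0)

    # Which placement looks better depends on the weight that was chosen.
    for weight in (0.5, 1.0):
        reward_a = hand_weighted_reward(*placement_a, weight)
        reward_b = hand_weighted_reward(*placement_b, weight)
        preferred = "A" if reward_a > reward_b else "B"
        print(f"weight={weight}: prefers placement {preferred}")

    # The end-objective reward needs only the simulated finish times.
    print(end_objective_reward([3.0, 7.5, 6.0]))
```

With these numbers the preferred placement changes when the weight goes from 0.5 to 1.0, whereas a reward derived from the simulated makespan has no such knob to tune.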

Hope this helps!

hliangzhao commented 3 years ago

Thanks! This helps a lot!
