In gen_subtask_graph, Mars always creates new out chunks, even when the out chunk already exists. With a large number of chunks, this takes a lot of time. In fact, there is no need to create new out chunks except for FetchChunk.
I compared two runs, one that creates new out chunks and one that does not. The cost dropped from 122.92s to 56.63s, a reduction of about 53.93%.
So we should optimize SubtaskGraph generation to make it more efficient.
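To make the idea concrete, here is a minimal sketch of the difference between always creating out chunks and reusing existing ones. It is not the Mars implementation: `ToyChunk`, `out_chunks_always_new`, `out_chunks_reuse`, and the `existing` cache are hypothetical names made up for illustration, and the rule "fetch chunks always get a new out chunk" is an assumption based on the description above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ToyChunk:
    """Hypothetical stand-in for a chunk; not the Mars chunk class."""
    key: str
    is_fetch: bool = False


def out_chunks_always_new(chunks: List[ToyChunk]) -> List[ToyChunk]:
    # Baseline behaviour: build a fresh object for every chunk,
    # even when an equivalent out chunk already exists.
    return [ToyChunk(c.key, c.is_fetch) for c in chunks]


def out_chunks_reuse(chunks: List[ToyChunk],
                     existing: Dict[str, ToyChunk]) -> List[ToyChunk]:
    # Proposed behaviour (assumed): reuse an existing out chunk when
    # there is one, and only create new objects for fetch chunks or
    # for chunks that have not been seen yet.
    out = []
    for c in chunks:
        if not c.is_fetch and c.key in existing:
            out.append(existing[c.key])
            continue
        new_chunk = ToyChunk(c.key, c.is_fetch)
        if not c.is_fetch:
            # Remember non-fetch chunks so later calls can reuse them.
            existing[c.key] = new_chunk
        out.append(new_chunk)
    return out
```

With many chunks, the second function skips most object construction, which is where the time saving reported above would be expected to come from.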