Open zhongchun opened 1 year ago
In gen_subtask_graph, Mars always create new out chunks even if the out chunk already exists. It costs a lot of time if there are plenty of chunks.
gen_subtask_graph
Fixes #3341
I did a comparison, in which one creates new out chunks and the other does not. The test scripts are:
import mars.tensor as mt import mars.dataframe as md size = 50000 da1 = mt.random.random((size, 2), chunk_size=(1, 2)) df1 = md.DataFrame(da1, columns=list("AB")) df2 = df1 + 10 df3 = df2.sum() ret = df3.execute()
Cost time of Subtask generation are: 122.92s, 56.63s.
Subtask
What do these changes do?
In
gen_subtask_graph
, Mars always create new out chunks even if the out chunk already exists. It costs a lot of time if there are plenty of chunks.Related issue number
Fixes #3341
I did a comparison, in which one creates new out chunks and the other does not. The test scripts are:
Cost time of
Subtask
generation are: 122.92s, 56.63s.Check code requirements