anders0821 opened this issue 5 years ago
Yes, there is indeed a function to adjust the chunk_size: it's called `rechunk`. I notice this function is also missing from our docs, sorry about that, I will add it later.
For now, just refer to the function definition: call `tensor.rechunk(new_chunk_size)`. Please let us know the result once you have tried it.
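For readers who want to see what rechunking means concretely, here is a plain-numpy sketch of the idea (the helper `split_into_chunks` is hypothetical, not a mars API; mars does the equivalent lazily on the tensor graph via `tensor.rechunk` without materializing the data):

```python
import numpy as np

def split_into_chunks(arr, chunk_size):
    """Partition a 2-D array into a grid of blocks of at most chunk_size."""
    rows = range(0, arr.shape[0], chunk_size)
    cols = range(0, arr.shape[1], chunk_size)
    return [[arr[i:i + chunk_size, j:j + chunk_size] for j in cols]
            for i in rows]

a = np.arange(100 * 100).reshape(100, 100)
fine = split_into_chunks(a, 10)    # 10 x 10 grid of 10x10 blocks
coarse = split_into_chunks(a, 50)  # 2 x 2 grid of 50x50 blocks

# Rechunking re-partitions; the underlying data is unchanged.
assert np.array_equal(np.block(fine), np.block(coarse))
```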
The chunk_size of the original data can be set explicitly, but the built-in functions automatically determine the output chunk size from their input chunking, which can lead to problems. I implemented the kron function, which is absent in mars:
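The original snippet is not shown here, but for reference, kron can be expressed in terms of repeat and tile. A numpy sketch of that identity (`my_kron` is a hypothetical name; a mars version would use `mt.repeat`/`mt.tile` instead):

```python
import numpy as np

def my_kron(a, b):
    """Kronecker product built from repeat and tile.

    For b of shape (p, q):
        kron(a, b)[i*p + k, j*q + l] == a[i, j] * b[k, l]
    repeat stretches each element of `a` into a p x q block,
    and tile lays copies of `b` underneath it.
    """
    p, q = b.shape
    return np.repeat(np.repeat(a, p, axis=0), q, axis=1) * np.tile(b, a.shape)

a = np.arange(1, 5).reshape(2, 2)
b = np.arange(1, 7).reshape(2, 3)
assert np.array_equal(my_kron(a, b), np.kron(a, b))
```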
It is called by:
The computing graph is:
Furthermore, I tried a simpler test with the inner function mt.repeat():
The computing graph is:
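For readers without the graph screenshot, the eager numpy equivalent of such a repeat test looks like this (numpy simply materializes one array, whereas mars splits the result into many small chunks according to the input chunking):

```python
import numpy as np

a = np.ones((10, 10))
# Duplicate each row 10 times: shape (10, 10) -> (100, 10).
# In mars, the analogous mt.repeat produces one output chunk
# per (input chunk, repetition) pair, inflating the graph.
r = np.repeat(a, 10, axis=0)
assert r.shape == (100, 10)
```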
The mt.repeat() greatly increases the number of chunks in the computing graph. I think it is a waste of resources to carry 100 chunks of 10*10 into the following computation. In my kron function the number of chunks grows with the dimensions of the input data, even though I never set chunk_size on any input, and this makes scheduling on a distributed cluster heavy. Can the implicit chunks produced by functions like mt.repeat() be merged or rechunked whenever I like?
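To make the requested merge concrete, here is a plain-numpy sketch (not the mars scheduler) of what it amounts to: concatenating neighboring small blocks into larger ones, which is what a `tensor.rechunk` call after `mt.repeat` would ask the graph to do lazily:

```python
import numpy as np

# Simulate the repeat output: a 10 x 10 grid of 10x10 chunks.
chunks = [[np.full((10, 10), 10 * i + j) for j in range(10)]
          for i in range(10)]

# Merging the grid 5 x 5 at a time turns 100 small chunks into
# four 50x50 chunks -- roughly what rechunk(50) would schedule.
merged = [[np.block([row[j:j + 5] for row in chunks[i:i + 5]])
           for j in range(0, 10, 5)]
          for i in range(0, 10, 5)]

assert len(merged) == 2 and len(merged[0]) == 2
assert merged[0][0].shape == (50, 50)
```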