mars-project / mars

Mars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and Python functions.
https://mars-project.readthedocs.io
Apache License 2.0

Can the implicit chunks after some built-in function (e.g. mt.repeat()) be merged for fast distributed computing? #461

Open anders0821 opened 5 years ago

anders0821 commented 5 years ago

The chunk_size of the original data can be set, and built-in functions automatically determine their output chunk sizes from their inputs' chunking. This leads to a problem. I implemented the kron function, which is absent from Mars:

def my_mt_kron(A, B):
    # pad dimensions until A.ndim == B.ndim
    while A.ndim < B.ndim:
        A = mt.expand_dims(A, 0)
    while B.ndim < A.ndim:
        B = mt.expand_dims(B, 0)

    # repeat A
    A_rep = A
    for i in range(B.ndim):
        A_rep = mt.repeat(A_rep, B.shape[i], i)

    # tile B
    B_tile = mt.tile(B, A.shape)

    return A_rep * B_tile

It is called by:

A = np.random.randn(10, 10)
B = np.random.randn(10, 10)
KAB = my_mt_kron(mt.array(A), mt.array(B))
KAB.visualize(tiled=True).view()

The computing graph is: [graph image 1]
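For reference, the repeat/tile construction above matches NumPy's np.kron on plain arrays; a quick NumPy check, independent of Mars (the helper name is mine):

```python
import numpy as np

def np_kron_via_repeat_tile(A, B):
    # Same construction as my_mt_kron, but with plain NumPy:
    # repeat A along every axis by B's shape, tile B by A's shape.
    while A.ndim < B.ndim:
        A = np.expand_dims(A, 0)
    while B.ndim < A.ndim:
        B = np.expand_dims(B, 0)
    A_rep = A
    for i in range(B.ndim):
        A_rep = np.repeat(A_rep, B.shape[i], axis=i)
    B_tile = np.tile(B, A.shape)
    return A_rep * B_tile

A = np.random.randn(10, 10)
B = np.random.randn(10, 10)
# Element-wise identical to NumPy's built-in Kronecker product.
assert np.allclose(np_kron_via_repeat_tile(A, B), np.kron(A, B))
```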

Furthermore, I tried a simpler test of the inner function mt.repeat() on its own:

A = mt.array(np.random.randn(10, 10))
A_rep = mt.repeat(mt.repeat(A, 10, 0), 10, 1)
A_rep.visualize(tiled=True).view()

The computing graph is: [graph image 2]
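The chunk explosion can be reproduced arithmetically. Assuming the output keeps the input's 10x10 per-chunk size (as the tiled graph suggests), the repeated 100x100 tensor needs a 10x10 grid of chunks:

```python
import math

def chunk_grid(shape, chunk_size):
    # Number of chunks along each axis when splitting `shape`
    # into chunks of at most `chunk_size` elements per axis.
    return tuple(math.ceil(s / c) for s, c in zip(shape, chunk_size))

# The original 10x10 tensor fits in a single 10x10 chunk.
print(chunk_grid((10, 10), (10, 10)))    # (1, 1) -> 1 chunk

# After repeat(repeat(A, 10, 0), 10, 1) the shape is 100x100,
# but each chunk stays 10x10, giving a 10x10 grid = 100 chunks.
print(chunk_grid((100, 100), (10, 10)))  # (10, 10) -> 100 chunks
```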

mt.repeat() greatly increases the number of chunks in the computing graph. I think carrying 100 chunks of 10x10 through the following computation is a waste of resources. In my kron function the number of chunks grows with the dimensions of the input data, even though I never set chunk_size on any tensor. This makes scheduling on a distributed cluster heavy. Can the implicit chunks produced by functions like mt.repeat() be merged, or rechunked whenever I like?

qinxuye commented 5 years ago

Yeah, there is indeed a function to adjust the chunk_size: it's called rechunk. I notice that this function is also missing from our docs, sorry about that; I will add it later.

For now, please refer to the function definition:

https://github.com/mars-project/mars/blob/0230753dbd4495f5c67a13682b4b7c55431a3b6d/mars/tensor/expressions/rechunk/rechunk.py#L76

Just call tensor.rechunk(new_chunk_size). Feel free to report back once you have new results.
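Applied to the repeat example above, the call would look roughly like this (a sketch against the Mars tensor API discussed in this thread; `(100, 100)` is an illustrative chunk_size that collapses the 100 implicit 10x10 chunks back into a single chunk):

```python
import numpy as np
import mars.tensor as mt

A = mt.array(np.random.randn(10, 10))
A_rep = mt.repeat(mt.repeat(A, 10, 0), 10, 1)

# Merge the 100 implicit 10x10 chunks into one 100x100 chunk
# before the downstream computation.
A_merged = A_rep.rechunk((100, 100))
A_merged.visualize(tiled=True).view()
```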