The embedding algorithms (e.g. GraphWave and GraphSAGE) would benefit from a GraphBLAS dialect op as described below:
It takes a block with one argument: the row or column index (depending on whether the output is a CSR or CSC matrix).
The block will have a graphblas.yield statement that yields a vector.
The result of this op will be the concatenation of all of these yielded vectors.
This op will be inherently parallel. We can run each block invocation independently and store pointers to the yielded vectors in an array of pointers. We will then know the number of non-zero elements ahead of time and can do exactly one resize operation.
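A minimal sketch of that one-resize strategy in plain Python, assuming a hypothetical per-index block body `gen_row` that yields a sparse row as parallel `(indices, values)` lists (the real op would lower to MLIR, not Python):

```python
from concurrent.futures import ThreadPoolExecutor

def gen_row(i):
    # Hypothetical block body: yields the sparse row for index i.
    return ([i % 3], [float(i)])

def build_csr(num_rows):
    # Run each block invocation (potentially in parallel), keeping only
    # pointers to the yielded vectors.
    with ThreadPoolExecutor() as pool:
        rows = list(pool.map(gen_row, range(num_rows)))
    # Total NNZ is now known, so allocate the output buffers exactly once.
    nnz = sum(len(idx) for idx, _ in rows)
    indptr, indices, data = [0], [0] * nnz, [0.0] * nnz
    pos = 0
    for idx, vals in rows:
        indices[pos:pos + len(idx)] = idx
        data[pos:pos + len(vals)] = vals
        pos += len(idx)
        indptr.append(pos)
    return indptr, indices, data
```

The key point is the two phases: generate all vectors first, then do a single exactly-sized allocation and copy.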
This op may be able to let us fuse loops like this:

```python
for i in range(num_nodes):
    v = gen_vec()
    vecs.append(v)
for i in range(num_nodes):
    v = vecs[i]
    yield v * M
```

into

```python
for i in range(num_nodes):
    v = gen_vec()
    yield v * M
```
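A runnable sketch showing the fusion is behavior-preserving, using toy stand-ins for `gen_vec` and the multiply by `M` (both are assumptions for illustration):

```python
num_nodes = 4
M = 3  # toy scalar stand-in for the matrix multiply

def gen_vec(i):
    return [i, i + 1]  # toy per-node vector

def unfused():
    vecs = []
    for i in range(num_nodes):
        vecs.append(gen_vec(i))
    for i in range(num_nodes):
        v = vecs[i]
        yield [x * M for x in v]

def fused():
    for i in range(num_nodes):
        v = gen_vec(i)
        yield [x * M for x in v]
```

The fused form produces identical results while skipping the intermediate `vecs` buffer entirely.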
One consideration is that, when rows are streamed rather than materialized first, we don't know the number of non-zero elements ahead of time. This might force us to resize the output matrix num_rows times. We could mitigate this by doing what array lists do when they hit their capacity (i.e. double the current NNZ capacity each time we hit the limit, then do one final resize at the end once the true NNZ is known). This would only matter in the case where we'd generate each row sequentially.
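A sketch of that array-list-style doubling for the sequential case, with hypothetical helper and buffer names:

```python
def append_row_with_doubling(indices, data, nnz, row_idx, row_vals):
    """Append one sparse row, doubling capacity when full (array-list style)."""
    needed = nnz + len(row_idx)
    cap = len(indices)
    if needed > cap:
        new_cap = max(needed, 2 * cap if cap else 4)
        indices = indices + [0] * (new_cap - cap)
        data = data + [0.0] * (new_cap - cap)
    indices[nnz:needed] = row_idx
    data[nnz:needed] = row_vals
    return indices, data, needed

# Sequential build: amortized O(1) appends instead of num_rows resizes.
indices, data, nnz = [], [], 0
for i in range(5):
    indices, data, nnz = append_row_with_doubling(
        indices, data, nnz, [i], [float(i)])
indices, data = indices[:nnz], data[:nnz]  # one final trim to the true NNZ
```

This keeps the total number of reallocations logarithmic in NNZ, at the cost of up to 2x transient over-allocation.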
It's unclear if this idea is better or worse than just having a for-loop + an op that adds a new row to the current matrix. https://github.com/metagraph-dev/mlir-graphblas/issues/250 may or may not come into play here.
This is really just a special-case optimization of `reduce(map(f, vecs), stack_tensors)`.
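A toy Python rendering of that framing (note Python's `functools.reduce` takes the function first; `f` and `stack_tensors` are illustrative stand-ins):

```python
from functools import reduce

def f(i):
    # Stand-in for the block body: produce the row vector for index i.
    return [i, i + 1]

def stack_tensors(acc, row):
    # Stand-in for concatenation: append the new row to the growing matrix.
    return acc + [row]

rows = map(f, range(3))
matrix = reduce(stack_tensors, rows, [])
```

The proposed op is the same map-then-reduce, but with the reduction specialized to row concatenation so the lowering can exploit the single-allocation trick above.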