metagraph-dev / mlir-graphblas

MLIR tools and dialect for GraphBLAS
https://mlir-graphblas.readthedocs.io/en/latest/
Apache License 2.0
15 stars 6 forks source link

Op to generate each row of a matrix independently #251

Open paul-tqh-nguyen opened 2 years ago

paul-tqh-nguyen commented 2 years ago

The embedding algorithms (e.g. GraphWave and GraphSAGE) would benefit from a GraphBLAS dialect op as described below:

This op will be inherently parallel. We can run each op and store vectors' pointers into an array of pointers. We will then know the number of non-zero elements a head of time and can do exactly one resize operation.

This op may be able to let us fuse loops like this:

for i in range(num_nodes):
    v = gen_vec()
    vecs.append(v)

for i in range(num_nodes):
    v = vecs[i]
    yield v * M

into

for i in range(num_nodes):
    v = gen_vec()
    yield v * M

One consideration is that we don't know the number of non-zero elements ahead of time. This might cause us to have to resize the output matrix num_rows times. We could mitigate this issue by doing what array lists do when they hit their max size (i.e. we'd only resize/double the current NNZ every time we hit the current limit and do one final resize at the end when we know true NNZ). This would only matter in the case where we'd generate each row sequentially.

It's unclear if this idea is better or worse than just having a for-loop + an op that adds a new row to the current matrix. https://github.com/metagraph-dev/mlir-graphblas/issues/250 may or may not come into play here.

This is really just a special-case optimization of reduce(map(f, vecs), stack_tensors).