rapidsai / wholegraph

WholeGraph - large scale Graph Neural Networks
https://docs.rapids.ai/api/cugraph/stable/wholegraph/
Apache License 2.0

Add gather/scatter support 1D tensor #229

Closed chang-l closed 1 day ago

chang-l commented 1 month ago

This PR adds gather/scatter support for 1D tensors at the Python level, since WholeGraph should support basic indexing operations on both 1D (array) and 2D (matrix) wholememory tensors. Without this PR, the gather/scatter ops do not work on a 1D wholememory tensor, e.g., https://github.com/rapidsai/wholegraph/blob/0efba33835d6e4e104b5d7101a91e0ea55a6ca53/python/pylibwholegraph/pylibwholegraph/torch/tensor.py#L89
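For illustration, the intended semantics can be sketched in plain Python, with lists standing in for wholememory tensors. The names `as_2d`, `gather`, and `scatter` below are hypothetical and are not pylibwholegraph calls. Treating a 1D array as an (N, 1) matrix is only one possible way to reuse a 2D code path, and is an assumption here rather than a description of what this PR does.

```python
# Illustrative only: plain lists stand in for wholememory tensors.
# `as_2d`, `gather`, and `scatter` are hypothetical names, not pylibwholegraph APIs.

def as_2d(tensor):
    """View a 1D list as an (N, 1) list of rows; return 2D input unchanged."""
    if tensor and not isinstance(tensor[0], list):
        return [[x] for x in tensor], True
    return tensor, False


def gather(tensor, indices):
    """Return the entries (1D) or rows (2D) of `tensor` selected by `indices`."""
    rows, was_1d = as_2d(tensor)
    out = [list(rows[i]) for i in indices]
    # Drop the helper column again if the input was 1D.
    return [r[0] for r in out] if was_1d else out


def scatter(values, indices, tensor):
    """Write values[k] into tensor[indices[k]] in place (1D or 2D)."""
    for k, i in enumerate(indices):
        v = values[k]
        tensor[i] = list(v) if isinstance(v, list) else v


array_1d = [10.0, 11.0, 12.0, 13.0]
matrix_2d = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1]]

gathered_1d = gather(array_1d, [3, 0])    # [13.0, 10.0]
gathered_2d = gather(matrix_2d, [2, 1])   # [[2.0, 2.1], [1.0, 1.1]]

scatter([-1.0, -2.0], [1, 2], array_1d)   # array_1d is now [10.0, -1.0, -2.0, 13.0]
```

The point of the sketch is only that gather and scatter have a well-defined meaning for both ranks, so a 1D wholememory tensor should behave like a 2D one with a single column.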

To test, run

pytest --cache-clear  --import-mode=append  tests/wholegraph_torch/ops/test_wholegraph_gather_scatter.py -s

Remaining issue:

In my local single-GPU setup, the test passes.
In a multi-GPU setup, the gather op works fine, but 1D scatter does not seem to work: the test fails at https://github.com/rapidsai/wholegraph/blob/2e963b98aa6027c300d60e839010d3dd8ca422eb/python/pylibwholegraph/pylibwholegraph/tests/wholegraph_torch/ops/test_wholegraph_gather_scatter.py#L108 with incorrect scatter outputs: Indices where allclose fails: tensor([0., 0., 0., ..., 0., 0., 0.]) tensor([ 1435., 1439., 1443., ..., 257703., 257707., 257711.])
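The values shown in the second tensor step by 4 (1435, 1439, 1443, ...), which may hint at a regular pattern in the failing entries. A small helper along these lines could list the mismatched positions and their spacing. This is a plain-Python sketch with made-up data; `mismatched_indices` is a hypothetical name, and `math.isclose` uses a slightly different tolerance formula than `torch.allclose`.

```python
import math


def mismatched_indices(actual, expected, rel_tol=1e-5, abs_tol=1e-8):
    """Return the positions where `actual` and `expected` differ beyond tolerance."""
    return [
        i
        for i, (a, e) in enumerate(zip(actual, expected))
        if not math.isclose(a, e, rel_tol=rel_tol, abs_tol=abs_tol)
    ]


# Made-up data for illustration: every 4th element was left at zero,
# as if the scatter had never written it.
expected = [float(i) for i in range(12)]
actual = [0.0 if i % 4 == 3 else float(i) for i in range(12)]

bad = mismatched_indices(actual, expected)
# Distinct gaps between consecutive failing positions; a single value
# suggests a regular stride rather than random corruption.
strides = sorted({b - a for a, b in zip(bad, bad[1:])})
print("failing positions:", bad)
print("distinct strides:", strides)
```

A single constant stride would point toward a layout or partitioning issue rather than random corruption, though that is only a guess until the real failing indices are inspected.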

@linhu-nv Can you please take a look? Is scatter_op supposed to work with 1D wholememory tensors?

copy-pr-bot[bot] commented 1 month ago

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

linhu-nv commented 1 month ago

Thanks for bringing this up. Yes, scatter_op is supposed to work with 1D wholememory tensors. I will try to find out why it doesn't work.

BradReesWork commented 2 days ago

/okay to test

BradReesWork commented 2 days ago

/okay to test

BradReesWork commented 1 day ago

/merge