rusty1s / pytorch_sparse

PyTorch Extension Library of Optimized Autograd Sparse Matrix Operations
MIT License

Aborted (core dumped) #336

Closed Dorbmon closed 1 year ago

Dorbmon commented 1 year ago

When I called SparseTensor.from_edge_index(edge_index, edge_attr, (N, N), is_sorted=False), it crashed:

Current thread 0x00007f3bc0af9740 (most recent call first):
  File "~/.local/lib/python3.9/site-packages/torch/_ops.py", line 502 in __call__
  File "~/.local/lib/python3.9/site-packages/pyg_lib/ops/__init__.py", line 257 in index_sort
  File "~/.local/lib/python3.9/site-packages/torch_sparse/utils.py", line 21 in index_sort
  File "~/.local/lib/python3.9/site-packages/torch_sparse/storage.py", line 156 in __init__
  File "~/.local/lib/python3.9/site-packages/torch_sparse/tensor.py", line 26 in __init__
  File "~/.local/lib/python3.9/site-packages/torch_sparse/tensor.py", line 68 in from_edge_index
  File "/project/prodigy/./experiments/sampler.py", line 23 in preprocess
  File "/project/prodigy/./experiments/sampler.py", line 175 in __init__
  File "/project/prodigy/./data/mag240m.py", line 48 in get_mag240m_dataset
  File "/project/prodigy/./data/data_loader_wrapper.py", line 24 in get_dataset_wrap
  File "/project/prodigy/experiments/run_single_experiment.py", line 44 in <module>
Aborted (core dumped)

Here are the argument sizes:

edge_index shape: torch.Size([2, 2595497852])
edge_attr shape: torch.Size([2595497852])
N: 121751666

When I reduce the size of edge_index, it works. However, memory should not be the problem: when the crash happened, there was still about 80 GB of memory available. Is there any way to solve this? Thanks!
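
For context, a self-contained sketch of the failing call with the reported sizes (the real edge_index/edge_attr come from MAG240M preprocessing; random data is used here only to reproduce the shapes):

import torch
from torch_sparse import SparseTensor

N = 121751666
E = 2595497852
edge_index = torch.randint(0, N, (2, E), dtype=torch.long)  # ~41.5 GB of int64 indices
edge_attr = torch.randn(E)                                  # ~10.4 GB of float32 values

# Crashes with "Aborted (core dumped)" on Python 3.9:
adj = SparseTensor.from_edge_index(edge_index, edge_attr, (N, N), is_sorted=False)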

rusty1s commented 1 year ago

I think torch.sort will crash here since it OOMs. Not sure how we can resolve this :(

Dorbmon commented 1 year ago

@rusty1s 🤔 There is still plenty of memory available, so why does it OOM?

rusty1s commented 1 year ago

How much memory does your machine have? Storing the edge index alone takes about 40 GB, and the additional memory needed to sort it should bring the peak to around 100-150 GB.
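
As a rough back-of-the-envelope estimate for the reported shapes (a sketch; the exact intermediates depend on the sort path taken):

num_edges = 2595497852
edge_index_bytes = 2 * num_edges * 8  # [2, E] int64 edge_index          ~41.5 GB
sort_key_bytes = num_edges * 8        # composite row * N + col key      ~20.8 GB
perm_bytes = num_edges * 8            # permutation returned by the sort ~20.8 GB
gather_bytes = 2 * num_edges * 8      # edge_index[:, perm] copy         ~41.5 GB

total = edge_index_bytes + sort_key_bytes + perm_bytes + gather_bytes
print(f"~{total / 1e9:.0f} GB before any extra sort workspace")  # ~125 GB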

Dorbmon commented 1 year ago

I have 256 GB of memory, and usage was at about 170 GB when it crashed.

rusty1s commented 1 year ago

What happens if you run:

idx = (121751666 * edge_index[0]).add_(edge_index[1])  # composite row-major key: row * N + col
perm = torch.argsort(idx)                               # permutation that sorts by (row, col)
edge_index = edge_index[:, perm]

without using torch-sparse? Does it still crash?
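
If that runs through, a possible follow-up (a sketch, not tested at this scale) would be to reuse the manually sorted result and pass is_sorted=True so that from_edge_index skips its internal sort:

N = 121751666
idx = (N * edge_index[0]).add_(edge_index[1])  # row-major key: row * N + col
perm = torch.argsort(idx)
del idx                                         # free the ~20 GB key before gathering
edge_index = edge_index[:, perm]
edge_attr = edge_attr[perm]
del perm

adj = SparseTensor.from_edge_index(edge_index, edge_attr, (N, N), is_sorted=True)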

Dorbmon commented 1 year ago

It works fine.

Dorbmon commented 1 year ago

It doesn't crash on Python 3.10. I was previously running it on Python 3.9. Could it be a Python memory allocation bug?

rusty1s commented 1 year ago

That's strange, but nice find. I am not yet sure how to fix this, so if updating to Python 3.10 works for you, we can close this issue.

Dorbmon commented 1 year ago

Thanks.