mit-han-lab / torchsparse

[MICRO'23, MLSys'22] TorchSparse: Efficient Training and Inference Framework for Sparse Convolution on GPUs.
https://torchsparse.mit.edu
MIT License

[BUG] Running the example fails #218

Closed etienne87 closed 11 months ago

etienne87 commented 1 year ago

Is there an existing issue for this?

Current Behavior

running torchsparse/examples/example.py

gives:

File "torchsparse/nn/functional/conv/kmap/func/hashmap_on_the_fly.pyx", line 33, in torchsparse.nn.functional.conv.kmap.func.hashmap_on_the_fly.build_kmap_implicit_GEMM_hashmap_on_the_fly
IndexError: max(): Expected reduction dim 0 to have non-zero size.

Expected Behavior

I expect the example to run without crashing.

Environment

- GCC: gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-18)
- NVCC: command not found
- Cuda Driver: 12.1
- PyTorch: 2.0.1+cu117
- PyTorch CUDA: 11.7
- TorchSparse: 2.1.0+torch20cu117
- OS: 
  - REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 8"
  - REDHAT_BUGZILLA_PRODUCT_VERSION=8.8

Anything else?

No response

ys-2020 commented 1 year ago

Hi @etienne87, thanks for your interest in TorchSparse. Since we haven't seen this error before, would you mind telling us how you ran example.py and which version of example.py you used?

etienne87 commented 1 year ago

I copy-pasted the example into examples/example.py:

https://github.com/mit-han-lab/torchsparse/blob/master/examples/example.py

zxhou commented 1 year ago

The same error occurred when I ran the backbones.py file in the examples folder.

running: python examples/backbones.py

error:

SparseResNet21D: Traceback (most recent call last):
  File "examples/backbones.py", line 41, in <module>
    main()
  File "/data01/user/miniconda3/envs/torchsparse/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "examples/backbones.py", line 33, in main
    outputs = model(input)
  File "/data01/user/miniconda3/envs/torchsparse/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "torchsparse/backbones/resnet.pyx", line 53, in torchsparse.backbones.resnet.SparseResNet.forward
  File "/data01/user/miniconda3/envs/torchsparse/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data01/user/miniconda3/envs/torchsparse/lib/python3.8/site-packages/torch/nn/modules/container.py", line 204, in forward
    input = module(input)
  File "/data01/user/miniconda3/envs/torchsparse/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "torchsparse/backbones/modules/blocks.pyx", line 87, in torchsparse.backbones.modules.blocks.SparseResBlock.forward
  File "/data01/user/miniconda3/envs/torchsparse/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data01/user/miniconda3/envs/torchsparse/lib/python3.8/site-packages/torch/nn/modules/container.py", line 204, in forward
    input = module(input)
  File "/data01/user/miniconda3/envs/torchsparse/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "torchsparse/nn/modules/conv.pyx", line 99, in torchsparse.nn.modules.conv.Conv3d.forward
  File "torchsparse/nn/functional/conv/conv.pyx", line 89, in torchsparse.nn.functional.conv.conv.conv3d
  File "torchsparse/nn/functional/conv/kmap/build_kmap.pyx", line 83, in torchsparse.nn.functional.conv.kmap.build_kmap.build_kernel_map
  File "torchsparse/nn/functional/conv/kmap/func/hashmap_on_the_fly.pyx", line 33, in torchsparse.nn.functional.conv.kmap.func.hashmap_on_the_fly.build_kmap_implicit_GEMM_hashmap_on_the_fly
IndexError: max(): Expected reduction dim 0 to have non-zero size.

version information:
- python -c "import torch; print(torch.__version__)" gives 1.13.0
- python -c "import torch; print(torch.version.cuda)" gives 11.6

ys-2020 commented 1 year ago

Hi @zxhou. Thanks for bringing this problem to our attention! examples/backbones.py was designed for older versions of TorchSparse. Since we changed the coordinate order in v2.1.0 (see our documentation here), some modifications are needed to run backbones.py. Please try the code in this PR.
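The coordinate-order change can be illustrated with a small standalone snippet. This is plain Python with no TorchSparse dependency, and the exact tuple layouts (old: x, y, z, batch; new: batch, x, y, z) are assumptions based on the v2.1.0 migration notes, so double-check them against the documentation:

```python
def xyzb_to_bxyz(coords):
    """Convert point coordinates from the assumed pre-v2.1.0 layout
    (x, y, z, batch) to the assumed v2.1.0 layout (batch, x, y, z)."""
    return [(b, x, y, z) for (x, y, z, b) in coords]

# Two points: one in batch 0, one in batch 1.
old_coords = [(10, 20, 30, 0), (11, 21, 31, 1)]
print(xyzb_to_bxyz(old_coords))  # [(0, 10, 20, 30), (1, 11, 21, 31)]
```

Feeding coordinates in the old order into a v2.1.0 model would make the batch index be interpreted as a spatial coordinate, which can produce exactly the kind of degenerate kernel maps seen in this issue.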

ys-2020 commented 1 year ago

> i copy pasted the example inside examples/example.py:
>
> https://github.com/mit-han-lab/torchsparse/blob/master/examples/example.py

Hi @etienne87. Sorry for the late response. It seems that you are using the correct example.py.

The problem IndexError: max(): Expected reduction dim 0 to have non-zero size. is usually caused by an empty input SparseTensor (the tensor contains no points). But it shouldn't happen when running example.py (at least it hasn't in my attempts).
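The failure mode is easy to reproduce in isolation: a max() reduction over a dimension of size zero has nothing to reduce. A minimal plain-Python sketch (the helper name is hypothetical; it only mirrors the per-column reduction that build_kmap performs over dim 0 of the coordinate tensor):

```python
def coord_max(coords):
    # Mimics a per-column max over dim 0 of an N x 3 coordinate tensor.
    # Like torch.Tensor.max(dim=0), it fails when N == 0.
    if len(coords) == 0:
        raise IndexError(
            "max(): Expected reduction dim 0 to have non-zero size.")
    return tuple(max(col) for col in zip(*coords))

print(coord_max([(1, 5, 2), (4, 0, 3)]))  # (4, 5, 3)

try:
    coord_max([])  # an empty SparseTensor ends up here
except IndexError as err:
    print(err)
```

So any path that hands the convolution an input with zero points (bad data loading, over-aggressive augmentation, a stale install mixing incompatible versions) surfaces as this IndexError deep inside the kernel-map build.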

Could you provide a more detailed error log? Or could you check whether an old version of TorchSparse is installed in your environment?

chexenia commented 7 months ago

I am experiencing the same behavior while training my custom model, after several epochs.

> File "torchsparse/nn/modules/conv.pyx", line 99, in torchsparse.nn.modules.conv.Conv3d.forward
>   File "torchsparse/nn/functional/conv/conv.pyx", line 89, in torchsparse.nn.functional.conv.conv.conv3d
>   File "torchsparse/nn/functional/conv/kmap/build_kmap.pyx", line 83, in torchsparse.nn.functional.conv.kmap.build_kmap.build_kernel_map
>   File "torchsparse/nn/functional/conv/kmap/func/hashmap_on_the_fly.pyx", line 33, in torchsparse.nn.functional.conv.kmap.func.hashmap_on_the_fly.build_kmap_implicit_GEMM_hashmap_on_the_fly
> IndexError: max(): Expected reduction dim 0 to have non-zero size.
> 
python -c "import torchsparse; print(torchsparse.__version__)"
2.1.0+torch113cu117
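In the meantime, a defensive check before the forward pass makes the empty-input case explicit instead of crashing deep inside build_kmap. This is a sketch of a hypothetical guard, not a TorchSparse API; `coords` stands in for `sparse_tensor.coords` (an N x 4 coordinate array):

```python
def has_points(coords):
    # True if the sparse input still contains at least one point.
    # Heavy augmentation (cropping, voxel dropout) can empty a sample
    # mid-training, which then trips the max() IndexError in build_kmap.
    return len(coords) > 0

batch_coords = []  # e.g. every point was cropped away by augmentation
if not has_points(batch_coords):
    print("skipping empty batch")
```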

Any updates on this issue?