CUDA error: unknown error

chJunger commented 2 years ago

I used AANet on my local PC. It works fine - thanks for the great work!

Now I would like to train my own model. Because my GPU is not that powerful, I would like to run the training on our GPU cluster, where I run into the following problem.

When executing the aanet+_predict.sh script, I get the error message shown below. Do you know how I can fix the CUDA error? Many thanks in advance.

Error message:

Traceback (most recent call last):
  File "predict.py", line 203, in <module>
    main()
  File "predict.py", line 106, in main
    deformable_groups=args.deformable_groups).to(device)
  File "/usr/scratch4/user-name/anaconda3/envs/aanet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 432, in to
    return self._apply(convert)
  File "/usr/scratch4/user-name/anaconda3/envs/aanet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 208, in _apply
    module._apply(fn)
  File "/usr/scratch4/user-name/anaconda3/envs/aanet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 208, in _apply
    module._apply(fn)
  File "/usr/scratch4/user-name/anaconda3/envs/aanet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 208, in _apply
    module._apply(fn)
  [Previous line repeated 1 more time]
  File "/usr/scratch4/user-name/anaconda3/envs/aanet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 230, in _apply
    param_applied = fn(param)
  File "/usr/scratch4/user-name/anaconda3/envs/aanet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 430, in convert
    return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
RuntimeError: CUDA error: unknown error

Installed dependencies:

gcc v5.5.0 (also tested with v9.3.0)
CUDA v11.2 (also tested with v10.1)
NVIDIA-SMI 460.32.03
Driver Version: 460.32.03
Cuda compilation tools, release 10.1, V10.1.243
python v3.7
building deformable convolution works fine
using aanet conda environment
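
The traceback fails at the first `.to(device)` call, i.e. when the model parameters are first copied to the GPU, so a check that does not involve AANet at all may help narrow things down. The following is only a minimal sketch, assuming it is run inside the same conda environment and on the same cluster node as the failing script; `cuda_report` is a hypothetical helper name and not part of AANet or PyTorch. It prints the CUDA version PyTorch was built against, which can then be compared with the driver and toolkit versions listed above, and attempts the smallest possible copy to the GPU.

```python
def cuda_report():
    """Collect basic facts about the PyTorch/CUDA setup (hypothetical helper)."""
    try:
        import torch
    except ImportError:
        # PyTorch is not importable in this environment at all.
        return {"torch_installed": False}

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version PyTorch was built against; None for CPU-only builds.
        "built_for_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    report["device_count"] = (
        torch.cuda.device_count() if report["cuda_available"] else 0
    )

    if not report["cuda_available"]:
        report["tiny_copy"] = "skipped (no CUDA device visible)"
        return report

    # Same kind of operation as the failing line in predict.py,
    # but on a single-element tensor instead of the whole model.
    try:
        torch.zeros(1).to("cuda")
        report["tiny_copy"] = "ok"
    except RuntimeError as err:
        report["tiny_copy"] = f"failed: {err}"
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If the tiny copy already fails, the problem most likely lies in the driver, the node setup, or the PyTorch build rather than in the AANet code or its deformable convolution extension.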

haofeixu commented 1 year ago

Hi @ipfJC, sorry for the late response.

If this issue is still relevant to you, I would suggest trying our new GMStereo model: https://haofeixu.github.io/unimatch/ & https://github.com/autonomousvision/unimatch. No CUDA op is required. A Colab demo is also provided so you can try our model in your browser. Hope it helps, thanks.

chJunger commented 1 year ago

Hi @haofeixu, thanks for your answer and the hints. I will have a look at the new GMStereo model.

haofeixu commented 1 year ago

@ipfJC A HuggingFace demo is also available to try our model: https://huggingface.co/spaces/haofeixu/unimatch

chJunger commented 1 year ago

A HuggingFace demo is also available to try our model

Wow, I just tested it - great work!

haofeixu / aanet

CUDA error: unknown error #77