PWC on RTX 3090: pytorch CUDNN_STATUS_MAPPING_ERROR and CUDA error: invalid texture reference

v-iashin / video_features

Extract video features from raw videos using multiple GPUs. We support RAFT flow frames as well as S3D, I3D, R(2+1)D, VGGish, CLIP, and TIMM models.

https://v-iashin.github.io/video_features

MIT License

508 stars, 96 forks

PWC on RTX 3090: pytorch CUDNN_STATUS_MAPPING_ERROR and CUDA error: invalid texture reference #13

Closed leftthomas closed 8 months ago

leftthomas commented 2 years ago

I installed your library and am using the PWC conda env, but it throws pytorch CUDNN_STATUS_MAPPING_ERROR and CUDA error: invalid texture reference errors. I am running the code on an RTX 3090 GPU.

v-iashin commented 2 years ago

Ok, I could not replicate the exact same problem, but I couldn't use any model with this GPU either.

In particular, when I run the MWE for r(2+1)d it complains about CUDA-GPU mismatch:

NVIDIA GeForce RTX 3090 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37
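A quick way to see this mismatch before running any model is to compare the GPU's compute capability against the architectures the PyTorch build was compiled for. The snippet below is only a minimal sketch: `parse_sm_archs` and `is_gpu_supported` are hypothetical helper names, and the rule they apply (an `sm_XY` binary covers a GPU with the same major version and an equal or higher minor version, with `compute_XY` PTX entries ignored) is a simplifying assumption rather than PyTorch's exact check. It also assumes a PyTorch version that provides `torch.cuda.get_arch_list()`.

```python
import re


def parse_sm_archs(arch_list):
    """Return (major, minor) tuples for 'sm_XY' entries; 'compute_XY' (PTX) entries are skipped."""
    archs = []
    for arch in arch_list:
        match = re.fullmatch(r"sm_(\d+)(\d)[a-z]?", arch)
        if match:
            archs.append((int(match.group(1)), int(match.group(2))))
    return archs


def is_gpu_supported(capability, arch_list):
    """Heuristic (assumption): supported if some sm_XY has the same major and minor <= the GPU's."""
    major, minor = capability
    return any(
        sm_major == major and sm_minor <= minor
        for sm_major, sm_minor in parse_sm_archs(arch_list)
    )


if __name__ == "__main__":
    # Architecture list copied from the warning message above.
    reported = "sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37".split()
    print("RTX 3090 (sm_86) supported:", is_gpu_supported((8, 6), reported))

    # Optional live check; only runs if PyTorch and a CUDA device are present.
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        capability = torch.cuda.get_device_capability(0)
        print(capability, is_gpu_supported(capability, torch.cuda.get_arch_list()))
```

With the architecture list from the message above, the check reports the RTX 3090 (capability 8.6) as unsupported, while a capability 7.5 card would pass.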

When I use pwc with the default environment, it just hangs.

I am sorry about this. I am afraid I don't have much time to fix it at the moment.

The problem is with the pytorch-cuda versions for sure, but upgrading the environment accordingly leads to version conflicts. I would appreciate it if you could look into this if you have an appetite.

v-iashin commented 2 years ago

A bit more info:

The torch_zoo env can be upgraded to PyTorch 1.7.1, which will support everything except for PWC. I also noticed that the ResNet output features differ slightly, but I consider this insignificant because with show_pred=true the output classes are still reasonable. I will create a fix momentarily. With versions higher than 1.7.1, it complains about a glibc version mismatch.
```
conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=11.0 -c pytorch
```
However, when I use the same command for pwc, it complains about the same version mismatch and does not allow installing even 1.7.1. Here I need some more time to inspect it.

@leftthomas Do you specifically need PWC? Would you like to replicate BMT? If not, I have RAFT implemented, which is more accurate in calculating optical flow and can be used from the torch_zoo environment.

leftthomas commented 2 years ago

I have other GPUs, like an RTX 2060 Super or a TITAN X, so I can run it on those GPUs for now.

Wooho-Moon commented 2 years ago

I solved the issue. My environment is RTX 3090, CUDA 11.2, cuDNN 8.2. First, I compiled the original Caffe using CMake. If you want to compile the original Caffe, you need to change the code in convlayer.cpp and deconvlayer.cpp. Then clone the original FlowNet 2 repository and add layers (if you need them).

v-iashin commented 2 years ago

Hi @Wooho-Moon, thanks for sharing.

Lonicer commented 1 year ago

I also encountered the same error. Can you point me to some tutorials on your solution? I don't understand the CMake compilation you mentioned. Thank you!

v-iashin commented 8 months ago

I am dropping support for it. Thank you for your interest. Feel free to use the last commit in which it was live: bd827df in #112