RuntimeError: cublas runtime error : the GPU program failed to execute at C:/ProgramData/Miniconda3/conda-bld/pytorch_1533090623466/work/aten/src/THC/THCBlas.cu:411

giuseppecialdella commented 1 year ago

Hi, I collected some data for training with AirSim and I got an error.

I created a new dataset loader (inheriting from MonoDataset) in test_dataset.py:

    import os

    import numpy as np
    import PIL.Image as pil

    from .mono_dataset import MonoDataset


    class CustomDataset(MonoDataset):
        def __init__(self, *args, **kwargs):
            super(CustomDataset, self).__init__(*args, **kwargs)

            # got from collision avoidance project
            self.K = np.array([[self.width / 2, 0, self.width / 2],
                               [0, self.width / 2, self.height / 2],
                               [0, 0, 1]], dtype=np.float32)
            self.full_res_shape = (640, 192)

        def check_depth(self):
            # no ground-truth depth is available for this dataset
            return False

        def get_image_path(self, folder, frame_index, side):
            print('frame_index: ', frame_index)
            # frames are stored as <folder>/frame<index><ext>
            f_str = "{}{}{}".format("frame", frame_index, self.img_ext)
            image_path = os.path.join(self.data_path, folder, f_str)

            return image_path

        def get_color(self, folder, frame_index, side, do_flip):
            # normalize Windows path separators before loading
            color = self.loader(self.get_image_path(folder, frame_index, side).replace("\\", "/"))
            if do_flip:
                color = color.transpose(pil.FLIP_LEFT_RIGHT)
            return color

I want to train in the monocular setting, so my frames are listed in this format (folder, frame index, side): /Users/xx/OneDrive/Documenti/AirSim/image_sequences/sequence10 5 l
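Since the loader also requests the neighbouring frames of every listed frame (the log further down shows indices such as 152, 151, 153), the first and last frame of each sequence cannot be used as a centre frame. The following is only a sketch of how split lines could be generated so that every entry has both neighbours on disk. It assumes the `frame<index>.png` naming from `get_image_path` above and writes folder names relative to the data path; `build_split_lines` is a hypothetical helper, not part of monodepth2.

```python
import os
import re


def build_split_lines(data_path, img_ext=".png", side="l"):
    """Return 'folder index side' lines for frames whose neighbours exist.

    Hypothetical helper: assumes each sub-folder of data_path is one
    sequence containing files named frame<index><img_ext>.
    """
    pattern = re.compile(r"^frame(\d+)" + re.escape(img_ext) + "$")
    lines = []
    for folder in sorted(os.listdir(data_path)):
        seq_dir = os.path.join(data_path, folder)
        if not os.path.isdir(seq_dir):
            continue
        # collect every frame index present in this sequence
        indices = set()
        for name in os.listdir(seq_dir):
            match = pattern.match(name)
            if match:
                indices.add(int(match.group(1)))
        # keep only frames that have both index - 1 and index + 1
        for idx in sorted(indices):
            if idx - 1 in indices and idx + 1 in indices:
                lines.append("{} {} {}".format(folder, idx, side))
    return lines
```

The returned lines can then be written to the train and val files of the split, one entry per line.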

Please help me resolve this issue. Many thanks in advance. My environment is: python 3.6; pytorch 0.4.1 cuda90; cudatoolkit 10.0.130; cudnn 7.6.0. I have an RTX 2070 SUPER in my PC.

The error message below appears when I launch the following command (note that the train files and val files have the same number of elements only to test whether monodepth is working):

    python train.py --model_name mono_640x192 --png --num_workers 0 --batch_size 2 --height 192 --width 640

First run:

    Training model named: mono_640x192
    Models and tensorboard events files are saved to: C:\Users\xx\tmp
    Training is using: cuda
    Using split: custom_split
    There are 27650 training items and 27650 validation items

    Training
    frame_index: 152
    frame_index: 151
    frame_index: 153
    Traceback (most recent call last):
      File "train.py", line 21, in <module>
        trainer.train()
      File "C:\Users\ciald\OneDrive\Desktop\Tirocinio - CNN per CA\monodepth2\trainer.py", line 191, in train
        self.run_epoch()
      File "C:\Users\ciald\OneDrive\Desktop\Tirocinio - CNN per CA\monodepth2\trainer.py", line 203, in run_epoch
      File "C:\Users\ciald\anaconda3\envs\monodepth_env\lib\site-packages\torch\utils\data\dataloader.py", line 314, in __next__
        batch = self.collate_fn([self.dataset[i] for i in indices])
      File "C:\Users\ciald\anaconda3\envs\monodepth_env\lib\site-packages\torch\utils\data\dataloader.py", line 314, in <listcomp>
        batch = self.collate_fn([self.dataset[i] for i in indices])
      File "C:\Users\ciald\OneDrive\Desktop\Tirocinio - CNN per CA\monodepth2\datasets\mono_dataset.py", line 162, in __getitem__
        inputs[("color", i, -1)] = self.get_color(folder, frame_index + i, side, do_flip)
      File "C:\Users\ciald\OneDrive\Desktop\Tirocinio - CNN per CA\monodepth2\datasets\test_dataset.py", line 28, in get_color
        color = self.loader(self.get_image_path(folder, frame_index, side).replace("\\", "/"))
      File "C:\Users\ciald\OneDrive\Desktop\Tirocinio - CNN per CA\monodepth2\datasets\mono_dataset.py", line 24, in pil_loader
        with open(path, 'rb') as f:
    FileNotFoundError: [Errno 2] No such file or directory: 'C:/Users/ciald/OneDrive/Documenti/AirSim/image_sequences/sequence92/frame153.png'

Second run:

    PS C:\Users\ciald\OneDrive\Desktop\Tirocinio - CNN per CA\monodepth2> python train.py --model_name mono_640x192 --png --num_workers 0 --batch_size 2 --height 192 --width 640
    Training model named: mono_640x192
    Models and tensorboard events files are saved to: C:\Users\ciald\tmp
    Training is using: cuda
    Using split: custom_split
    There are 27650 training items and 27650 validation items

    Training
    frame_index: 220
    frame_index: 219
    frame_index: 221
    frame_index: 36
    frame_index: 35
    frame_index: 37
    Traceback (most recent call last):
      File "train.py", line 21, in <module>
        trainer.train()
      File "C:\Users\ciald\OneDrive\Desktop\Tirocinio - CNN per CA\monodepth2\trainer.py", line 191, in train
        self.run_epoch()
      File "C:\Users\ciald\OneDrive\Desktop\Tirocinio - CNN per CA\monodepth2\trainer.py", line 207, in run_epoch
        outputs, losses = self.process_batch(inputs)
      File "C:\Users\ciald\OneDrive\Desktop\Tirocinio - CNN per CA\monodepth2\trainer.py", line 257, in process_batch
        outputs.update(self.predict_poses(inputs, features))
      File "C:\Users\ciald\OneDrive\Desktop\Tirocinio - CNN per CA\monodepth2\trainer.py", line 297, in predict_poses
        axisangle[:, 0], translation[:, 0], invert=(f_i < 0))
      File "C:\Users\ciald\OneDrive\Desktop\Tirocinio - CNN per CA\monodepth2\layers.py", line 41, in transformation_from_parameters
        M = torch.matmul(R, T)
    RuntimeError: cublas runtime error : the GPU program failed to execute at C:/ProgramData/Miniconda3/conda-bld/pytorch_1533090623466/work/aten/src/THC/THCBlas.cu:411

Thank you in advance, hope you can help me fix this.

anona-R commented 1 year ago

Hi, I got the same issue. Were you able to fix this issue?

anona-R commented 1 year ago

Was able to resolve the issue by just upgrading torch to 1.7.1
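For anyone who wants to confirm that an upgrade actually fixed the cuBLAS call, a small check outside of training may help. This is only a sketch: it runs a batched 4x4 `torch.matmul` on the GPU, similar in shape to the `torch.matmul(R, T)` call in the traceback above, and reports the outcome as a string. `gpu_matmul_smoke_test` is a hypothetical name, not part of monodepth2 or PyTorch.

```python
def gpu_matmul_smoke_test(batch_size=2):
    """Run a tiny batched matmul on the GPU and describe what happened."""
    try:
        import torch
    except ImportError:
        return "torch not installed"
    if not torch.cuda.is_available():
        return "CUDA not available"
    try:
        device = torch.device("cuda")
        # batch of identity matrices, shaped like the 4x4 pose transforms
        R = torch.eye(4, device=device).repeat(batch_size, 1, 1)
        T = torch.eye(4, device=device).repeat(batch_size, 1, 1)
        M = torch.matmul(R, T)
        # CUDA calls are asynchronous; synchronize so errors surface here
        torch.cuda.synchronize()
    except RuntimeError as err:
        return "failed: {}".format(err)
    expected = torch.eye(4, device=device).repeat(batch_size, 1, 1)
    return "ok" if torch.allclose(M, expected) else "wrong result"
```

Calling `print(gpu_matmul_smoke_test())` in the training environment should print `ok` if the batched matmul runs; a `failed: ...` message would suggest the problem is in the PyTorch/CUDA installation rather than in the dataset code.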

Ravenclaw-Hcmut commented 1 year ago

> Was able to resolve the issue by just upgrading torch to 1.7.1

Hi. I'm dealing with the same problem.

Does torch 1.7.1 affect your final model quality? Can you share your versions of torchvision, tensorboardX, CUDA and cuDNN?
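One way to gather those versions in a comparable form is sketched below. It is not an official tool: `collect_versions` and `collect_gpu_info` are hypothetical helpers, and they only read attributes that PyTorch exposes (`torch.version.cuda`, `torch.backends.cudnn.version()`, `torch.cuda.get_device_capability`). It may be relevant here that the original report combines a CUDA 9.0 build of PyTorch 0.4.1 with an RTX 2070 SUPER, a Turing card (compute capability 7.5) that is only supported from CUDA 10.0 onward, which could be why upgrading helped.

```python
import importlib


def collect_versions(packages=("torch", "torchvision", "tensorboardX")):
    """Map each package name to its version string, or 'not installed'."""
    report = {}
    for name in packages:
        try:
            module = importlib.import_module(name)
            report[name] = getattr(module, "__version__", "unknown")
        except Exception:
            # ImportError in the usual case; broken installs may raise others
            report[name] = "not installed"
    return report


def collect_gpu_info():
    """Return the CUDA/cuDNN versions PyTorch was built with, plus the GPU."""
    try:
        import torch
    except ImportError:
        return {"torch": "not installed"}
    info = {
        "cuda_build": torch.version.cuda,  # CUDA version torch was compiled for
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        info["cudnn"] = torch.backends.cudnn.version()
        info["device"] = torch.cuda.get_device_name(0)
        info["compute_capability"] = torch.cuda.get_device_capability(0)
    return info
```

Printing both dictionaries, for example with `print(collect_versions(), collect_gpu_info())`, gives a short report that can be pasted into a comment.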

Thank you.

daniyar-niantic commented 1 year ago

Hi, we did not test it on a Windows machine. From the error messages that you posted, I can suggest two things to check:

1. Does this file exist: C:/Users/ciald/OneDrive/Documenti/AirSim/image_sequences/sequence92/frame153.png? If not, why is it being loaded?
2. Update your cuBLAS library and make sure it is compatible with your driver, device, CUDA and cuDNN versions.
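The first check can be automated over a whole split file. The sketch below assumes the `folder frame_index side` line format and the `frame<index>.png` naming shown in the original post, and it assumes the loader requests offsets 0, -1 and +1 around each listed frame (as the printed indices in the log suggest). `find_missing_frames` is a hypothetical helper, not part of monodepth2.

```python
import os


def find_missing_frames(split_path, data_path,
                        frame_offsets=(0, -1, 1), img_ext=".png"):
    """Return (folder, frame_index, offset) for every frame not on disk."""
    missing = []
    with open(split_path) as f:
        for line in f:
            parts = line.split()
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            folder, frame_index = parts[0], int(parts[1])
            for offset in frame_offsets:
                name = "frame{}{}".format(frame_index + offset, img_ext)
                # if folder is absolute, os.path.join ignores data_path
                path = os.path.join(data_path, folder, name)
                if not os.path.isfile(path):
                    missing.append((folder, frame_index, offset))
    return missing
```

An entry such as `(folder, 152, 1)` would mean that frame 152 is listed in the split but frame 153 does not exist, which is consistent with the `FileNotFoundError` in the first traceback. Removing the first and last frame of each sequence from the split is one way to avoid that.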

nianticlabs / monodepth2, issue #452