yzxing87 / Invertible-ISP

[CVPR2021] Invertible Image Signal Processing
MIT License

RuntimeError: CUDA error: an illegal memory access was encountered #1

ggao33 closed this issue 3 years ago

ggao33 commented 3 years ago

Hi, I am currently facing the issue below when running train.py. Could you please give me a hand? The command and its full output on my PC are below:

/home/anaconda3/bin/python /home/Documents/Invertible-ISP-main/train_cuda.py --task=debug --data_path=./data/ --gamma --aug --camera=NIKON_D700 --out_path=./exps/ --debug_mode
Parsed arguments: Namespace(aug=True, batch_size=1, camera='NIKON_D700', data_path='./data/', debug_mode=True, gamma=True, loss='L1', lr=0.0001, out_path='./exps/', resume=False, rgb_weight=1, task='debug')
[INFO] Start data loading and preprocessing
[INFO] Start to train
Traceback (most recent call last):
  File "/home/Documents/Invertible-ISP-main/train_cuda.py", line 99, in <module>
    main(args)
  File "/home/Documents/Invertible-ISP-main/train_cuda.py", line 72, in main
    reconstruct_raw = net(reconstruct_rgb, rev=True)
  File "/home/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/Documents/Invertible-ISP-main/model/model.py", line 176, in forward
    out = op.forward(out, rev)
  File "/home/Documents/Invertible-ISP-main/model/model.py", line 124, in forward
    self.s = self.clamp * (torch.sigmoid(self.H(x1)) * 2 - 1)
RuntimeError: CUDA error: an illegal memory access was encountered

Process finished with exit code 1

If I switch to the invertible-isp environment described in your environment.yml, the code stops silently at line 22: DiffJPEG = DiffJPEG(differentiable=True, quality=90).cuda() without showing any error or printing "start to train".

yzxing87 commented 3 years ago

Hi, thanks for your interest.

Our provided environment.yml should work fine with CUDA 10.1. If you are using CUDA 11.1, please install PyTorch 1.7.1. Please also note that only recent PyTorch versions (e.g. >1.7.0) work on CUDA 11 machines; with older versions the program may get stuck.
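
To see which case applies, it may help to print the PyTorch version and the CUDA version it was built against before training. The snippet below is only a rough sketch and is not part of the repository: the helper names parse_version, check_torch_cuda and report are made up for illustration, and the ">1.7.0 on CUDA 11" threshold is simply the rule of thumb from the reply above, not an official compatibility table.

```python
import re


def parse_version(text):
    """Turn a version string such as '1.7.1+cu110' into a tuple like (1, 7, 1)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    # A missing patch component (e.g. '11.1') is treated as 0.
    return tuple(int(part) if part else 0 for part in match.groups())


def check_torch_cuda(torch_version, cuda_version):
    """Return a warning string, or None if the combination looks fine.

    Assumption: encodes only the advice from this thread, i.e. CUDA 11
    builds need a PyTorch version newer than 1.7.0.
    """
    if cuda_version is None:
        return "This PyTorch build has no CUDA support (CPU-only build)."
    if parse_version(cuda_version)[0] >= 11 and parse_version(torch_version) <= (1, 7, 0):
        return (
            f"PyTorch {torch_version} with CUDA {cuda_version}: "
            "consider installing PyTorch 1.7.1 or newer."
        )
    return None


def report():
    """Print the versions found in the current environment."""
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment.")
        return
    print("torch version:      ", torch.__version__)
    print("built against CUDA: ", torch.version.cuda)
    print("CUDA available:     ", torch.cuda.is_available())
    warning = check_torch_cuda(torch.__version__, torch.version.cuda)
    print(warning or "No obvious version mismatch found.")


if __name__ == "__main__":
    report()
```

Running this inside the same conda environment used for train.py shows whether the installed PyTorch build matches the advice above. It does not inspect the GPU driver, so a clean result here does not rule out other causes of the illegal memory access.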