Dear @xinntao,

I have two questions regarding running the EDVR model.
1. I realized that every time I submit the job to a different GPU (even one of the same type, e.g. Titan X), I have to run `rm -r build/` and `python setup.py develop` again, otherwise I get `error in modulated_deformable_im2col_cuda; no kernel image is available for execution on the device`. I suspect this has something to do with how the extension is dynamically compiled at install time? Right now I keep three copies of the same repo just to run 3 jobs simultaneously. Is this the way to go, or is there a better option?
I followed one of the suggestions in another post and use PyTorch 1.4 and torchvision 0.5 with cudatoolkit 10.1.
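For context on the first issue, my current guess (I may be wrong) is that the deformable-conv extension only gets compiled for the compute capability of the GPU visible at build time, so a binary built on one node lacks kernels for another. Here is a minimal check, assuming a standard PyTorch installation, of which architectures my jobs actually land on:

```python
import torch

# Print the compute capability (sm_XY) of every visible GPU. The compiled
# deformable-conv extension needs kernels for each of these architectures,
# otherwise CUDA reports "no kernel image is available for execution on the device".
for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    print(f"GPU {i}: {torch.cuda.get_device_name(i)} -> sm_{major}{minor}")
```

If that is indeed the cause, would building once with `TORCH_CUDA_ARCH_LIST` covering all target architectures (for example `TORCH_CUDA_ARCH_LIST="6.1;7.0;7.5" python setup.py develop`, adjusted to the cards in the cluster) let a single copy of the repo serve all jobs? I have not verified this here; it is just what the PyTorch C++/CUDA extension documentation suggests.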
2. I always get the following error, and sometimes even an explicit PNG CRC error, when `cv2.imdecode()` returns None. The training PNGs seem to be corrupted somehow, even though I verified all images before training. Did you encounter this problem before? Is it related to multi-processing data loading? It happens every time, especially when I turn off the TSA module and set the number of frames to 1.
```
Traceback (most recent call last):
  File "basicsr/train.py", line 252, in <module>
    main()
  File "basicsr/train.py", line 234, in main
    train_data = prefetcher.next()
  File "/scratch_net/biwidl216/huangsha/BasicSR_1/basicsr/data/prefetch_dataloader.py", line 76, in next
    return next(self.loader)
  File "/itet-stor/huangsha/net_scratch/conda_envs/test/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/itet-stor/huangsha/net_scratch/conda_envs/test/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 838, in _next_data
    return self._process_data(data)
  File "/itet-stor/huangsha/net_scratch/conda_envs/test/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
    data.reraise()
  File "/itet-stor/huangsha/net_scratch/conda_envs/test/lib/python3.7/site-packages/torch/_utils.py", line 394, in reraise
    raise self.exc_type(msg)
AttributeError: Caught AttributeError in DataLoader worker process 1.
Original Traceback (most recent call last):
  File "/itet-stor/huangsha/net_scratch/conda_envs/test/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/itet-stor/huangsha/net_scratch/conda_envs/test/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/itet-stor/huangsha/net_scratch/conda_envs/test/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/scratch_net/biwidl216/huangsha/BasicSR_1/basicsr/data/moving_cityscape_dataset.py", line 147, in __getitem__
    img_gt = imfrombytes(img_bytes, float32=True)
  File "/scratch_net/biwidl216/huangsha/BasicSR_1/basicsr/utils/img_util.py", line 125, in imfrombytes
    img = img.astype(np.float32) / 255.
AttributeError: 'NoneType' object has no attribute 'astype'

/scratch/slurm/spool/job219938/slurm_script: line 31: 16770 Bus error    python -u basicsr/train.py -opt options/train/EDVR/train_EDVR_DARK_20_frame_window_1_patch_64.yml
```
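To rule out the images themselves, here is a minimal sanity-check sketch, assuming the training PNGs live under `datasets/train/` (the glob is a placeholder for my actual dataset root), that decodes every file the same way the dataloader does, i.e. raw bytes through `cv2.imdecode`:

```python
import glob

import cv2
import numpy as np

# Decode every training PNG the same way the dataloader does
# (raw bytes -> cv2.imdecode), so CRC/truncation problems show up here
# instead of as a None inside a DataLoader worker process.
bad = []
for path in sorted(glob.glob('datasets/train/**/*.png', recursive=True)):  # placeholder dataset root
    with open(path, 'rb') as f:
        buf = np.frombuffer(f.read(), dtype=np.uint8)
    if cv2.imdecode(buf, cv2.IMREAD_UNCHANGED) is None:
        bad.append(path)

print(f'{len(bad)} file(s) failed to decode')
print('\n'.join(bad))
```

Since the images verify fine before training, I wonder whether the corruption happens at read time (the Bus error above makes me suspect shared memory or the network filesystem rather than the files on disk). Would setting `num_workers=0` to take multi-process loading out of the picture be a sensible way to narrow this down?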
Thank you very much for your help :) and best wishes!