Closed mwk0423 closed 5 years ago
I have also encountered this problem. After I changed dir_data to the folder containing my dataset, the problem was solved: parser.add_argument('--dir_data', type=str, default='../../dataset', help='dataset directory')
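In case it helps others hitting the same thing: a quick way to catch a wrong --dir_data before training silently produces nan metrics is to validate the path up front. This is just a minimal sketch (the helper name check_dataset_dir and the DIV2K subfolder layout are assumptions, not part of the repo's code):

```python
import os

def check_dataset_dir(dir_data, name='DIV2K'):
    """Return the dataset path if it exists, else None, so the caller
    can fail fast instead of training on an empty dataset."""
    path = os.path.join(dir_data, name)
    return path if os.path.isdir(path) else None

# e.g. check_dataset_dir('../../dataset') returns None when the folder is missing,
# which is exactly the situation that produces nan PSNR later.
```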
You are right. I changed my folder location to '../../dataset' (before it was '../.../dataset/DIV2K') and it trains normally... except for some extra out-of-memory errors -_-! such as:
Preparing loss function:
1.000 * L1
[Epoch 1]   Learning rate: 1.00e-4
[80/800]    [L1: 27.2510]   15.3+0.3s
[160/800]   [L1: 20.0985]   14.6+0.2s
[240/800]   [L1: 16.4675]   14.7+0.2s
[320/800]   [L1: 14.4444]   14.7+0.2s
[400/800]   [L1: 12.9677]   14.7+0.2s
[480/800]   [L1: 11.9202]   14.8+0.1s
[560/800]   [L1: 11.2859]   14.8+0.2s
[640/800]   [L1: 10.9254]   14.8+0.2s
[720/800]   [L1: 10.4220]   14.9+0.2s
[800/800]   [L1: 9.9505]    14.9+0.2s
Evaluation: 10%|████▍ | 1/10 [00:04<00:41, 4.65s/it]
THCudaCheck FAIL file=c:\users\administrator\downloads\new-builder\win-wheel\pytorch\aten\src\thc\generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
  File "main.py", line 26, in <module>
    t.test()
  File "C:\Users\Administrator\Desktop\SR\EDSR-PyTorch-master\src\trainer.py", line 93, in test
    sr = self.model(lr, idx_scale)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\torch\nn\modules\module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\Administrator\Desktop\SR\EDSR-PyTorch-master\src\model\__init__.py", line 53, in forward
    return self.model(x)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\torch\nn\modules\module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\Administrator\Desktop\SR\EDSR-PyTorch-master\src\model\edsr.py", line 58, in forward
    x = self.tail(res)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\torch\nn\modules\module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\torch\nn\modules\container.py", line 91, in forward
    input = module(input)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\torch\nn\modules\module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\torch\nn\modules\container.py", line 91, in forward
    input = module(input)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\torch\nn\modules\module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\torch\nn\modules\pixelshuffle.py", line 40, in forward
    return F.pixel_shuffle(input, self.upscale_factor)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\torch\nn\functional.py", line 1662, in pixel_shuffle
    shuffle_out = input_view.permute(0, 1, 4, 2, 5, 3).contiguous()
RuntimeError: cuda runtime error (2) : out of memory at c:\users\administrator\downloads\new-builder\win-wheel\pytorch\aten\src\thc\generic/THCStorage.cu:58
I think it is likely that some parameters are set incorrectly. Could you tell me something about this?
Maybe your GPU memory isn't enough. You can try changing n_resblocks to 8 (or even smaller), or change the test dataset to Set14 (the images in Set14 are smaller and need less GPU memory).
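To see why fewer residual blocks and smaller evaluation images help, here is a rough back-of-the-envelope estimate of the feature-map memory in an EDSR-style body. The "about two saved activations per residual block" factor is an illustrative assumption, not a measured profile of the actual model:

```python
def activation_mem_mb(h, w, n_feats=256, n_resblocks=32, bytes_per_el=4):
    """Very rough estimate (in MB) of activation memory for an EDSR-style body.

    Assumption: each residual block holds on the order of 2 feature maps
    of shape n_feats x h x w in float32 during the forward pass.
    """
    per_map = n_feats * h * w * bytes_per_el
    return 2 * n_resblocks * per_map / 1024 ** 2

# Halving n_resblocks (32 -> 8) or evaluating on smaller images both
# shrink this estimate roughly linearly / quadratically, respectively.
```

The estimate scales linearly in n_resblocks and in pixel count, which matches the advice above: shrink the network or feed it smaller evaluation images (the --chop option, which splits the input into tiles, attacks the same term).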
I modified some parameters and it worked. Thanks a lot. @YongboLiang
Excuse me, I referred to the conversation above to modify my code, but I still haven't solved the problem of the PSNR value being nan. I think the model still isn't reading the dataset. I would be very grateful for any suggestions about my problem.
(F:\yolov7) C:\Users\Lenovo\Desktop\Non-Local-Sparse-Attention-main\src>python main.py --dir_data "../../datasets" --n_GPUs 1 --rgb_range 1 --chunk_size 144 --n_hashes 4 --save_models --lr 1e-4 --decay 200-400-600-800 --epochs 1000 --chop --save_results --n_resblocks 32 --n_feats 256 --res_scale 0.1 --batch_size 16 --model NLSN --scale 4 --patch_size 96 --save NLSN_x4 --data_train DIV2K
Hello, I'm new to SR. My operating system is Windows 10 with CUDA v8.0. I noticed that the algorithm is probably meant to run on Linux, so I installed Cygwin to be able to execute Linux commands on my computer. When I ran the algorithm following the README, I got an error about "Device or resource busy", so I followed the answer by @kice in #50 and modified my dataloader.py. It runs now, but another problem has emerged. I posted some of the results below.
The reported PSNR is always nan... and the trained model is not saved (only some parameter files are saved).
I would be very grateful for any suggestions about my problem.
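One note on the nan PSNR symptom: PSNR is 10·log10(MAX² / MSE), so it degenerates when the MSE is averaged over zero pixels (nothing was loaded from the dataset) or is exactly zero. A minimal plain-Python sketch of a guarded PSNR (this is an illustration, not the repo's implementation, which works on tensors):

```python
import math

def psnr(pred, target, max_val=255.0):
    """PSNR in dB between two flat pixel sequences.

    Returns nan when there are no pixels to compare (the unread-dataset
    case that shows up as nan in the training log) and inf when the
    images match exactly (zero MSE)."""
    if len(pred) == 0 or len(pred) != len(target):
        return float('nan')
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return float('inf')
    return 10 * math.log10(max_val ** 2 / mse)
```

So a PSNR that is always nan (rather than merely low) usually means the evaluation loop iterated over an empty dataset, which points back at --dir_data / the expected folder layout rather than at the model itself. Also note that with --rgb_range 1 the MAX term is 1, not 255, so the range and the data must agree.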