Open jiandandan001 opened 4 years ago
The first thing that comes to mind is the package versions. Please verify that your packages are exactly as specified in requirements.txt. If I am wrong, let me know and I will try to help further.
Thanks. I solved this problem by revising 'util.py' as follows.
original: sio.savemat(os.path.join(conf.output_dir_path, '%s_kernel_x2.mat' % conf.img_name), {'Kernel': k_2})
revised:
dirname1, filename1 = os.path.split(conf.img_name)
sio.savemat(os.path.join(conf.output_dir_path, '%s_kernel_x2.mat' % filename1), {'Kernel': k_2})
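Judging from the path in the FileNotFoundError, the likely cause is that conf.img_name still contains the input directory ("test_images\im_1"), so it gets appended to the output directory a second time. The following is a minimal sketch of that effect and of the basename fix, not the repository's actual code. The helper `kernel_filename` and the `output_dir` value are hypothetical, and `ntpath` is used only so the example follows Windows path rules on any operating system.

```python
import ntpath  # Windows path rules, so this sketch behaves the same on any OS


def kernel_filename(img_name, scale=2):
    """Build the .mat file name from only the last component of img_name.

    Hypothetical helper for illustration; it mirrors the os.path.split fix.
    """
    base = ntpath.basename(img_name)
    return '%s_kernel_x%d.mat' % (base, scale)


# Hypothetical values modelled on the paths in the traceback.
output_dir = r'results\test_images\im_1'
img_name = r'test_images\im_1'

# Original behaviour: the directory part of img_name leaks into the file name,
# pointing at a sub-directory that was never created.
broken = ntpath.join(output_dir, '%s_kernel_x2.mat' % img_name)

# Revised behaviour: only the base name is used.
fixed = ntpath.join(output_dir, kernel_filename(img_name))

print(broken)
print(fixed)
```

The same idea applies to any other file name built from conf.img_name in the same function.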
I have a question about the speed. In my testing, the ZSSR module is very slow. It seems that this part runs on the CPU. How can I run it on the GPU?
ZSSR is not my code and it seems to run for me on GPU. It does take relatively long when it is provided with an estimated kernel (~3-4 minutes).
Thank you for your quick reply. I will check the code further.
@jiandandan001 Were you able to figure out the reason for the slow ZSSR training? I have run into this issue as well. It takes more than 20 min to train on one 350x460 px image, and it uses only 305 MB on a V100 during training.
To learn the correct SR, ZSSR downscales the image with its kernel at every iteration. When a kernel is provided, ZSSR uses it for that downscaling. When no kernel is provided, it uses bicubic downscaling, which is highly optimized Python code. That is the source of the difference in runtime. However, 20 min sounds WAY TOO LONG for such a small image. I recall the difference being from ~30 seconds to 3-4 minutes on a V100, but I may be mistaken.
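To make the kernel-based downscaling concrete, here is a rough sketch of blurring an image with a given kernel and then subsampling it. This is not ZSSR's implementation; the function name `downscale_with_kernel`, the edge padding, and the plain strided subsampling are assumptions chosen to keep the example short. It only illustrates why a per-iteration convolution with an arbitrary kernel costs more than a tuned bicubic routine.

```python
import numpy as np


def downscale_with_kernel(img, kernel, scale):
    """Blur a 2-D image with `kernel`, then keep every `scale`-th pixel.

    Illustrative only: a real implementation would handle color channels
    and sub-pixel alignment, and would use an optimized convolution.
    """
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    # Pad so the blurred image keeps the input size.
    padded = np.pad(img, ((ph, kh - 1 - ph), (pw, kw - 1 - pw)), mode='edge')
    blurred = np.zeros(img.shape, dtype=np.float64)
    # Accumulate one shifted, weighted copy of the image per kernel tap.
    for i in range(kh):
        for j in range(kw):
            blurred += kernel[i, j] * padded[i:i + img.shape[0], j:j + img.shape[1]]
    return blurred[::scale, ::scale]


# Example: a 13x13 box kernel stands in for an estimated kernel.
image = np.ones((64, 64))
box_kernel = np.full((13, 13), 1.0 / 169.0)
small = downscale_with_kernel(image, box_kernel, 2)
print(small.shape)
```

The cost grows with the number of kernel taps (169 for a 13x13 kernel), and it is paid again at every training iteration.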
@jiandandan001 @sefibk I solved the problem. Now x4 ZSSR runs in ~5 min with a kernel provided. The problem was a CUDA version incompatibility. I installed tensorflow-gpu==2.1.0, which is compatible with CUDA 10.1. I hope this helps @jiandandan001 as well.
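A quick way to check for this kind of mismatch is to ask TensorFlow which GPUs it can see before starting a long run. The snippet below is a small sketch that assumes TensorFlow 2.1 or later, where `tf.config.list_physical_devices` is available; the wrapper `visible_gpus` is a hypothetical helper, not part of ZSSR or KernelGAN.

```python
def visible_gpus():
    """Return the names of the GPUs TensorFlow can see.

    Returns None if TensorFlow is not installed in this environment.
    Assumes TensorFlow 2.1+ for tf.config.list_physical_devices.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices('GPU')]


gpus = visible_gpus()
if gpus is None:
    print('TensorFlow is not installed in this environment.')
elif not gpus:
    print('TensorFlow found no GPU, so training would fall back to the CPU.')
else:
    print('TensorFlow sees:', gpus)
```

An empty list on a machine that has a GPU usually points to a driver, CUDA, or cuDNN version that does not match the installed TensorFlow build.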
@sefibk When I run ZSSR several times on the same image and the same estimated kernel separately from KernelGAN (I used both [k_2, k_4] gradual super-resolution and [k_4, k_4] direct x4 estimated SR), it outputs images of different quality and runs for a different amount of time each time. Have you also encountered such behavior? What can be the reason for that?
Yes. Since ZSSR trains from scratch on each image (as KernelGAN does), the randomness inherent in any deep learning training introduces slightly different results. To overcome this you can set a constant random seed. However, while that removes the inconsistency, it doesn't guarantee the best results. The variation in time is weird. It may be because some of the runs satisfy the stopping criteria at different stages, but I don't remember that happening often, and it usually runs for 3K iterations. Anyway, this is not my work, so I would suggest posting an issue in ZSSR's Git repo.
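Setting a constant seed could look roughly like the sketch below. The helper `seed_everything` is hypothetical and is not part of ZSSR or KernelGAN. It seeds Python's `random` module and, if they are installed, NumPy, PyTorch (`torch.manual_seed`), and TensorFlow 2.x (`tf.random.set_seed`). Some GPU operations can still be nondeterministic even with fixed seeds, so small differences between runs may remain.

```python
import random


def seed_everything(seed):
    """Seed the random number generators that are available in this environment."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass  # PyTorch not installed
    try:
        import tensorflow as tf
        tf.random.set_seed(seed)  # TensorFlow 2.x API (assumption)
    except ImportError:
        pass  # TensorFlow not installed


# Seeding twice with the same value reproduces the same random draw.
seed_everything(0)
first_draw = random.random()
seed_everything(0)
second_draw = random.random()
print(first_draw == second_draw)
```

The call would go at the very start of the script, before any model or data loader is created.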
@sefibk Ok, I see, thank you for your suggestions
Would it be convenient for you to provide the whole operation process? Thank you.
Thank you for sharing the code.
I ran into the following problem. It seems that the kernel is not saved successfully after estimation. Could you give me some suggestions?
G:\Anaconda\python.exe D:/2020/ReferenceCode/KernelGAN-master/train.py --input-dir test_images --real --SR
Scale Factor: X2 ZSSR: True Real Image: True
STARTED KernelGAN on: "test_images\im_1.png"...
0%| | 0/3000 [00:00<?, ?it/s]
G:\Anaconda\lib\site-packages\torch\nn\modules\loss.py:93: UserWarning: Using a target size (torch.Size([13])) that is different to the input size (torch.Size([1, 1, 13, 13])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
return F.l1_loss(input, target, reduction=self.reduction)
G:\Anaconda\lib\site-packages\torch\nn\modules\loss.py:93: UserWarning: Using a target size (torch.Size([])) that is different to the input size (torch.Size([1])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
return F.l1_loss(input, target, reduction=self.reduction)
G:\Anaconda\lib\site-packages\torch\nn\modules\loss.py:445: UserWarning: Using a target size (torch.Size([2])) that is different to the input size (torch.Size([2, 1])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
return F.mse_loss(input, target, reduction=self.reduction)
100%|███████████████████| 3000/3000 [02:04<00:00, 24.18it/s]
Traceback (most recent call last):
File "G:\Anaconda\lib\site-packages\scipy\io\matlab\mio.py", line 39, in _open_file
return open(file_like, mode), True
FileNotFoundError: [Errno 2] No such file or directory: 'D:\2020\ReferenceCode\KernelGAN-master\results\test_images\im_1lll\test_images\im_1_kernel_x2.mat'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:/2020/ReferenceCode/KernelGAN-master/train.py", line 54, in <module>
main()
File "D:/2020/ReferenceCode/KernelGAN-master/train.py", line 36, in main
train(conf)
File "D:/2020/ReferenceCode/KernelGAN-master/train.py", line 18, in train
gan.finish()
File "D:\2020\ReferenceCode\KernelGAN-master\kernelGAN.py", line 124, in finish
save_final_kernel(final_kernel, self.conf)
File "D:\2020\ReferenceCode\KernelGAN-master\util.py", line 214, in save_final_kernel
sio.savemat(os.path.join(conf.output_dir_path, '%s_kernel_x2.mat' % conf.img_name), {'Kernel': k_2})
File "G:\Anaconda\lib\site-packages\scipy\io\matlab\mio.py", line 266, in savemat
with _open_file_context(file_name, appendmat, 'wb') as file_stream:
File "G:\Anaconda\lib\contextlib.py", line 113, in __enter__
return next(self.gen)
File "G:\Anaconda\lib\site-packages\scipy\io\matlab\mio.py", line 19, in _open_file_context
f, opened = _open_file(file_like, appendmat, mode)
File "G:\Anaconda\lib\site-packages\scipy\io\matlab\mio.py", line 45, in _open_file
return open(file_like, mode), True
FileNotFoundError: [Errno 2] No such file or directory: 'D:\2020\ReferenceCode\KernelGAN-master\results\test_images\im_1lll\test_images\im_1_kernel_x2.mat'