sefibk / KernelGAN


Unable to generate the .mat file of kernel #40

Open jiandandan001 opened 4 years ago

jiandandan001 commented 4 years ago

Thank you for sharing the code.

I am running into the following problem. It seems that the kernel is not estimated successfully. Could you give me some suggestions?

    G:\Anaconda\python.exe D:/2020/ReferenceCode/KernelGAN-master/train.py --input-dir test_images --real --SR
    Scale Factor: X2    ZSSR: True    Real Image: True


    STARTED KernelGAN on: "test_images\im_1.png"...
      0%|          | 0/3000 [00:00<?, ?it/s]
    G:\Anaconda\lib\site-packages\torch\nn\modules\loss.py:93: UserWarning: Using a target size (torch.Size([13])) that is different to the input size (torch.Size([1, 1, 13, 13])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
      return F.l1_loss(input, target, reduction=self.reduction)
    G:\Anaconda\lib\site-packages\torch\nn\modules\loss.py:93: UserWarning: Using a target size (torch.Size([])) that is different to the input size (torch.Size([1])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
      return F.l1_loss(input, target, reduction=self.reduction)
    G:\Anaconda\lib\site-packages\torch\nn\modules\loss.py:445: UserWarning: Using a target size (torch.Size([2])) that is different to the input size (torch.Size([2, 1])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
      return F.mse_loss(input, target, reduction=self.reduction)
    100%|███████████████████| 3000/3000 [02:04<00:00, 24.18it/s]
    Traceback (most recent call last):
      File "G:\Anaconda\lib\site-packages\scipy\io\matlab\mio.py", line 39, in _open_file
        return open(file_like, mode), True
    FileNotFoundError: [Errno 2] No such file or directory: 'D:\2020\ReferenceCode\KernelGAN-master\results\test_images\im_1lll\test_images\im_1_kernel_x2.mat'

During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "D:/2020/ReferenceCode/KernelGAN-master/train.py", line 54, in <module>
        main()
      File "D:/2020/ReferenceCode/KernelGAN-master/train.py", line 36, in main
        train(conf)
      File "D:/2020/ReferenceCode/KernelGAN-master/train.py", line 18, in train
        gan.finish()
      File "D:\2020\ReferenceCode\KernelGAN-master\kernelGAN.py", line 124, in finish
        save_final_kernel(final_kernel, self.conf)
      File "D:\2020\ReferenceCode\KernelGAN-master\util.py", line 214, in save_final_kernel
        sio.savemat(os.path.join(conf.output_dir_path, '%s_kernel_x2.mat' % conf.img_name), {'Kernel': k_2})
      File "G:\Anaconda\lib\site-packages\scipy\io\matlab\mio.py", line 266, in savemat
        with _open_file_context(file_name, appendmat, 'wb') as file_stream:
      File "G:\Anaconda\lib\contextlib.py", line 113, in __enter__
        return next(self.gen)
      File "G:\Anaconda\lib\site-packages\scipy\io\matlab\mio.py", line 19, in _open_file_context
        f, opened = _open_file(file_like, appendmat, mode)
      File "G:\Anaconda\lib\site-packages\scipy\io\matlab\mio.py", line 45, in _open_file
        return open(file_like, mode), True
    FileNotFoundError: [Errno 2] No such file or directory: 'D:\2020\ReferenceCode\KernelGAN-master\results\test_images\im_1lll\test_images\im_1_kernel_x2.mat'

sefibk commented 4 years ago

The first thing that comes to mind is the package versions... Please verify your packages are exactly as specified in requirements.txt. If I am wrong, let me know and I will try to help further.
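For anyone comparing against requirements.txt, here is a quick sketch for dumping the installed versions without leaving Python; the package names below are illustrative, not copied from the repo's actual requirements file.

```python
# Rough sketch: print installed versions of a few likely dependencies so they
# can be compared against requirements.txt (package list is illustrative only).
import importlib.metadata as md  # available in Python 3.8+

for pkg in ("torch", "scipy", "numpy", "tqdm"):
    try:
        print(pkg, md.version(pkg))
    except md.PackageNotFoundError:
        print(pkg, "is not installed")
```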

jiandandan001 commented 3 years ago

Thanks. I have solved this problem by revising 'util.py' as follows.

original: sio.savemat(os.path.join(conf.output_dir_path, '%s_kernel_x2.mat' % conf.img_name), {'Kernel': k_2})

revised:
    dirname1, filename1 = os.path.split(conf.img_name)
    sio.savemat(os.path.join(conf.output_dir_path, '%s_kernel_x2.mat' % filename1), {'Kernel': k_2})
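For context, a standalone sketch (using placeholder Windows paths copied from the log above, not the repo's config objects) of why the original join fails and the revised one works:

```python
import os

# Placeholder values mirroring the error message above (assumption: conf.img_name
# still carries the 'test_images' prefix, so joining it onto output_dir_path
# duplicates the folder and points at a directory that does not exist).
output_dir_path = r'D:\2020\ReferenceCode\KernelGAN-master\results\test_images\im_1lll'
img_name = r'test_images\im_1'

broken = os.path.join(output_dir_path, '%s_kernel_x2.mat' % img_name)
# -> ...\im_1lll\test_images\im_1_kernel_x2.mat  (FileNotFoundError)

filename1 = os.path.split(img_name)[1]  # keep only 'im_1'
fixed = os.path.join(output_dir_path, '%s_kernel_x2.mat' % filename1)
# -> ...\im_1lll\im_1_kernel_x2.mat  (directory exists, savemat succeeds)
```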

jiandandan001 commented 3 years ago

I have a question about the speed. In my testing, the ZSSR module is very slow. It seems that this part runs on the CPU. How can I run it on the GPU?

sefibk commented 3 years ago

ZSSR is not my code, and it seems to run on the GPU for me. It does take relatively long when it is provided with an estimated kernel (~3-4 minutes).

jiandandan001 commented 3 years ago

> ZSSR is not my code, and it seems to run on the GPU for me. It does take relatively long when it is provided with an estimated kernel (~3-4 minutes).

Thank you for your quick reply. I will check the code further.

yera217 commented 3 years ago

> ZSSR is not my code, and it seems to run on the GPU for me. It does take relatively long when it is provided with an estimated kernel (~3-4 minutes).
>
> Thank you for your quick reply. I will check the code further.

@jiandandan001 Were you able to figure out the reason for the slow ZSSR training? I am also encountering this issue. It takes more than 20 min to train on one 350x460 px image, yet it uses only 305 MB on a V100 during training.

sefibk commented 3 years ago

To learn the correct SR, ZSSR downscales the image with its kernel every iteration. When a kernel is provided, it uses it to downscale the image at every iteration. When it is not provided, it uses bicubic downscaling, which is highly optimized Python code. That is the source of the difference in runtime. However - 20 min sounds WAY TOO LONG for such a small image. I recall the difference being from ~30 seconds to 3-4 minutes on a V100, but I may be mistaken.
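To illustrate the cost difference described here (a standalone sketch with made-up helper names, not the actual ZSSR code): the kernel path does an explicit 2-D convolution on every training iteration, while the bicubic path is a single call into an optimized resize routine.

```python
import numpy as np
from scipy.ndimage import convolve, zoom

def downscale_with_kernel(img, kernel, scale=2):
    """Blur with the estimated kernel, then subsample (assumed per-iteration cost)."""
    blurred = convolve(img, kernel, mode='reflect')  # explicit 2-D convolution
    return blurred[::scale, ::scale]

def downscale_bicubic(img, scale=2):
    """Cubic-interpolation resize as a stand-in for the optimized bicubic path."""
    return zoom(img, 1.0 / scale, order=3)

# Toy grayscale image and a 13x13 kernel (matching the kernel size in the log above).
img = np.random.rand(350, 460)
kernel = np.full((13, 13), 1.0 / 169)  # uniform kernel, for illustration only
lr_from_kernel = downscale_with_kernel(img, kernel)
lr_bicubic = downscale_bicubic(img)
```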

yera217 commented 3 years ago

@jiandandan001 @sefibk I solved the problem. Now x4 ZSSR runs in ~5 min with a kernel provided. The problem was CUDA version compatibility: I installed tensorflow-gpu==2.1.0, which is compatible with CUDA 10.1. Hope it helps @jiandandan001 as well.
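As a quick sanity check (assuming the TensorFlow 2.x API), the snippet below lists the GPUs TensorFlow can see; an empty list means training falls back to the CPU, which would explain runtimes like the 20 minutes reported above.

```python
# Sanity check: list the GPUs TensorFlow can actually see. An empty list means
# the ZSSR stage will run on the CPU.
import tensorflow as tf

print("TF version:", tf.__version__)
print("Visible GPUs:", tf.config.list_physical_devices('GPU'))
```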

yera217 commented 3 years ago

@sefibk When I run ZSSR several times on the same image and the same estimated kernel separately from KernelGAN (I used both [k_2, k_4] gradual super-resolution and [k_4, k_4] direct x4 estimated SR), it outputs images of different quality and runs for a different amount of time each time. Have you also encountered such behavior? What can be the reason for that?

sefibk commented 3 years ago

Yes. Since ZSSR trains from scratch on each image (as KernelGAN does), the randomness inherent in any deep learning training introduces slightly different results. To overcome this you can set a constant random seed; while that will fix the inconsistency, it doesn't guarantee the best results. The variation in time is weird. It may be because some runs satisfy the stopping criterion at different stages, but I don't remember that happening often, and it usually runs for 3K iterations. Anyway - this is not my work, so I would suggest posting an issue in ZSSR's Git repo.
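For completeness, a minimal sketch of pinning a constant seed (the SEED value is arbitrary; KernelGAN itself is PyTorch, while the ZSSR stage apparently relies on TensorFlow given the tensorflow-gpu fix above, so both are noted):

```python
import random
import numpy as np
import torch

SEED = 0  # arbitrary constant
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed_all(SEED)
# For fully deterministic GPU results, cuDNN also needs to be pinned down,
# usually at some cost in speed:
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

# If the TensorFlow-based ZSSR run is the one being pinned, the analogous
# call in TF 2.x would be: tf.random.set_seed(SEED)
```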

yera217 commented 3 years ago

@sefibk Ok, I see, thank you for your suggestions

Lcjgh commented 1 year ago

Would it be convenient for you to share the full set of steps you followed? Thank you.