Closed HyeongseokSon1 closed 4 years ago
I have added the test code for the result generation. Thank you.
Thank you for uploading the test code. However, I still have some issues with the training and test code. Please comment on the following:
1. Test accuracy is much lower than validation accuracy during training, and the visual results are not sharp. For a single-GPU model, the test PSNR is 29.83 dB while the best validation PSNR is 30.73 dB. (I trained the model with the old source code.)
2. Model performance (validation accuracy) seems to depend on the number of GPUs used in training (DDP mode).
Example logs:
1 GPU used:
2020/09/17, 15:37:23 - [Epoch 500 / lr 2.50e-05]
[train] epoch time: 401.73s, average batch time: 0.84s
[train] 1MSE : 65.95 (best 62.06), PSNR : 31.84 (best 32.14)
[valid] epoch time: 93.00s, average batch time: 0.37s
[valid] 1MSE : 89.95 (best 80.37), PSNR : 30.22 (best 30.73)

2 GPUs used:
2020/09/19, 18:02:29 - [Epoch 500 / lr 2.50e-05]
[train] epoch time: 179.98s, average batch time: 0.75s
[train] 1MSE : 71.04 (best 68.76), PSNR : 31.23 (best 31.40)
[valid] epoch time: 36.07s, average batch time: 0.28s
[valid] 1MSE : 95.21 (best 88.25), PSNR : 29.76 (best 30.21)

4 GPUs used:
2020/09/18, 05:00:53 - [Epoch 500 / lr 2.50e-05]
[train] epoch time: 103.80s, average batch time: 0.86s
[train] 1MSE : 95.22 (best 85.12), PSNR : 30.19 (best 30.54)
[valid] epoch time: 22.33s, average batch time: 0.35s
[valid] 1MSE : 110.71 (best 99.94), PSNR : 29.06 (best 29.52)
As for issue #1: the PSNR values for our model and the other models are all calculated over one pass of the test dataloader using random 256x256 crops (not on full-resolution video). What we want to compare is the deblurring efficiency of each model under the same experimental conditions. If you want sharper results, you could increase the number of training epochs, since the model has not converged at 500 epochs. You could also use a larger model configuration, e.g., increase "n_blocks" and "n_feats" as we did in the paper.
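A minimal sketch of this crop-based evaluation protocol, assuming numpy images in (H, W, C) layout; the helper names `psnr` and `random_crop_pair` are illustrative and not from this repository:

```python
import numpy as np

def psnr(pred, gt, data_range=255.0):
    """PSNR in dB between two images of the same shape."""
    mse = np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(data_range ** 2 / mse)

def random_crop_pair(pred, gt, size=256, rng=None):
    """Take the same random size x size crop from prediction and ground truth."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = gt.shape[:2]
    y = int(rng.integers(0, h - size + 1))
    x = int(rng.integers(0, w - size + 1))
    return pred[y:y + size, x:x + size], gt[y:y + size, x:x + size]
```

Averaging `psnr` over random crops rather than full frames yields slightly different numbers, which would explain part of the gap between validation and full-resolution test accuracy discussed above.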
As for issue #2: if you use DDP mode, the real batch size is "num_gpus*batch_size", because each process (GPU) uses the batch size you set in "para". Batch size affects model performance: in general, a smaller batch size works better for the same number of training epochs, but I believe the effect of the hyper-parameter "frames" is coupled with batch size. This also applies to other models like IFIRNN.
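The scaling behavior described here can be sketched as follows; the function names are hypothetical and the variable names only mirror, not reproduce, the repo's "para" config:

```python
# Under DDP, each of the num_gpus processes builds its own DataLoader with
# the per-process batch size, so one optimizer step aggregates gradients
# over num_gpus * batch_size samples.
def effective_batch_size(num_gpus: int, per_gpu_batch_size: int) -> int:
    return num_gpus * per_gpu_batch_size

# To keep the effective batch (and hence comparable accuracy) fixed when
# moving from 1 GPU to N GPUs, shrink the per-process batch accordingly.
def per_gpu_batch_for(target_effective: int, num_gpus: int) -> int:
    assert target_effective % num_gpus == 0
    return target_effective // num_gpus

print(effective_batch_size(4, 8))  # 32
print(per_gpu_batch_for(8, 4))     # 2
```

This would account for the log differences above: with the same per-process batch size, the 4-GPU run effectively trains with a 4x larger batch than the 1-GPU run.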
Thank you.
Thank you for the fast reply.
For issue #2, I understand your reply. However, for issue #1, my training setting is the same as the model (B9C80) in the paper, which is trained for 500 epochs and reaches a PSNR of 30.79 dB. My validation accuracy is similar to the paper's value, although it is calculated from random crops, but my test accuracy is much lower than the paper's value.
I think there may be a problem with the test code. Could you provide the pretrained model, or check again whether the current test code is valid?
I didn't write code in the test function to calculate PSNR, since validation and test on GOPRO are the same. Could you please send me your code for calculating PSNR and your (B9C80) checkpoint? My email: zzh.tech@gmail.com. Thank you.
You mean that the PSNR values in Table 1 of the paper are validation accuracies, calculated from random crops? Thank you.
Yes, it may seem strange. But since none of the models has converged at 500 epochs, I had to choose one value to compare, so I chose the best value for each model. I ran all the other models in the same way, so from this perspective the comparison is fair. And when you run enough epochs (over 500), the current validation PSNR will be very close to the best one. Sorry for the confusion.
OK, I see; my question is settled. Thank you. I will close this issue.
Hello, it seems that only the training code is available. Can you provide the test code used for the evaluation in the paper?