Closed mmSir closed 3 years ago
+1
I did not observe a similar trend. Can you check if the inference/test script (test.py) is working fine? You may use the validation set from Vimeo for this purpose. This is how my training plots look.
Thanks for your reply. I trained your net from scratch. Because Vimeo is too large, I reduced the UNet channels from [512,256,128,64] to [256,128,64,32] and set the batch size to 32. But the loss looked very strange for the first 3 epochs.
Could you please share your training loss?
Thank you very much!
Can you share more details on the hardware specs? This is what my loss plots look like. Can you run the same code unchanged and see if the loss behaves similarly? Also, are you using the septuplet split from here? http://toflow.csail.mit.edu/
When I used the full Vimeo dataset with batchSize set to 32, a similar effect occurred, as shown in the figure below. I didn't change any of the parameters in the code you provided, so did you use any other tricks in your training? The training results are very strange; in theory I should get the same trend as yours.
Thank you!
![]()
Yes, my loss and PSNR curves look similar to yours; they fluctuate greatly.
While I investigate this further, can you try applying gradient clipping during training? You can use the code from here.
https://github.com/myungsub/CAIN/blob/fff8fc321c5a76904ed2a12c9500e055d4c77256/main.py#L117
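For readers unfamiliar with gradient clipping, here is a minimal pure-Python sketch of clipping by global L2 norm, which is the idea behind PyTorch's `torch.nn.utils.clip_grad_norm_`. It is not the code from the linked file: the function name `clip_by_global_norm`, the toy gradient values, and the `max_norm` of 1.0 are all assumptions chosen for illustration, and the arithmetic is simplified compared with PyTorch's implementation.

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient vectors so their combined L2 norm is <= max_norm.

    Hypothetical helper for illustration; returns (clipped_grads, norm_before).
    """
    total_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    if total_norm <= max_norm:
        # Already within the limit: return an unchanged copy.
        return [list(vec) for vec in grads], total_norm
    scale = max_norm / total_norm
    return [[g * scale for g in vec] for vec in grads], total_norm

# Toy example of a gradient spike across two parameter groups.
grads = [[3.0, 4.0], [0.0, 12.0]]
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
norm_after = math.sqrt(sum(g * g for vec in clipped for g in vec))
print(norm_before, norm_after)
```

In a PyTorch training loop the equivalent call would typically go after `loss.backward()` and before `optimizer.step()`, so that an occasional very large gradient cannot produce an oversized parameter update. The threshold usually needs tuning per model.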
Maybe your PyTorch version is >= 1.6; try PyTorch 1.5 or lower. @mmSir
Sorry, my GPU is a 3090, so I can only use torch 1.7.0 at present. Tarun005 and I both use torch 1.7.0. I also tried a 1060 with torch 1.5.0 on a small dataset, and the loss does converge there. Since I don't have any other hardware at present, and it is impossible to train on the whole dataset with a 1060, I can't run more tests. I recommend torch <= 1.5.0.
Your work is impressive! Hello, I'd like to ask you a few questions. I downloaded your code for training, set the batch size to 6, and reduced the data volume to around 20000 samples, also from Vimeo. After training for more than 70 epochs, the loss, PSNR, and SSIM on both the training set and the test set have not converged. The lr has already dropped to a low value by this point, so I don't think continuing training is worthwhile. The final PSNR is still below 20. I wonder why my training results are so poor and so far from the metrics in your paper. Can you give some advice?
![image](https://user-images.githubusercontent.com/35763352/112708498-415ded00-8eed-11eb-8abe-6cbc74cc02e3.png)