tarun005 / FLAVR

Code for FLAVR: A fast and efficient frame interpolation technique.
Apache License 2.0

about the training tricks #15

Closed mmSir closed 3 years ago

mmSir commented 3 years ago

Your work is impressive! Hello, I'd like to ask a few questions. I downloaded your code for training, set the batch size to 6, reduced the data volume to around 20,000 samples, and used Vimeo. But after 70+ epochs, neither the loss nor PSNR/SSIM has converged on the training or test set, and the learning rate has already dropped to a low value, so I don't think further training is the answer. The final PSNR is below 20. Why are my training results so poor and so far from the numbers in your paper? Can you give some advice?

image

gangsterless commented 3 years ago

+1

tarun005 commented 3 years ago

I did not observe a similar trend. Can you check if the inference/test script (test.py) is working fine? You may use the validation set from Vimeo for this purpose. This is how my training plots look.

image
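For anyone sanity-checking their validation numbers as suggested above, here is a minimal, hypothetical PSNR sketch (not the repo's own `test.py` metric code, which may differ in details such as the value range or averaging):

```python
import torch

def psnr(pred: torch.Tensor, target: torch.Tensor, max_val: float = 1.0) -> torch.Tensor:
    """Peak signal-to-noise ratio in dB for tensors scaled to [0, max_val]."""
    mse = torch.mean((pred - target) ** 2)
    return 10 * torch.log10(max_val ** 2 / mse)

# A ground-truth frame plus mild noise should score well above 20 dB;
# if test.py reports < 20 dB on Vimeo validation, something is off.
gt = torch.rand(1, 3, 64, 64)
noisy = (gt + 0.01 * torch.randn_like(gt)).clamp(0, 1)
print(psnr(noisy, gt).item())  # roughly 40 dB for noise with std 0.01
```

Comparing a number like this against what the training log prints can separate a broken metric/inference path from a genuinely diverging model.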
gangsterless commented 3 years ago

Thanks for your reply. I trained your net from scratch. Since Vimeo is too large, I reduced the UNet channels from [512, 256, 128, 64] to [256, 128, 64, 32] and set the batch size to 32. But the loss looks very strange for the first 3 epochs.

image

Could you please share your training loss? Thank you very much!

tarun005 commented 3 years ago

Can you share more details on your hardware specs? This is how my loss plots look. Can you run the same code unchanged and see if the loss behaves similarly? Also, are you using the septuplet split from here? http://toflow.csail.mit.edu/

image
mmSir commented 3 years ago

When I use the full Vimeo dataset with the batch size set to 32, a similar effect occurs, as shown in the figures below. I didn't change any of the parameters in the code you provided, so did you use any other tricks in your training? The training results are very strange; in theory I should see the same trend as yours. Thank you!

image
image

gangsterless commented 3 years ago

> When I use the full Vimeo dataset with the batch size set to 32, a similar effect occurs, as shown in the figures below. I didn't change any of the parameters in the code you provided, so did you use any other tricks in your training? The training results are very strange; in theory I should see the same trend as yours. Thank you!

Yes, my loss and PSNR curves look similar to yours; they fluctuate greatly.

tarun005 commented 3 years ago

While I investigate this further, can you try applying gradient clipping during training? You can use the code from here:

https://github.com/myungsub/CAIN/blob/fff8fc321c5a76904ed2a12c9500e055d4c77256/main.py#L117
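A minimal sketch of what that linked CAIN snippet does, using a toy model (the `nn.Linear` stand-in and hyperparameters here are placeholders, not FLAVR's actual training loop):

```python
import torch
import torch.nn as nn

# Toy model standing in for the interpolation network (hypothetical placeholder).
model = nn.Linear(8, 8)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

x = torch.randn(4, 8)
loss = nn.functional.l1_loss(model(x), torch.randn(4, 8))

optimizer.zero_grad()
loss.backward()
# Clip the global gradient norm to 1.0 before the optimizer step,
# mirroring the clip_grad_norm_ call in the linked CAIN main.py.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```

The key point is that `clip_grad_norm_` is called after `backward()` and before `step()`, so any rare exploding-gradient batch (a plausible cause of the loss spikes shown above) gets rescaled instead of wrecking the weights.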

hiredd commented 3 years ago

Maybe your PyTorch version is >= 1.6; try PyTorch 1.5 or lower. @mmSir

gangsterless commented 3 years ago

Sorry, since my device is a 3090, I can only use torch 1.7.0 at present; tarun005 and I both use torch 1.7.0. I also trained on a small dataset with a 1060 and torch 1.5.0, and the loss converged there as well. Since I don't have any other hardware at the moment, and it's impossible to train on the whole dataset with a 1060, I can't run more tests. I recommend torch <= 1.5.0.