xinntao / EDVR

Winning Solution in NTIRE19 Challenges on Video Restoration and Enhancement (CVPR19 Workshops) - Video Restoration with Enhanced Deformable Convolutional Networks. EDVR has been merged into BasicSR and this repo is a mirror of BasicSR.
https://github.com/xinntao/BasicSR

about the loss function #40

Closed guiji0812 closed 5 years ago

guiji0812 commented 5 years ago

Hi, when I train with CharbonnierLoss the loss is very, very large, but when I train with L1 loss it looks normal. What causes this? Could you give me some advice?

LI945 commented 5 years ago

When I trained with L1 loss, it was also big. How large is your L1 loss?

LI945 commented 5 years ago

All three losses are big; what is the problem?

yinnhao commented 5 years ago

All three losses are big; what is the problem?

Same for me. My loss is on the order of 1e+4. How about yours?

LI945 commented 5 years ago

All three losses are big; what is the problem?

Same for me. My loss is on the order of 1e+4. How about yours?

My loss is on the order of 1e+4 too.

zzzzwj commented 5 years ago

All three losses are big; what is the problem?

Same for me. My loss is on the order of 1e+4. How about yours?

My loss is on the order of 1e+4 too.

That is because the loss is reduced by sum. If you divide it by (GT_size * GT_size * batch_size), you will find the loss is in the usual range, something like 1e-2. You can replace the reduction with 'mean'.
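For illustration, a minimal sketch of the scale difference between the two reductions (the tensor shapes here are hypothetical):

```python
import torch
import torch.nn.functional as F

# Hypothetical batch: 4 RGB patches with GT_size = 256.
pred = torch.rand(4, 3, 256, 256)
gt = torch.rand(4, 3, 256, 256)

loss_sum = F.l1_loss(pred, gt, reduction='sum')    # very large, roughly 1e5
loss_mean = F.l1_loss(pred, gt, reduction='mean')  # small, ~0.33 for random data

# The sum-reduced loss divided by the number of elements
# (batch_size * channels * GT_size * GT_size) equals the mean-reduced loss.
print(loss_sum / pred.numel(), loss_mean)
```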

LI945 commented 5 years ago

I have another problem: the loss doesn't go down. Does anybody else have this problem?

xinntao commented 5 years ago

@zzzzwj has pointed it out. CharbonnierLoss uses the sum mode. The L1/L2 losses also have several reduction modes such as mean and sum (see the PyTorch docs). What matters during training is the gradient rather than the loss value, so even with large losses the training is fine as long as the gradients are proper. When switching between mean and sum you may need to adjust the learning rate, though the Adam optimizer can compensate to some extent.
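For reference, a minimal sketch of a Charbonnier-style loss with a switchable reduction (the epsilon value and exact formulation are illustrative and may differ from what EDVR actually uses):

```python
import torch

def charbonnier_loss(pred, target, eps=1e-6, reduction='sum'):
    """Charbonnier penalty, a smooth L1 variant: sqrt(diff^2 + eps^2)."""
    diff = pred - target
    loss = torch.sqrt(diff * diff + eps * eps)
    if reduction == 'sum':
        return loss.sum()   # scales with batch size and patch size
    return loss.mean()      # comparable in magnitude to a plain L1 loss
```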

@LI945 During training you may observe that the loss decreases very slowly. But if you evaluate the checkpoints, the performance (PSNR) actually increases as training goes on.
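As a side note, a rough PSNR helper for evaluating checkpoints could look like this (assuming images in [0, 1]; the border cropping and Y-channel conversion used in the paper's evaluation are omitted):

```python
import torch

def psnr(pred, gt, max_val=1.0):
    """PSNR = 10 * log10(max_val^2 / MSE); higher is better."""
    mse = torch.mean((pred - gt) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)
```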

zzzzwj commented 5 years ago

I met the same problem as @LI945 mentioned. When I train on my own dataset, the loss decreases very slowly. When I train an SISR model (for example, EDSR), the PSNR increases very fast and almost reaches its best value of around 37.0 dB within 20~30 epochs. However, when I train EDVR with the original training code, the PSNR increases quickly in the first 10 epochs, reaching ~33.0 dB, and then seems to plateau: over the next 20 epochs it increases by less than 1.0 dB. Have you met the same problem when training on the REDS or Vimeo90K datasets? And could I have your training log? Hope for your reply @xinntao.

xinntao commented 5 years ago

@zzzzwj I will upload a training log example tomorrow. Actually, 1) we use a different training scheme with restarts, which improves the performance, and 2) we usually measure progress in iterations rather than epochs.
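As an illustration of a restart-style schedule (not necessarily the exact scheduler used for EDVR; the period and learning rates below are hypothetical), PyTorch's built-in cosine annealing with warm restarts can be stepped per iteration:

```python
import torch

# Hypothetical model and optimizer; the actual EDVR settings may differ.
model = torch.nn.Conv2d(3, 3, 3, padding=1)
optimizer = torch.optim.Adam(model.parameters(), lr=4e-4)

# Restart the cosine cycle every 50k iterations (illustrative period).
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
    optimizer, T_0=50000, eta_min=1e-7)

for iteration in range(150000):
    # forward pass, loss computation and loss.backward() would go here
    optimizer.step()
    scheduler.step()  # step the schedule once per iteration, not per epoch
```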

zzzzwj commented 5 years ago

@zzzzwj I will upload a training log example tomorrow. Actually, 1) we use a different training scheme with restarts, which improves the performance, and 2) we usually measure progress in iterations rather than epochs.

Well, thanks a lot.