sanghyun-son / EDSR-PyTorch

PyTorch version of the paper 'Enhanced Deep Residual Networks for Single Image Super-Resolution' (CVPRW 2017)
MIT License

Half precision error #36

Closed (sihan95 closed this issue 6 years ago)

sihan95 commented 6 years ago

When I trained the EDSR baseline model (x2) with the half-precision option, I encountered the following errors. Could you give me any suggestions for avoiding them?

```
$ python main.py --model EDSR --scale 2 --save EDSR_baseline_x2_half --reset --precision half
Making model...
Preparing loss function:
1.000 * L1
[Epoch 1]   Learning rate: 1.00e-4
Skip this batch 2! (Loss: inf)
Skip this batch 3! (Loss: inf)
Skip this batch 4! (Loss: inf)
Skip this batch 5! (Loss: inf)
Skip this batch 6! (Loss: inf)
```

sanghyun-son commented 6 years ago

Hello.

We do not support half-precision training, since it is not straightforward to do correctly.

For example, the Adam optimizer does not work well with half precision, and gradient accumulation and weight updates also become inaccurate.
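The core problem is rounding: float16 has a machine epsilon of about 1e-3, so a typical small update (e.g. learning rate 1e-4 times a gradient of order 1) can vanish entirely when added to a weight of order 1. A minimal sketch with NumPy's float16 (not the repo's code, just an illustration of the arithmetic):

```python
import numpy as np

# float16 machine epsilon is ~9.8e-4, so any increment smaller than
# roughly eps/2 relative to the weight is rounded away entirely.
w = np.float16(1.0)
update = np.float16(1e-4)  # a typical lr * grad step

assert np.float16(w + update) == w  # the update is lost to rounding

# In float32 (eps ~1.2e-7) the same update survives.
w32 = np.float32(1.0)
assert np.float32(w32 + np.float32(1e-4)) != w32
```

This is why half-precision training schemes typically keep a float32 master copy of the weights and apply loss scaling, rather than updating float16 weights directly.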

There are several papers on low-precision training; you can consult them if you need to train the network in half precision.

What you can do is half-precision testing, which requires half the memory with no performance drop.
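The memory saving comes directly from the storage format: converting parameters with `.half()` drops them from 4 bytes to 2 bytes per element. A minimal sketch using a single Conv2d layer as a stand-in for the full EDSR model:

```python
import torch
import torch.nn as nn

# Toy stand-in for a trained model; the pattern matches what a
# test-time `--precision half` option does: convert the parameters
# to float16 and feed float16 inputs at inference.
model = nn.Conv2d(3, 64, 3, padding=1)

bytes_fp32 = sum(p.numel() * p.element_size() for p in model.parameters())
model.half()  # parameters are now torch.float16
bytes_fp16 = sum(p.numel() * p.element_size() for p in model.parameters())

print(bytes_fp32 // bytes_fp16)  # 2: exactly half the parameter memory
```

At test time there is no optimizer step, so the float16 rounding issues that break training do not apply; only the forward pass runs, and its small rounding error does not measurably change PSNR.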

Also, on some newer GPUs, half precision is much faster than single precision.

Thank you.

sihan95 commented 6 years ago

Thanks for your detailed and fast response.