nie-lang / DeepRectangling

CVPR2022 (Oral) - Deep Rectangling for Image Stitching: A Learning Baseline
NAN results after running train.py #8

Open KareemJBR opened 2 years ago

KareemJBR commented 2 years ago

Hi there,

I am trying to train my own model by running train.py, but I get NaN values for Global Loss, mask Loss and mesh Loss. If I let training run to completion, I get a very poor model: running inference.py on it gives the following results on both GPU and CPU:

PSNR: 10.4.. SSIM: 0.32..

I tried lowering the learning rate, but that did not help. What could be the problem?
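When losses turn into NaN, it often helps to find the first training step at which any of them becomes non-finite, since everything after that step is usually contaminated. The following is a minimal sketch of such a check, not code from this repository: the function name `first_nonfinite_step`, the per-step dict layout, and the sample values are all assumptions made up for illustration.

```python
import math

def first_nonfinite_step(loss_history):
    """Return the index of the first step whose losses contain NaN or inf, else None.

    loss_history is a list of dicts, one per training step, mapping a loss
    name to its float value. This layout is an assumption for the sketch.
    """
    for step, losses in enumerate(loss_history):
        bad = [name for name, value in losses.items() if not math.isfinite(value)]
        if bad:
            print(f"step {step}: non-finite losses: {bad}")
            return step
    return None

# Made-up values, only to show the shape of the input.
history = [
    {"global": 0.91, "mask": 0.40, "mesh": 0.12},
    {"global": 0.88, "mask": float("nan"), "mesh": 0.11},
    {"global": float("nan"), "mask": float("nan"), "mesh": float("nan")},
]
print(first_nonfinite_step(history))
```

In a real training loop, the same `math.isfinite` test could be applied to each loss value right after it is computed, stopping the run at the first failure so the offending batch can be inspected.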

In addition, when running inference.py on the CPU I get decent results:

PSNR: something like 21 SSIM: 0.709..

But when I run the code on the GPU I get terrible results, around 10 and 0.3. Am I supposed to run train.py on the GPU and inference on the CPU? It is confusing because the GPU runs much faster but gives much worse accuracy.
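One way to narrow down a CPU/GPU gap like this is to save the output for the same input from both devices and compare the two directly, rather than comparing only the final PSNR against the ground truth. The sketch below assumes each output has been flattened to a sequence of pixel values in the range 0 to 255; the helper names `max_abs_diff` and `psnr` and the sample values are hypothetical, and this `psnr` is a plain-Python version of the standard formula, not the evaluation code used by inference.py.

```python
import math

def max_abs_diff(a, b):
    """Largest absolute element-wise difference between two equal-length sequences."""
    return max(abs(x - y) for x, y in zip(a, b))

def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(peak ** 2 / mse)

# Made-up flattened outputs for the same input image on each device.
cpu_out = [10.0, 20.0, 30.0, 40.0]
gpu_out = [10.0, 20.0, 31.0, 40.0]

print("max abs diff:", max_abs_diff(cpu_out, gpu_out))
print("PSNR between devices (dB):", psnr(cpu_out, gpu_out))
```

Small differences are expected from floating-point arithmetic. A large difference, or NaN values in one output, would point at the device setup rather than at the model weights.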

nie-lang commented 2 years ago

In my experiments, the model was trained and tested on a single 2080 Ti GPU, and I have never encountered the NaN problem.

You may want to check your Python environment. I listed the complete environment requirements in issue 4, which you can find among the closed issues.
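To compare a local environment against a list of requirements, it can help to print the Python version and the installed version of each relevant package. The sketch below uses the standard-library `importlib.metadata` module, which assumes Python 3.8 or newer. The helper name `installed_versions` is hypothetical, and the package names passed to it are placeholders, not the actual list from issue 4.

```python
import sys
from importlib import metadata

def installed_versions(package_names):
    """Map each package name to its installed version, or None if it is not installed."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

print("python", sys.version.split()[0])
# Placeholder package names; substitute the ones listed in the requirements.
for name, version in installed_versions(["tensorflow", "numpy"]).items():
    print(name, version or "not installed")
```

The printed versions can then be checked one by one against the stated requirements, including the CUDA and driver versions, which this script does not report.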