Open HongChow opened 1 year ago
The loss at the first step (and sometimes the 3rd step) is valid, but it becomes inf or nan in later steps.
If I use CPU TensorFlow, it works fine.
@HongChow
training data: https://cv.snu.ac.kr/research/VDSR/train_data.zip
keras==2.1.6 and tensorflow-gpu==1.5.0, but the loss becomes nan or inf, for example:
1/1000 [..............................] - ETA: 48:13 - loss: 33339.6055 - PSNR: 5.4144
3/1000 [..............................] - ETA: 16:30 - loss: nan - PSNR: 4.5721
5/1000 [..............................] - ETA: 10:10 - loss: nan - PSNR: 4.0657
7/1000 [..............................] - ETA: 7:27 - loss: nan - PSNR: 3.8570
9/1000 [..............................] - ETA: 5:56 - loss: nan - PSNR: 3.9201
11/1000 [..............................] - ETA: 4:58 - loss: nan - PSNR: 3.9512
@yu4u could you please help with this?
I also encountered a similar problem, have you found a solution?
@tangrc hi, it seems that upgrading to the latest PyTorch fixes this problem.
@HongChow thanks!
@HongChow hi, I have a small question about your reply. This project is based on TensorFlow, so why do you say upgrading to the latest PyTorch fixes it? Thanks again! When I run it, the loss is very large and the val loss is nan.
Oh, I am very sorry about this, I must have made a mistake. It was another repo using PyTorch; I upgraded to the latest version there and that fixed a similar problem.
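For anyone comparing runs (for example GPU vs CPU TensorFlow, as mentioned at the top of the thread), it may help to pin down exactly which step the loss first stops being finite. Below is a minimal sketch that scans a Keras-style progress log like the one quoted in this issue. The helper name `first_non_finite_step` and the regular expression are my own assumptions for illustration, not part of this repo or of Keras, and the pattern is only fitted to the log format shown here.

```python
import math
import re

# Matches one progress entry in the format quoted in this issue, e.g.
# "3/1000 [....] - ETA: 16:30 - loss: nan - PSNR: 4.5721"
# (hypothetical pattern, fitted only to the log shown in this thread).
ENTRY = re.compile(r"(\d+)/\d+ \[[^\]]*\].*?loss: (\S+)")


def first_non_finite_step(log_text):
    """Return the first step whose logged loss is nan or inf, else None."""
    for step, loss in ENTRY.findall(log_text):
        # float() accepts "nan" and "inf", so no special-casing is needed.
        if not math.isfinite(float(loss)):
            return int(step)
    return None


# First three entries of the log from this issue.
log = (
    "1/1000 [..............................] - ETA: 48:13 - loss: 33339.6055 - PSNR: 5.4144 "
    "3/1000 [..............................] - ETA: 16:30 - loss: nan - PSNR: 4.5721 "
    "5/1000 [..............................] - ETA: 10:10 - loss: nan - PSNR: 4.0657"
)
print(first_non_finite_step(log))  # prints 3
```

This only locates the step; it does not say why the loss diverges. Running it on both the GPU and CPU logs would at least show whether the two runs already differ at the first logged step.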