hizhangp / yolo_tensorflow

Tensorflow implementation of YOLO, including training and test phases.
MIT License

InvalidArgumentError (see above for traceback): LossTensor is inf or nan : Tensor had NaN values [[Node: train_op/CheckNumerics = CheckNumerics[T=DT_FLOAT, message="LossTensor is inf or nan", _device="/job:localhost/replica:0/task:0/device:CPU:0"](total_loss)]] #89

Open metaStor opened 5 years ago

metaStor commented 5 years ago

Environment: tensorflow-gpu 1.9.0 + cuda9.0

ruyanyinian commented 5 years ago

Environment: tensorflow-gpu 1.9.0 + cuda9.0

I think it has nothing to do with the CPU/GPU; it has to do with your dataset. If "LossTensor is inf or nan" appears on the very first batch, it indicates that your original dataset fluctuates dramatically, driving some pixel/label values toward infinity. Otherwise, try decreasing your learning rate and increasing your batch size.
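
Not code from this repo, just a minimal sketch of the two suggestions above. It assumes a numpy-based data loader; the function and array names (`check_batch`, `images`, `labels`) are hypothetical, and the hyperparameter values shown are illustrative rather than the project's defaults.

```python
import numpy as np

def check_batch(images, labels):
    """Sanity-check one training batch before feeding it to the network.

    Any NaN/Inf in the inputs or labels will propagate into the loss and
    trigger the CheckNumerics error above.
    """
    for name, arr in (("images", images), ("labels", labels)):
        if not np.all(np.isfinite(arr)):
            raise ValueError("Non-finite values found in %s" % name)

# If the data is clean, try more conservative hyperparameters, e.g. in the
# training config (names/values are assumptions, adjust to your setup):
# LEARNING_RATE = 1e-5   # lower than the default to avoid divergence
# BATCH_SIZE = 64        # a larger batch smooths out noisy gradients
```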

metaStor commented 4 years ago

@ruyanyinian I see. I'll give it a try. Thanks!