jpsember / java-ml

Java classes for machine learning

NaN values appearing during training #52

Closed (jpsember closed 2 years ago)

jpsember commented 2 years ago
 File "/home/eio/js_dep/ml/example_yolo/loss_yolo.py", line 125, in forward
    giou = iou - ((container_area - union_area) / (container_area + EPSILON))

Possibly relevant:

https://benjamin-computer.medium.com/debugging-neural-networks-6fa65742efd
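For context, the failing line is the last step of a generalized IoU (GIoU) computation. Below is a minimal plain-Python sketch for a single pair of axis-aligned boxes. Only the final expression and the names `iou`, `container_area`, `union_area` and `EPSILON` come from the traceback; the function name `giou`, the `(x1, y1, x2, y2)` box layout and the value of `EPSILON` are assumptions for illustration, and the real `loss_yolo.py` presumably works on tensors.

```python
EPSILON = 1e-7  # assumed value; the traceback only shows the name


def giou(box_a, box_b):
    """Generalized IoU of two boxes given as (x1, y1, x2, y2), x1 <= x2, y1 <= y2."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)

    # Overlap rectangle; width/height are zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection_area = inter_w * inter_h

    union_area = area_a + area_b - intersection_area
    iou = intersection_area / (union_area + EPSILON)

    # Smallest axis-aligned box that contains both inputs.
    container_area = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))

    # Same expression as line 125 in the traceback.
    return iou - ((container_area - union_area) / (container_area + EPSILON))
```

With finite inputs the result stays between -1 and 1, so a NaN at this line suggests that one of the areas has stopped being finite.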

jpsember commented 2 years ago

"container_area" is getting huge:

container_area
[ 15157169 │         │ 229178130432 ]
[ 2266485303302684670 │ 883768048189277200000000000000000000 │ 339170274877550600000 ]
[   10131 │         │ 20721596552879414000000000 ]
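One way areas this large could turn into NaN: once a width times height product overflows to infinity, the numerator of the penalty term becomes `inf - inf`, which is NaN. The sketch below shows that mechanism with Python floats (float64). It is an assumption that this is what happens in the actual loss; the thread does not confirm it. For scale, float32 overflows at about 3.4e38, and the largest value printed above is about 8.8e35. The helper name `giou_penalty`, the value of `EPSILON` and the 1e200 side length are hypothetical.

```python
import math

EPSILON = 1e-7  # assumed value; the traceback only shows the name


def giou_penalty(container_area, union_area):
    """Second term of the GIoU expression from the traceback."""
    return (container_area - union_area) / (container_area + EPSILON)


# Hypothetical runaway box: each side is finite, but the product overflows.
width = height = 1e200
container_area = width * height   # overflows to inf
union_area = container_area       # the union is dominated by the same box

penalty = giou_penalty(container_area, union_area)  # inf - inf gives nan

print(math.isinf(container_area), math.isnan(penalty))
```

Note that `EPSILON` only protects against division by zero; it does nothing once the areas are infinite.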

jpsember commented 2 years ago

I put in a 'clamped exp' function, and it no longer crashes in the NaN detection, but the predicted box coordinates are still crazy high.

[         │ 97953120 │         ]
[         │ 97953120 │         ]
[         │ 97953120 │ 97953120 ]
container_area
[         │ 97953120 │         ]
[    9897 │ 97953120 │         ]
[         │ 97953120 │ 97953120 ]
pred_x1
[       1 │   -4949 │   -4949 ]
[       1 │   -4949 │         ]
[       1 │   -4948 │   -4949 ]
pred_x2
[       1 │    4949 │    4949 ]
[       1 │    4949 │         ]
[       1 │    4950 │    4949 ]
pred_y1
[         │   -4948 │         ]
[   -4948 │   -4949 │         ]
[         │   -4949 │   -4949 ]
pred_y2
[         │    4950 │         ]
[    4950 │    4949 │         ]
[         │    4949 │    4949 ]
union_area
[         │ 97953120 │         ]
[         │ 97953120 │         ]
[         │ 97953120 │ 97953120 ]
container_area
[         │ 97953120 │         ]
[    9897 │ 97953120 │         ]
[         │ 97953120 │ 97953120 ]
Epoch   84   Train Loss: 6.318 (5.195)
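A 'clamped exp' like the one mentioned above can be sketched as follows. This is a plain-Python illustration, not the code from the repository: the names `clamped_exp` and `MAX_EXPONENT` and the cap of 10.0 are hypothetical, and the actual function presumably operates on tensors.

```python
import math

MAX_EXPONENT = 10.0  # hypothetical cap; the thread does not say what limit was used


def clamped_exp(x, max_exponent=MAX_EXPONENT):
    """exp(x) with the exponent capped, so the result never exceeds exp(max_exponent)."""
    return math.exp(min(x, max_exponent))


# Inputs below the cap pass through unchanged; large ones saturate at exp(10.0).
for raw in (0.0, 5.0, 50.0, 1000.0):
    print(raw, clamped_exp(raw))
```

If the loss is written in PyTorch (an assumption), the same idea would be `torch.exp(torch.clamp(x, max=limit))`. Clamping only bounds the output: a network that keeps pushing the exponent past the cap still produces boxes at the maximum size. The repeated value 97953120 in the dumps is roughly 9897 squared, which would be consistent with width and height both sitting at a cap, though the thread does not confirm this.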