I changed the backbone to PeleeNet and trained with 4 GPUs.
But feat_id ends up with some NaN elements, which makes the pooled features lose entries and causes a dimension mismatch with the labels when computing the loss.
This happens because some proposed RoIs have x1 > x2 or y1 > y2, which makes w < 0 or h < 0,
and np.log2 of a negative number produces NaN.
I have tried smaller learning rates (0.0025 and 0.00125), but it still happens.
Does anyone know how to solve this problem?
Thanks!
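One way to guard against this is to clamp the RoI width and height to a small positive value before the log2 call, so degenerate boxes (x1 > x2 or y1 > y2) can never produce NaN. Below is a minimal sketch of that idea; `assign_fpn_levels` is a hypothetical helper (not the UPSNet code), and the canonical scale of 224 with base level k0 = 2 follows the usual FPN level-assignment heuristic:

```python
import numpy as np

def assign_fpn_levels(rois, k_min=2, k_max=5, canonical_scale=224.0):
    """Map RoIs (x1, y1, x2, y2) to FPN levels, guarding against
    degenerate boxes where x1 > x2 or y1 > y2.
    Hypothetical helper for illustration, not the UPSNet implementation."""
    x1, y1, x2, y2 = rois[:, 0], rois[:, 1], rois[:, 2], rois[:, 3]
    # Clamp widths/heights to a small positive value so np.log2 never
    # sees a non-positive argument and returns NaN.
    w = np.maximum(x2 - x1, 1e-6)
    h = np.maximum(y2 - y1, 1e-6)
    levels = np.floor(2 + np.log2(np.sqrt(w * h) / canonical_scale))
    # Keep the assigned levels inside the pyramid's valid range.
    return np.clip(levels, k_min, k_max).astype(np.int64)

rois = np.array([
    [10.0, 10.0, 110.0, 210.0],   # valid box
    [50.0, 50.0, 40.0, 60.0],     # degenerate: x1 > x2
])
print(assign_fpn_levels(rois))   # no NaN, both boxes get a valid level
```

Clamping hides the symptom; it may also be worth filtering or fixing the RoIs upstream in the proposal stage, since boxes with x1 > x2 usually indicate a regression-target or coordinate-order bug.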
When I ran into this, I introduced normalization along with a smaller learning rate, and things stopped going to NaN. I don't know whether that will fix your issue, but it might be worth a shot.
https://github.com/uber-research/UPSNet/blob/3218581a623b02a73c3334b672fc1ce0c25fdae9/upsnet/operators/modules/fpn_roi_align.py#L38