uber-research / UPSNet

UPSNet: A Unified Panoptic Segmentation Network

RoI outputs cause feat_id to contain NaN elements #57

Open andyhahaha opened 5 years ago

andyhahaha commented 5 years ago

I changed the backbone to PeleeNet and am training on 4 GPUs. Some elements of feat_id become NaN, so the pooled features lose some entries and their dimensions no longer match the labels when the loss is computed.

https://github.com/uber-research/UPSNet/blob/3218581a623b02a73c3334b672fc1ce0c25fdae9/upsnet/operators/modules/fpn_roi_align.py#L38

This happens because some proposed RoIs have x1 > x2 or y1 > y2, which makes w < 0 or h < 0, and np.log2 of a negative number returns NaN. I have tried smaller learning rates (0.0025 and 0.00125), but it still happens. Does anyone know how to solve this problem? Thanks!
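
To make the failure concrete, here is a minimal NumPy sketch of the FPN level assignment. It is not the code from fpn_roi_align.py. The names `fpn_level` and `sanitize_rois`, the `[batch_idx, x1, y1, x2, y2]` layout, and the constants `k0=4`, `k_min=2`, `k_max=5`, `canonical=224` are assumptions for illustration. It shows that one inverted box is enough to produce a NaN level, and that reordering the corners before the level is computed avoids it.

```python
import numpy as np

def fpn_level(rois, k0=4, k_min=2, k_max=5, canonical=224.0):
    """Assign each RoI to a pyramid level (hypothetical helper).

    rois: (N, 5) array of [batch_idx, x1, y1, x2, y2] (assumed layout).
    """
    w = rois[:, 3] - rois[:, 1] + 1
    h = rois[:, 4] - rois[:, 2] + 1
    # A negative w * h makes sqrt (and therefore log2) return NaN.
    with np.errstate(invalid="ignore", divide="ignore"):
        k = np.floor(k0 + np.log2(np.sqrt(w * h) / canonical))
    # np.clip does not remove NaN, so invalid boxes stay NaN here.
    return np.clip(k, k_min, k_max)

def sanitize_rois(rois):
    """Reorder corners so that x1 <= x2 and y1 <= y2 (hypothetical helper)."""
    out = rois.copy()
    out[:, 1] = np.minimum(rois[:, 1], rois[:, 3])
    out[:, 3] = np.maximum(rois[:, 1], rois[:, 3])
    out[:, 2] = np.minimum(rois[:, 2], rois[:, 4])
    out[:, 4] = np.maximum(rois[:, 2], rois[:, 4])
    return out

rois = np.array([
    [0, 10.0, 10.0, 100.0, 100.0],  # valid box
    [0, 200.0, 10.0, 50.0, 100.0],  # inverted box: x1 > x2
])

levels_raw = fpn_level(rois)                   # contains NaN for the inverted box
levels_fixed = fpn_level(sanitize_rois(rois))  # all finite
print(levels_raw, levels_fixed)
```

This only covers inverted corners. If the RoI coordinates themselves are already NaN because training has diverged, reordering will not help, and the cause has to be addressed upstream.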

rlangefe commented 3 years ago

When I ran into this, I introduced normalization along with a smaller learning rate, and things stopped going to nan. I don't know if that will fix your issue, but it might be worth a shot.