hustvl / MapTR

[ICLR'23 Spotlight] MapTR: Structured Modeling and Learning for Online Vectorized HD Map Construction
MIT License

About training NAN problem #82

Open imyyf opened 11 months ago

imyyf commented 11 months ago

https://github.com/hustvl/MapTR/blob/11307ed835dd7534ebb8927df9609f0b1b825aa6/projects/mmdet3d_plugin/maptr/assigners/maptr_assigner.py#L182

After modifying your model, we hit the error below at the line linked above.

ValueError: matrix contains invalid numeric entries

We suspect it is caused by the cost values becoming NaN. Have you run into this error? Is there a way to fix it? Thanks.
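One way to test this suspicion is to inspect the cost matrix for non-finite entries just before the assignment call. The following is a minimal sketch, assuming the cost is a 2D `torch.Tensor` with one row per prediction; `report_non_finite` is a hypothetical helper and not part of MapTR.

```python
import torch


def report_non_finite(cost: torch.Tensor) -> dict:
    """Summarize invalid entries in a 2D cost matrix (hypothetical helper)."""
    # A row is "bad" if any of its entries is NaN or +/-inf.
    bad_rows = (~torch.isfinite(cost).all(dim=1)).nonzero().flatten().tolist()
    return {
        "nan": int(torch.isnan(cost).sum()),
        "inf": int(torch.isinf(cost).sum()),
        "bad_rows": bad_rows,
    }


# Toy cost matrix: the first row contains a NaN entry.
cost = torch.tensor([[0.5, float("nan")], [2.0, 1.0]])
stats = report_non_finite(cost)
print(stats)
```

If `bad_rows` lists every row, the predictions themselves are probably already NaN, which points to a problem upstream of the assigner rather than in the cost computation.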

outsidercsy commented 11 months ago

Try 'cost = torch.nan_to_num(cost)'
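For reference, here is a small sketch of what `torch.nan_to_num` does to a cost matrix. The toy values and the replacement cost of 100.0 are assumptions chosen for illustration only.

```python
import torch

cost = torch.tensor([[0.2, float("nan")], [float("-inf"), 0.7]])

# Defaults: NaN -> 0.0, +inf -> largest finite value, -inf -> most negative finite value.
default_fix = torch.nan_to_num(cost)

# Explicit replacements: map every invalid entry to a large cost (100.0 is an
# arbitrary placeholder) so that a matcher does not prefer the invalid pairs.
explicit_fix = torch.nan_to_num(cost, nan=100.0, posinf=100.0, neginf=100.0)

print(default_fix)
print(explicit_fix)
```

Note that with the defaults a NaN entry becomes a cost of 0.0, which makes the invalid pair look cheap. In either form the call only sanitizes the matching cost; it does not change the tensors that the loss is computed from.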

xiaopan999 commented 11 months ago

Try 'cost = torch.nan_to_num(cost)'

It doesn't seem to work; 'grad_norm' becomes NaN.

imyyf commented 11 months ago

@xiaopan999 Have you run into the same problem? Did you modify the original MapTR? We found that it may be caused by feature values becoming NaN, possibly due to exploding gradients, but we haven't found a good way to pin it down. Adjusting grad_norm_clip and lr may help.
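As a rough illustration of the gradient-clipping idea, the sketch below clips the gradient norm in plain PyTorch and skips the optimizer step when the norm is not finite. The tiny `nn.Linear` model, the learning rate, and `max_norm=35.0` are placeholder assumptions, not MapTR's training loop. In mmcv-style configs the threshold is typically set through `optimizer_config = dict(grad_clip=dict(max_norm=..., norm_type=2))`.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(4, 1)  # stand-in model, for illustration only
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)


def train_step(x: torch.Tensor, y: torch.Tensor, max_norm: float = 35.0) -> bool:
    """Run one step; return False if the update was skipped."""
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    # clip_grad_norm_ returns the total gradient norm measured before clipping.
    grad_norm = nn.utils.clip_grad_norm_(model.parameters(), max_norm=max_norm)
    if not torch.isfinite(torch.as_tensor(grad_norm)):
        # NaN/inf gradients: drop them instead of corrupting the weights.
        optimizer.zero_grad()
        return False
    optimizer.step()
    return True


x, y = torch.randn(8, 4), torch.randn(8, 1)
ok_clean = train_step(x, y)

x_bad = x.clone()
x_bad[0, 0] = float("nan")  # simulate a NaN feature value
ok_bad = train_step(x_bad, y)
print(ok_clean, ok_bad)
```

Skipping a step keeps the weights finite but does not remove the source of the NaN; wrapping a forward and backward pass in `torch.autograd.detect_anomaly()` can help locate the operation that first produces it.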

Michel-liu commented 7 months ago

Any updates here? 👀

Michel-liu commented 7 months ago

It seems like the fp16 bug.

LordonCN commented 6 months ago

I hit this error when training the maptr_tiny_r50_24e_bevformer.py model.

silverzdz commented 6 months ago

It seems like the fp16 bug.

That seems right. I commented out the "fp16" line in the config file, and the NaN values no longer appeared.
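For context on why fp16 is a plausible suspect, the sketch below shows how a value that is fine in float32 overflows to inf in float16, and how inf then turns into NaN in later arithmetic. The numbers are toy values, not taken from MapTR.

```python
import torch

# Largest finite value representable in float16.
FP16_MAX = torch.finfo(torch.float16).max

x32 = torch.tensor([60000.0, 70000.0])
x16 = x32.to(torch.float16)  # 70000 exceeds the fp16 range and becomes inf

# Cast back to float32 for the arithmetic so the demo runs on any backend.
back = x16.float()
diff = back - back           # inf - inf yields NaN

print(FP16_MAX, back, diff)
```

Any intermediate activation, cost, or loss term that exceeds this range in half precision can start the same chain, which would be consistent with the NaN going away once fp16 is disabled.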