All losses become NaN after about 1 epoch of training

wayveai / fiery

PyTorch code for the paper "FIERY: Future Instance Segmentation in Bird's-Eye view from Surround Monocular Cameras"

https://wayve.ai/blog/fiery-future-instance-prediction-birds-eye-view

MIT License


All losses become NaN after about 1 epoch of training #8

Closed: jwookyoo closed this issue 3 years ago

jwookyoo commented 3 years ago

Hi,

Thank you for sharing this great work!

When I ran the training code, all losses became NaN after about one epoch of training. The problem is reproducible: it happened in each of the three runs I tried.

I set up the same environment with Anaconda and used the same hyperparameters. (The only difference is that I am on PyTorch 1.7.1 while you used 1.7.0; all other packages match yours.)

If you have any idea what might cause this, please let me know. Thanks!
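
For anyone debugging a similar failure, here is a minimal sketch of one way to find the step at which the losses first turn non-finite. It assumes the per-step losses are available as plain floats (for example via `loss.item()` in PyTorch). The helper names and the loss names below are hypothetical and are not part of the FIERY code base.

```python
import math
from typing import Dict, Iterable, List, Optional, Tuple


def nonfinite_losses(losses: Dict[str, float]) -> List[str]:
    """Return the names of losses that are NaN or +/-inf."""
    return [name for name, value in losses.items() if not math.isfinite(value)]


def first_nonfinite_step(
    history: Iterable[Dict[str, float]],
) -> Optional[Tuple[int, List[str]]]:
    """Return (step index, offending loss names) for the first bad step, or None."""
    for step, losses in enumerate(history):
        bad = nonfinite_losses(losses)
        if bad:
            return step, bad
    return None


# Hypothetical loss values for three consecutive training steps.
history = [
    {"segmentation": 0.82, "flow": 0.31},
    {"segmentation": 0.79, "flow": float("inf")},
    {"segmentation": float("nan"), "flow": float("nan")},
]
result = first_nonfinite_step(history)
```

Knowing which term diverges first (the hypothetical `flow` loss in this example) can narrow the search before every loss has become NaN.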

anthonyhu commented 3 years ago

Hey!

Interesting. Do you run into the same issue if you first load the weights of the encoder (from FIERY Static, the single-timeframe bird's-eye view model)?

To do so, add the following lines to baseline.yml:

PRETRAINED:
   LOAD_WEIGHTS: True
   PATH: './static_lift_splat_setting.ckpt'
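
For context, loading pretrained encoder weights into a larger model usually means copying only the parameters whose names match and leaving the rest at their initial values (in PyTorch, `load_state_dict(..., strict=False)` tolerates missing and unexpected keys in this way). The sketch below illustrates the idea with plain dictionaries standing in for state dicts. The function name and parameter keys are hypothetical, and this is not FIERY's actual loading code.

```python
from typing import Dict, List, Tuple

Weights = Dict[str, List[float]]  # stand-in for a state dict (name -> tensor)


def load_matching(model: Weights, checkpoint: Weights) -> Tuple[List[str], List[str]]:
    """Copy checkpoint entries whose names exist in the model.

    Returns (missing, unexpected): model keys absent from the checkpoint,
    and checkpoint keys that the model does not have.
    """
    for name, value in checkpoint.items():
        if name in model:
            model[name] = list(value)  # copy so the checkpoint is not aliased
    missing = [name for name in model if name not in checkpoint]
    unexpected = [name for name in checkpoint if name not in model]
    return missing, unexpected


# Hypothetical parameter names: the static checkpoint only covers the encoder.
model = {"encoder.weight": [0.0, 0.0], "temporal.weight": [0.0]}
checkpoint = {"encoder.weight": [0.5, -0.2], "static_head.weight": [1.0]}
missing, unexpected = load_matching(model, checkpoint)
```

In this toy example the encoder entry is overwritten with the pretrained values, while the temporal entry is reported as missing and keeps its initial value.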

jwookyoo commented 3 years ago

I loaded the weights first following your suggestion, and the training works now (without NaN)! Thanks a lot!!

jwookyoo commented 3 years ago

Can I ask one more question? :-) How can I train the FIERY Static weights from scratch?

anthonyhu commented 3 years ago

Of course. To train FIERY Static from scratch, point the training script to the following config: https://github.com/wayveai/fiery/blob/master/fiery/configs/literature/static_lss_setting.yml

jwookyoo commented 3 years ago

I see. Thanks a lot!

anthonyhu commented 3 years ago

You're welcome!