Training interrupted at the end of the first epoch

mit-han-lab / bevfusion

[ICRA'23] BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation

Apache License 2.0

Hi @Surtr07,

Sorry for the delayed response; I have been busy with other projects recently. Judging from the snapshots you provided, it does not look like you ran into an OOM (out-of-memory) problem, and the GPUs still show high occupancy. You could add the -v flag to your command and check whether any error messages are printed when the program freezes. You could also try skipping evaluation on the validation set by setting this parameter to a very large value. Let me know if you have further information.
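To illustrate why a very large value skips validation, here is a minimal sketch. It assumes the parameter is an evaluation interval counted in epochs, which this thread does not confirm. The function name `epochs_with_evaluation` and the numbers are hypothetical and are not taken from the bevfusion code.

```python
from typing import List


def epochs_with_evaluation(max_epochs: int, interval: int) -> List[int]:
    """Return the 1-based epochs after which validation would run.

    Assumption: evaluation fires whenever the finished epoch count is a
    multiple of `interval`. This mirrors a common interval-style hook and
    is not the project's actual implementation.
    """
    if interval <= 0:
        return []
    return [epoch for epoch in range(1, max_epochs + 1) if epoch % interval == 0]


if __name__ == "__main__":
    # Hypothetical schedule: 6 training epochs.
    print(epochs_with_evaluation(6, 1))      # evaluate after every epoch
    print(epochs_with_evaluation(6, 10**6))  # interval exceeds max_epochs: never evaluates
```

With an interval larger than the total number of epochs, no epoch is a multiple of it, so training runs to completion without entering the validation loop. If the freeze disappears in that setting, the problem is likely in the evaluation step rather than in training itself.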

Best, Haotian

Training interrupted at the end of the first epoch #381