zhanglu-cst / HIFSOD

Official codes and datasets of HIFSOD
Apache License 2.0

Error at the second iteration when ENABLE_HICL is set to true #2

Open CatherineYun opened 3 months ago

CatherineYun commented 3 months ago
Exception has occurred: FloatingPointError
Loss became infinite or NaN at iteration=2!
loss_dict = {'loss_clssifiers': tensor(1.8661, device='cuda:0', grad_fn=), 'loss_root_box_reg': tensor(0.1625, device='cuda:0', grad_fn=), 'loss_root_cls': tensor(2.1692, device='cuda:0', grad_fn=), 'hicl_loss': tensor(nan, device='cuda:0', grad_fn=), 'loss_rpn_cls': tensor(4.3721e-05, device='cuda:0'), 'loss_rpn_loc': tensor(0.0015, device='cuda:0')}
File "/mnt/lustre/Katherine/HIFSOD/fsdet/engine/train_loop.py", line 241, in _detect_anomaly
raise FloatingPointError(
File "/mnt/lustre/Katherine/HIFSOD/fsdet/engine/train_loop.py", line 220, in run_step
self._detect_anomaly(losses, loss_dict)
File "/mnt/lustre/Katherine/HIFSOD/fsdet/engine/train_loop.py", line 133, in train
self.run_step()
File "/mnt/lustre/Katherine/HIFSOD/fsdet/engine/defaults.py", line 397, in train
super().train(self.start_iter, self.max_iter)
File "/mnt/lustre/Katherine/HIFSOD/tools/train_net.py", line 119, in main
return trainer.train()
File "/mnt/lustre/Katherine/HIFSOD/fsdet/engine/launch.py", line 52, in launch
main_func(*args)
File "/mnt/lustre/Katherine/HIFSOD/tools/train_net.py", line 125, in
launch(
FloatingPointError: Loss became infinite or NaN at iteration=2!
loss_dict = {'loss_clssifiers': tensor(1.8661, device='cuda:0', grad_fn=), 'loss_root_box_reg': tensor(0.1625, device='cuda:0', grad_fn=), 'loss_root_cls': tensor(2.1692, device='cuda:0', grad_fn=), 'hicl_loss': tensor(nan, device='cuda:0', grad_fn=), 'loss_rpn_cls': tensor(4.3721e-05, device='cuda:0'), 'loss_rpn_loc': tensor(0.0015, device='cuda:0')}

When running base_hierarchical training with ENABLE_HICL set to true, the error above is raised at the second iteration.
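
To narrow down which term is responsible, here is a minimal sketch that reports the entries of a loss dictionary that are NaN or infinite. It is not part of HIFSOD: `find_non_finite_losses` is a hypothetical helper, and the values are copied by hand from the loss_dict in the error message. It assumes each value is either a plain float or an object with an `.item()` method, such as a 0-dim PyTorch tensor.

```python
import math


def find_non_finite_losses(loss_dict):
    """Return the names of loss terms whose value is NaN or infinite."""
    bad = []
    for name, value in loss_dict.items():
        # Accept plain floats, or anything exposing .item()
        # (for example a 0-dim torch tensor).
        scalar = value.item() if hasattr(value, "item") else float(value)
        if not math.isfinite(scalar):
            bad.append(name)
    return bad


# Values copied from the loss_dict in the error message (hypothetical reuse).
losses = {
    "loss_clssifiers": 1.8661,
    "loss_root_box_reg": 0.1625,
    "loss_root_cls": 2.1692,
    "hicl_loss": float("nan"),
    "loss_rpn_cls": 4.3721e-05,
    "loss_rpn_loc": 0.0015,
}

print(find_non_finite_losses(losses))  # ['hicl_loss']
```

With these values only `hicl_loss` is flagged; the other five terms are finite, so the problem appears to be confined to the HICL loss computation.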