The problem of training custom datasets

trustguan commented 3 years ago

Thanks for your great work! When I train on my custom dataset with the command `python ./train_net.py --num-gpus 1 --config-file ./configs/CenterNet2_R50_1x.yaml`, I get the following error:

```
No instances! torch.Size([0, 3]) torch.Size([0, 4]) 4
No instance in box reg loss
No instances! torch.Size([0, 3]) torch.Size([0, 4]) 4
No instance in box reg loss
No instances! torch.Size([0, 3]) torch.Size([0, 4]) 4
No instance in box reg loss
No instances! torch.Size([0, 3]) torch.Size([0, 4]) 4
No instance in box reg loss
No instances! torch.Size([0, 3]) torch.Size([0, 4]) 4
No instance in box reg loss
No instances! torch.Size([0, 3]) torch.Size([0, 4]) 4
No instance in box reg loss
No instances! torch.Size([0, 3]) torch.Size([0, 4]) 4
No instance in box reg loss
No instances! torch.Size([0, 3]) torch.Size([0, 4]) 4
No instance in box reg loss
No instances! torch.Size([0, 3]) torch.Size([0, 4]) 4
No instance in box reg loss
No instances! torch.Size([0, 3]) torch.Size([0, 4]) 4
No instance in box reg loss
Traceback (most recent call last):
  File "./train_net.py", line 237, in <module>
    launch(
  File "e:\pytorchpro\centernet2-master\detectron2\engine\launch.py", line 62, in launch
    main_func(*args)
  File "./train_net.py", line 224, in main
    do_train(cfg, model, resume=args.resume)
  File "./train_net.py", line 128, in do_train
    loss_dict = model(data)
  File "D:\ProgramData\Anaconda3\envs\CenterNet2\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "e:\pytorchpro\centernet2-master\detectron2\modeling\meta_arch\rcnn.py", line 160, in forward
    proposals, proposal_losses = self.proposal_generator(images, features, gt_instances)
  File "D:\ProgramData\Anaconda3\envs\CenterNet2\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "E:\PytorchPro\CenterNet2-master\projects\CenterNet2\centernet\modeling\dense_heads\centernet.py", line 109, in forward
    losses = self.losses(
  File "E:\PytorchPro\CenterNet2-master\projects\CenterNet2\centernet\modeling\dense_heads\centernet.py", line 156, in losses
    assert (torch.isfinite(reg_pred).all().item())
AssertionError
```

How can I solve this problem? Thank you!

xingyizhou commented 3 years ago

Hi, this just means the training diverged. If it happens in the first few iterations (e.g., before iteration 1000), you can try increasing the number of warmup iterations. Otherwise, you can consider decreasing the learning rate, or changing the normalization layers in the backbone to "SyncBN".
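To illustrate why a longer warmup can help when training blows up early, here is a minimal sketch of a linear warmup schedule. It assumes detectron2-style linear warmup, where the learning-rate multiplier ramps from a small `warmup_factor` at iteration 0 up to 1.0 at `warmup_iters`; the function name `warmup_lr` and the base LR of 0.01 are hypothetical and are not taken from the CenterNet2 configs.

```python
def warmup_lr(iteration: int, base_lr: float, warmup_iters: int,
              warmup_factor: float = 1.0 / 1000) -> float:
    """Learning rate at `iteration` under linear warmup (illustrative sketch).

    Assumes a detectron2-style linear warmup: the multiplier goes linearly
    from `warmup_factor` at iteration 0 to 1.0 at `warmup_iters`. Any LR
    decay steps after warmup are ignored here.
    """
    if iteration >= warmup_iters:
        # Warmup finished: use the full base learning rate.
        return base_lr
    alpha = iteration / warmup_iters
    # Linear interpolation between warmup_factor and 1.0.
    factor = warmup_factor * (1 - alpha) + alpha
    return base_lr * factor


if __name__ == "__main__":
    base_lr = 0.01  # hypothetical value for illustration only
    for warmup_iters in (1000, 4000):
        lrs = [warmup_lr(it, base_lr, warmup_iters) for it in (0, 250, 500, 1000)]
        print(warmup_iters, ["%.6f" % lr for lr in lrs])
```

A longer warmup keeps the learning rate small for more of the early iterations, while lowering the base LR scales the whole curve down. In detectron2-style configs the corresponding keys are `SOLVER.WARMUP_ITERS`, `SOLVER.BASE_LR` and, for the ResNet backbone, `MODEL.RESNETS.NORM`. Assuming `train_net.py` merges trailing command-line options the way detectron2's default argument parser allows, they can be appended as `KEY VALUE` pairs, e.g. `python ./train_net.py --num-gpus 1 --config-file ./configs/CenterNet2_R50_1x.yaml SOLVER.WARMUP_ITERS 4000 SOLVER.BASE_LR 0.005` (values illustrative only).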

trustguan commented 3 years ago


Thank you very much!

xingyizhou / CenterNet2, issue #37