RuntimeError: stack expects a non-empty TensorList

cccccccccy commented 3 years ago

Hi, I preprocessed the COCO2017 dataset with python datasets/register_coco_edge.py. But when I trained the network with python train_net.py --num-gpus 1 --config-file configs/Dance_R_50_3x.yaml, I ran into the following error:

ERROR [05/08 10:49:58 d2.engine.train_loop]: Exception during training:
Traceback (most recent call last):
  File "/home/caoyang/detectron2/detectron2/engine/train_loop.py", line 132, in train
    self.run_step()
  File "/home/caoyang/detectron2/detectron2/engine/train_loop.py", line 214, in run_step
    loss_dict = self.model(data)
  File "/home/caoyang/anaconda3/envs/dance/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/caoyang/dance/core/modeling/edge_snake/dance.py", line 140, in forward
    features, proposals, (gt_sem_seg, [gt_instances, images.image_sizes])
  File "/home/caoyang/anaconda3/envs/dance/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/caoyang/dance/core/modeling/edge_snake/edgedet.py", line 270, in forward
    _, poly_loss = self.refine_head(snake_input, None, targets[1])
  File "/home/caoyang/anaconda3/envs/dance/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/caoyang/dance/core/modeling/edge_snake/snake_head.py", line 1881, in forward
    training_targets = self.compute_targets_for_polys(gt_instances, image_sizes)
  File "/home/caoyang/dance/core/modeling/edge_snake/snake_head.py", line 1232, in compute_targets_for_polys
    init_ex_targets = torch.stack(init_ex_targets, dim=0)
RuntimeError: stack expects a non-empty TensorList

I guess the reason is that some picture contains no target, or its target is not annotated. Could you tell me how to solve this problem?

lkevinzc commented 3 years ago

Hi @cccccccccy, when does this error happen? Is it almost immediately after you launch the training script, or after a while? If it happens after a while, does it happen after roughly the same amount of time on every run?

cccccccccy commented 3 years ago

Hi, this error occurs in the first epoch, at the same point in every run.

[05/11 15:22:02 d2.engine.hooks]: Overall training speed: 1796 iterations in 0:13:25 (0.4488 s / it)
[05/11 15:22:02 d2.engine.hooks]: Total training time: 0:13:28 (0:00:02 on hooks)

cccccccccy commented 3 years ago

I also have another problem. I attached the snake module from the dance model (without the att module, using the same preprocessing and loss design) to my own initial contour prediction model. My initial contours are closer to the ground truth than dance's initial contours, so the snake loss values (0, 1, 2) are very small (about 0.01-0.02). Could you tell me how to change the training for this situation?

lkevinzc commented 3 years ago

> Hi, this error occurs in the first epoch, at the same point in every run.
> [05/11 15:22:02 d2.engine.hooks]: Overall training speed: 1796 iterations in 0:13:25 (0.4488 s / it)
> [05/11 15:22:02 d2.engine.hooks]: Total training time: 0:13:28 (0:00:02 on hooks)

If this is the case, it is likely that a certain image has a bad annotation. You need to identify it and filter it out.
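One possible way to look for such images is sketched below. It assumes the annotation file follows the standard COCO instance JSON layout ("images", "annotations", "segmentation", "iscrowd"), and it assumes that crowd regions and polygons with fewer than three points are unusable for training; the file produced by register_coco_edge.py and the actual filtering rules in dance may differ. The function names and the toy data are made up for illustration.

```python
def find_images_without_usable_polygons(coco, min_points=3):
    """Return sorted ids of images that have no annotation with a usable polygon.

    Assumption: `coco` is a dict in standard COCO instance format.
    """
    usable = {img["id"]: 0 for img in coco["images"]}
    for ann in coco["annotations"]:
        if ann.get("iscrowd", 0):
            continue  # assumed to be ignored during training
        seg = ann.get("segmentation")
        if not isinstance(seg, list):
            continue  # RLE masks are dicts and carry no polygon
        # Each polygon is a flat list [x0, y0, x1, y1, ...].
        if any(len(poly) >= 2 * min_points for poly in seg):
            image_id = ann["image_id"]
            usable[image_id] = usable.get(image_id, 0) + 1
    return sorted(i for i, n in usable.items() if n == 0)


def drop_images(coco, bad_ids):
    """Return a copy of `coco` without the given images and their annotations."""
    bad = set(bad_ids)
    return {
        **coco,
        "images": [im for im in coco["images"] if im["id"] not in bad],
        "annotations": [a for a in coco["annotations"] if a["image_id"] not in bad],
    }


# Toy data (hypothetical): image 2 only has a crowd region, image 3 has nothing.
toy = {
    "images": [{"id": 1}, {"id": 2}, {"id": 3}],
    "annotations": [
        {"image_id": 1, "iscrowd": 0, "segmentation": [[0, 0, 10, 0, 10, 10]]},
        {"image_id": 2, "iscrowd": 1, "segmentation": {"counts": "", "size": [1, 1]}},
    ],
}
suspects = find_images_without_usable_polygons(toy)
cleaned = drop_images(toy, suspects)
print(suspects)
```

On a real dataset you would load the JSON with json.load, inspect the reported image ids by hand, and only then write the filtered file back out.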

lkevinzc commented 3 years ago

> I also have another problem. I attached the snake module from the dance model (without the att module, using the same preprocessing and loss design) to my own initial contour prediction model. My initial contours are closer to the ground truth than dance's initial contours, so the snake loss values (0, 1, 2) are very small (about 0.01-0.02). Could you tell me how to change the training for this situation?

It seems that your initial contour prediction is better than directly sampling on the box. This makes the later refinement (the snake predicting offsets) easier, which results in a very small loss. A simple solution is to multiply the snake loss by a coefficient to magnify it.

An additional suggestion: visualise your initial contour prediction and the ground-truth offset that you hope the snake module will learn. Then analyse what kind of loss is suitable for learning this offset, how to re-scale properly to balance the losses, etc.
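Before plotting anything, a quick numeric check of the offsets can already help. The sketch below assumes the initial contour and the ground-truth contour have the same number of vertices and are already matched one to one (vertex i to vertex i); that matching is an assumption, and the function name and sample contours are made up.

```python
import math


def offset_stats(init_contour, gt_contour):
    """Mean and max Euclidean distance between matched contour vertices.

    Assumption: both contours are lists of (x, y) pairs in the same order.
    """
    if not init_contour or len(init_contour) != len(gt_contour):
        raise ValueError("contours must be non-empty and of equal length")
    dists = [
        math.hypot(gx - ix, gy - iy)
        for (ix, iy), (gx, gy) in zip(init_contour, gt_contour)
    ]
    return {"mean": sum(dists) / len(dists), "max": max(dists)}


# Hypothetical 4-vertex contours: only the first vertex is off, by (3, 4).
init = [(0, 0), (10, 0), (10, 10), (0, 10)]
gt = [(3, 4), (10, 0), (10, 10), (0, 10)]
stats = offset_stats(init, gt)
```

If the mean offset is only a pixel or two, the regression target itself is tiny, which would be consistent with the small loss values and suggests normalising or re-scaling the target rather than changing the model.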

cccccccccy commented 3 years ago

Thanks for your suggestions!

lkevinzc / dance

RuntimeError: stack expects a non-empty TensorList #6