Also, when I tried to use a different scale (for example, with no output scale), the model tends to get NaN values around iteration 12. Are there any helpful solutions?
Also, what is the function check_image_size defined for? I tried to train the model but it failed several times in this function.
Please attach the config you were using. Loss becoming NaN is normally caused by setting a too-large learning rate. For check_image_size you may need to check detectron2; as I remember, it checks whether the image size (tensor shape) matches what is defined in the dataset dict.
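Roughly, that check behaves like the simplified sketch below (this is not detectron2's exact code; the real helper lives in detectron2's data utilities and raises its own error type). It usually fails when the `width`/`height` recorded in the annotation file do not match the image actually loaded from disk:

```python
import numpy as np

def check_image_size(dataset_dict: dict, image: np.ndarray) -> None:
    """Simplified sketch of detectron2's check: compare the loaded image's
    shape against the width/height stored in the dataset dict."""
    if "width" in dataset_dict and "height" in dataset_dict:
        expected_wh = (dataset_dict["width"], dataset_dict["height"])
        actual_wh = (image.shape[1], image.shape[0])  # (W, H) from an HWC array
        if expected_wh != actual_wh:
            raise ValueError(
                f"Image size mismatch for {dataset_dict.get('file_name', '<unknown>')}: "
                f"dataset dict says {expected_wh}, image on disk is {actual_wh}"
            )
```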
The config file is Base-CenterNet.yaml
You could try setting cfg.INPUT.GT_SCALE_AWARE to False.
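For example, something like the following before building the trainer (a minimal sketch; it assumes fcsgg registers INPUT.GT_SCALE_AWARE in its own config defaults on top of detectron2's base config, which does not define this key):

```python
from detectron2.config import get_cfg

# Sketch only: in the actual project the full fcsgg defaults and
# Base-CenterNet.yaml would be merged into cfg before this override.
cfg = get_cfg()
cfg.INPUT.GT_SCALE_AWARE = False  # disable scale-aware ground-truth generation
```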
When I trained this model, the following error was raised:

```
-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/home/zhouxukun/miniconda3/envs/maskrcnn/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/data1/zhouxukun/fcsgg/detectron2/detectron2/engine/launch.py", line 94, in _distributed_worker
    main_func(*args)
  File "/data1/zhouxukun/fcsgg/tools/train_net.py", line 148, in main
    return trainer.train()
  File "/data1/zhouxukun/fcsgg/detectron2/detectron2/engine/defaults.py", line 410, in train
    super().train(self.start_iter, self.max_iter)
  File "/data1/zhouxukun/fcsgg/detectron2/detectron2/engine/train_loop.py", line 142, in train
    self.run_step()
  File "/data1/zhouxukun/fcsgg/detectron2/detectron2/engine/train_loop.py", line 235, in run_step
    loss_dict = self.model(data)
  File "/home/zhouxukun/miniconda3/envs/maskrcnn/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/zhouxukun/miniconda3/envs/maskrcnn/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 886, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/zhouxukun/miniconda3/envs/maskrcnn/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data1/zhouxukun/fcsgg/fcsgg/modeling/meta_arch/onestage_detector.py", line 297, in forward
    self.preprocess_gt(gt_scene_graphs, images.tensor.shape[-2:], image_ids)
  File "/data1/zhouxukun/fcsgg/fcsgg/modeling/meta_arch/onestage_detector.py", line 261, in preprocess_gt
    gt_scene_graphs[i] = self.gt_gen(x, image_size, image_id, training=self.training)
  File "/data1/zhouxukun/fcsgg/fcsgg/data/detection_utils.py", line 639, in __call__
    training=training)
  File "/data1/zhouxukun/fcsgg/fcsgg/data/detection_utils.py", line 540, in generate_gt_scale
    range_side = torch.tensor(size_range)
RuntimeError: Could not infer dtype of NoneType
```

How can I solve it?
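For reference, the final RuntimeError is simply what PyTorch raises when torch.tensor() receives None, so size_range is presumably None by the time generate_gt_scale runs; this is consistent with the scale-range configuration being unset, which is what the GT_SCALE_AWARE suggestion above addresses. A minimal reproduction (the variable names here are only for illustration):

```python
import torch

size_range = None                      # no size range configured
range_side = torch.tensor(size_range)  # RuntimeError: Could not infer dtype of NoneType
```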