hikvision-research / opera

A Unified Toolbox for Object Perception & Application
Apache License 2.0

Failing to resume training. #14

Open svenssona opened 2 years ago

svenssona commented 2 years ago

When I use the checkpoints to resume training, the run fails as soon as training starts. It does complete the first evaluation of the provided checkpoint. I've reproduced this with both the Swin backbone and the ResNet50 backbone, and with several different checkpoints. It looks like the heatmap predictions are missing. Training from scratch works as intended. Does anyone have further insight into this problem?
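Missing heatmap predictions would fit the error in the traceback. In Python, tuple-unpacking `None` raises exactly `TypeError: cannot unpack non-iterable NoneType object`. The minimal reproduction below assumes, without confirmation from the opera source, that `enc_hm_proto` is `None` whenever the heatmap branch does not run. `heatmap_branch` and `unpack_heatmap` are hypothetical names used only for illustration.

```python
def heatmap_branch(training):
    """Hypothetical stand-in for the head's auxiliary heatmap output.

    Returns an (hm_pred, hm_mask) pair when the branch runs, otherwise None.
    """
    return ("hm_pred", "hm_mask") if training else None


def unpack_heatmap(enc_hm_proto):
    # Same tuple-unpacking pattern as the last frame of the traceback.
    hm_pred, hm_mask = enc_hm_proto
    return hm_pred, hm_mask


print(unpack_heatmap(heatmap_branch(training=True)))  # ('hm_pred', 'hm_mask')
try:
    unpack_heatmap(heatmap_branch(training=False))
except TypeError as err:
    print(f"TypeError: {err}")  # TypeError: cannot unpack non-iterable NoneType object
```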

    epoch_runner(data_loaders[i], **kwargs)
  File "/work/home/richard/opera/opera/core/mlflow/mlflow_epoch_based_runner.py", line 23, in train
    super().train(*args, **kwargs)
  File "/work/home/richard/opera/third_party/mmcv/mmcv/runner/epoch_based_runner.py", line 53, in train
    self.run_iter(data_batch, train_mode=True, **kwargs)
  File "/work/home/richard/opera/opera/core/mlflow/mlflow_epoch_based_runner.py", line 17, in run_iter
    super().run_iter(data_batch, train_mode, **kwargs)
  File "/work/home/richard/opera/third_party/mmcv/mmcv/runner/epoch_based_runner.py", line 31, in run_iter
    outputs = self.model.train_step(data_batch, self.optimizer,
  File "/work/home/richard/opera/third_party/mmcv/mmcv/parallel/distributed.py", line 63, in train_step
    output = self.module.train_step(*inputs[0], **kwargs[0])
  File "/work/home/richard/opera/third_party/mmdetection/mmdet/models/detectors/base.py", line 248, in train_step
    losses = self(**data)
  File "/work/home/richard/miniconda3/envs/opera/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/work/home/richard/opera/third_party/mmcv/mmcv/runner/fp16_utils.py", line 146, in new_func
    output = old_func(*new_args, **new_kwargs)
  File "/work/home/richard/opera/third_party/mmdetection/mmdet/models/detectors/base.py", line 172, in forward
    return self.forward_train(img, img_metas, **kwargs)
  File "/work/home/richard/opera/opera/models/detectors/petr.py", line 57, in forward_train
    losses = self.bbox_head.forward_train(x, img_metas, gt_bboxes,
  File "/work/home/richard/opera/opera/models/dense_heads/petr_head.py", line 439, in forward_train
    losses_and_targets = self.loss(
  File "/work/home/richard/opera/third_party/mmcv/mmcv/runner/fp16_utils.py", line 233, in new_func
    output = old_func(*new_args, **new_kwargs)
  File "/work/home/richard/opera/opera/models/dense_heads/petr_head.py", line 539, in loss
    hm_pred, hm_mask = enc_hm_proto
TypeError: cannot unpack non-iterable NoneType object
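One hypothesis to check, which nothing in this thread confirms, involves the evaluation that runs at the start of the resumed run. That evaluation may leave the model in eval mode after the runner has already switched it to train mode. If the heatmap branch only runs when `self.training` is true, the loss would then receive `None`. This would also explain why training from scratch works, since no evaluation happens before the first step. The toy sketch below simulates that ordering with a PyTorch-style `training` flag. It contains no mmcv or opera code, and `ToyModel`, `evaluate`, and `mode_at_first_train_step` are hypothetical names.

```python
class ToyModel:
    """Hypothetical stand-in exposing a PyTorch-style `training` flag."""

    def __init__(self):
        self.training = True

    def train(self):
        self.training = True

    def eval(self):
        self.training = False


def evaluate(model):
    # Assumption: the test loop puts the model in eval mode and
    # does not switch it back afterwards.
    model.eval()


def mode_at_first_train_step(model, evaluate_at_start, restore_train_mode):
    model.train()                  # train mode is set first...
    if evaluate_at_start:
        evaluate(model)            # ...then a start-of-epoch evaluation runs
        if restore_train_mode:
            model.train()          # hypothetical workaround: re-enter train mode
    return model.training          # flag the first training step would see


print(mode_at_first_train_step(ToyModel(), False, False))  # True: fresh run, no initial eval
print(mode_at_first_train_step(ToyModel(), True, False))   # False: eval ran after train()
print(mode_at_first_train_step(ToyModel(), True, True))    # True: mode restored
```

If this is the cause, printing `self.training` just before the failing unpack in `petr_head.py` should show `False` on the resumed run.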