open-mmlab / mmdetection

OpenMMLab Detection Toolbox and Benchmark
https://mmdetection.readthedocs.io
Apache License 2.0

Questions about Pointpillars training model mismatch and incorrect test results #11026

Closed: deyang2000 closed this issue 1 year ago

deyang2000 commented 1 year ago

[Images: training loss curve; visualization of the test result]

The images above show the loss curve from my training run and a visualization of the trained model's prediction on a test sample. When I ran the demo with my trained checkpoint, I got the following output:

(openmmlab) liyf@l526-System-Product-Name:~/mmdetection3d$ python demo/pcd_demo.py demo/data/kitti/000008.bin pointpillars_hv_secfpn_8xb6-160e_kitti-3d-car.py "/home/liyf/epoch_80.pth" --show
/home/liyf/mmdetection3d/mmdet3d/models/dense_heads/anchor3d_head.py:94: UserWarning: dir_offset and dir_limit_offset will be depressed and be incorporated into box coder in the future
  warnings.warn(
Loads checkpoint by local backend from path: /home/liyf/epoch_80.pth
The model and loaded state dict do not match exactly

size mismatch for bbox_head.conv_cls.weight: copying a param with shape torch.Size([18, 384, 1, 1]) from checkpoint, the shape in current model is torch.Size([2, 384, 1, 1]).
size mismatch for bbox_head.conv_cls.bias: copying a param with shape torch.Size([18]) from checkpoint, the shape in current model is torch.Size([2]).
size mismatch for bbox_head.conv_reg.weight: copying a param with shape torch.Size([42, 384, 1, 1]) from checkpoint, the shape in current model is torch.Size([14, 384, 1, 1]).
size mismatch for bbox_head.conv_reg.bias: copying a param with shape torch.Size([42]) from checkpoint, the shape in current model is torch.Size([14]).
size mismatch for bbox_head.conv_dir_cls.weight: copying a param with shape torch.Size([12, 384, 1, 1]) from checkpoint, the shape in current model is torch.Size([4, 384, 1, 1]).
size mismatch for bbox_head.conv_dir_cls.bias: copying a param with shape torch.Size([12]) from checkpoint, the shape in current model is torch.Size([4]).
/home/liyf/anaconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/visualization/visualizer.py:196: UserWarning: Failed to add <class 'mmengine.visualization.vis_backend.LocalVisBackend'>, please provide the save_dir argument.
  warnings.warn(f'Failed to add {vis_backend.__class__}, '
/home/liyf/anaconda3/envs/openmmlab/lib/python3.8/site-packages/torch/functional.py:478: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2894.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
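The sizes in these messages follow from how an anchor-based 3D head lays out its 1x1 output convolutions: one classification score per anchor per class, 7 regression values per anchor, and 2 direction bins per anchor. A minimal sketch of that arithmetic, assuming one anchor size per class, two rotations per size, sigmoid classification (no background channel), a 7-value box encoding and 2 direction bins; the helpers `head_out_channels` and `infer_from_checkpoint` are hypothetical illustrations, not mmdet3d APIs.

```python
# Sketch: reproduce the bbox_head output-channel counts seen in the log.
# Assumptions (not read from the mmdet3d source): one anchor size per class,
# two rotations per size, sigmoid classification (no background channel),
# a 7-value box encoding and 2 direction bins per anchor.

BOX_CODE_SIZE = 7  # assumed: x, y, z, three box sizes, yaw
DIR_BINS = 2       # assumed number of direction classes per anchor


def head_out_channels(num_classes: int, num_rotations: int = 2) -> dict:
    """Return the out_channels of conv_cls, conv_reg and conv_dir_cls."""
    num_anchors = num_classes * num_rotations  # anchors per BEV location
    return {
        "conv_cls": num_anchors * num_classes,
        "conv_reg": num_anchors * BOX_CODE_SIZE,
        "conv_dir_cls": num_anchors * DIR_BINS,
    }


def infer_from_checkpoint(cls_ch: int, reg_ch: int) -> tuple:
    """Recover (num_anchors, num_classes) from conv_cls/conv_reg channels."""
    num_anchors, rem = divmod(reg_ch, BOX_CODE_SIZE)
    assert rem == 0, "conv_reg channels are not a multiple of the box code size"
    return num_anchors, cls_ch // num_anchors


if __name__ == "__main__":
    print("3 classes:", head_out_channels(3))  # checkpoint shapes in the log
    print("1 class:  ", head_out_channels(1))  # current-model shapes in the log
    print("from checkpoint (anchors, classes):", infer_from_checkpoint(18, 42))
```

Under these assumptions the checkpoint's head (18/42/12 channels) corresponds to a 3-class model, while the car-only config builds a 1-class head (2/14/4). Since the mismatched head tensors are not copied, the head would keep its untrained initialization, which would be consistent with the wrong detections.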

Judging from the loss curve, training seems to have gone fine, but the test result is clearly wrong. How can I fix this?
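A mismatch like this can be caught before running the demo by comparing parameter shapes between the checkpoint and the model built from the config; the usual remedy is to run the demo with the same config file that was used for training. A minimal sketch in plain Python, assuming both state dicts have already been reduced to name-to-shape mappings (with PyTorch, for example, `{k: tuple(v.shape) for k, v in state_dict.items()}`); `shape_mismatches` and `missing_and_unexpected` are hypothetical helpers, not part of mmdet3d or mmengine.

```python
# Sketch: list parameters whose shapes differ between a checkpoint and a model.
# Both arguments map parameter names to shape tuples (assumed pre-extracted,
# e.g. {k: tuple(v.shape) for k, v in state_dict.items()} with PyTorch).
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def shape_mismatches(ckpt: Dict[str, Shape],
                     model: Dict[str, Shape]) -> List[Tuple[str, Shape, Shape]]:
    """Return (name, checkpoint_shape, model_shape) for every differing key."""
    return [
        (name, ckpt[name], model[name])
        for name in sorted(ckpt.keys() & model.keys())
        if ckpt[name] != model[name]
    ]


def missing_and_unexpected(ckpt: Dict[str, Shape],
                           model: Dict[str, Shape]) -> Tuple[List[str], List[str]]:
    """Keys the model expects but the checkpoint lacks, and vice versa."""
    return sorted(model.keys() - ckpt.keys()), sorted(ckpt.keys() - model.keys())


if __name__ == "__main__":
    # Two of the six head tensors, with shapes copied from the log above.
    ckpt = {"bbox_head.conv_cls.weight": (18, 384, 1, 1),
            "bbox_head.conv_reg.weight": (42, 384, 1, 1)}
    model = {"bbox_head.conv_cls.weight": (2, 384, 1, 1),
             "bbox_head.conv_reg.weight": (14, 384, 1, 1)}
    for name, c, m in shape_mismatches(ckpt, model):
        print(f"{name}: checkpoint {c} vs model {m}")
```

Failing loudly on any non-empty result, instead of continuing with a partially loaded model, avoids silently evaluating a randomly initialized head.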

deyang2000 commented 1 year ago

I see where I went wrong.

deyang2000 commented 1 year ago

[Image: new test result]

All going well!