open-mmlab / mmdetection3d

OpenMMLab's next-generation platform for general 3D object detection.
https://mmdetection3d.readthedocs.io/en/latest/

Gap w/ OpenPCDet for SECOND and PointPillars #648

Divadi opened this issue 3 years ago

Divadi commented 3 years ago

Comparing Car moderate 3D strict results from this repository with those of OpenPCDet, there seems to be a 2~3 point difference:

SECOND: 76.91425 (here) vs 78.62 (OpenPCDet)
PointPillars: 73.88344 (here) vs 77.28 (OpenPCDet)

I was wondering if anyone had some clues to why this difference might occur. Is it the BEV NMS used in this repository vs 3D rotated NMS in OpenPCDet perhaps?
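(For intuition on why the NMS flavor could matter, here is a toy comparison with axis-aligned boxes, not either repo's actual rotated-IoU kernels: two boxes stacked on top of each other overlap fully in BEV but not at all in 3D, so BEV NMS can suppress detections that 3D NMS would keep.)

def interval_overlap(a0, a1, b0, b1):
    """Length of the overlap of intervals [a0, a1] and [b0, b1]."""
    return max(0.0, min(a1, b1) - max(a0, b0))

def bev_iou(a, b):
    """IoU of two axis-aligned boxes (x0, y0, x1, y1) in the BEV plane."""
    inter = interval_overlap(a[0], a[2], b[0], b[2]) * interval_overlap(a[1], a[3], b[1], b[3])
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def iou_3d(a, b):
    """IoU of two axis-aligned 3D boxes (x0, y0, z0, x1, y1, z1)."""
    inter = (interval_overlap(a[0], a[3], b[0], b[3]) *
             interval_overlap(a[1], a[4], b[1], b[4]) *
             interval_overlap(a[2], a[5], b[2], b[5]))
    vol_a = (a[3] - a[0]) * (a[4] - a[1]) * (a[5] - a[2])
    vol_b = (b[3] - b[0]) * (b[4] - b[1]) * (b[5] - b[2])
    return inter / (vol_a + vol_b - inter)

# Same footprint, stacked in z: BEV IoU = 1.0, but 3D IoU = 0.0
print(bev_iou((0, 0, 2, 4), (0, 0, 2, 4)))              # 1.0
print(iou_3d((0, 0, 0, 2, 4, 2), (0, 0, 2, 2, 4, 4)))   # 0.0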

yezhen17 commented 3 years ago

Hi @Divadi , sorry for the late reply. I'm not sure about the reason behind the performance gap, but the difference in NMS implementations is a possible cause.

zhanggefan commented 2 years ago

Hi, @Divadi @THU17cyz,

I have been watching this thread for weeks, but there seems to be no progress. Maybe we can share information here if anyone finds a potential reason or anything suspicious.

I did a simple comparison between the training configs of PointPillars on KITTI used by the two codebases and found these differences:

  1. Batch size. OpenPCDet uses 4 samples × 8 GPUs, while MMDet3D uses 6 samples × 8 GPUs.
  2. GT object sampling augmentation. OpenPCDet uses the ground plane to adjust the sampled objects' z location. Besides, for the 3-class detection task, the per-class sampling counts are also different.
  3. Object noise augmentation. I found no object noise augmentation in OpenPCDet. BTW, this augmentation seems troublesome: the augmented point cloud contains some weird artifacts, like pedestrians floating in the air or vehicles driving under the ground plane. (The relevant MMDet3D pipeline entries are sketched below.)
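For context, the relevant entries of the MMDet3D KITTI training pipeline look roughly like this (a sketch with illustrative values; the exact numbers vary between configs):

data_root = 'data/kitti/'

db_sampler = dict(
    data_root=data_root,
    info_path=data_root + 'kitti_dbinfos_train.pkl',
    rate=1.0,
    prepare=dict(
        filter_by_difficulty=[-1],
        filter_by_min_points=dict(Car=5, Pedestrian=10, Cyclist=10)),
    classes=['Pedestrian', 'Cyclist', 'Car'],
    sample_groups=dict(Car=12, Pedestrian=6, Cyclist=6))  # per-class sampling counts

train_pipeline = [
    # ... point loading and annotation loading ...
    dict(type='ObjectSample', db_sampler=db_sampler),  # GT object sampling (cut & paste)
    dict(
        type='ObjectNoise',  # per-object noise; no counterpart in OpenPCDet
        num_try=100,
        translation_std=[1.0, 1.0, 0.5],
        global_rot_range=[0.0, 0.0],
        rot_range=[-0.78539816, 0.78539816]),
    # ... flip, global rot/scale, range filtering, formatting ...
]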

I am currently doing experiments to align the data pipeline but it is really laborious. I hope somebody could give me a hand.

yezhen17 commented 2 years ago

Hi @zhanggefan. Thank you very much for sharing your discoveries! Recently we have been busy with other lines of work, so we haven't looked into this issue yet. From my point of view, the use of ground planes should boost the performance. Maybe @Tai-Wang can take a look?

zhanggefan commented 2 years ago

@THU17cyz,

I just finished a round of training that takes the ground plane into account during GT sampling (messy experimental code, not ready for a PR, so please don't ask for it here, hahaha). What surprises me is that the AP score even drops...

But this is not the weirdest.

I switched to the refactored coordinates branch today, and from #913 I know that there is a performance gap of around 1% compared to the master branch.

In one of the experiments (PointPillars on KITTI, 3 classes) I ran today, I accidentally swapped all of the w and h values in the anchor settings, but it improved the final AP score by around 1%. For the cyclist class especially, the AP increased by 4%. I used a training schedule of 12 * 4 with 3090s and got an average AP score of 61.5% (overall moderate strict).

I guess there is still something wrong with the new branch.

Training logs are attached here for your information:

swapped w&h: mAP 61.5 20210908_122709.log

correct one: mAP 60.5 20210908_153654.log

yezhen17 commented 2 years ago

Oh you're really fast lol. I'm not familiar with the effect of the planes, but I'm responsible for the refactored coordinate system, and your finding is really interesting. Could you elaborate on how you switched w and h, say, by providing the fragment of the modified config?

yezhen17 commented 2 years ago

By the way, did you generate new db_infos for KITTI?

zhanggefan commented 2 years ago

By the way, did you generate new db_infos for KITTI?

Of course I did. In fact, I had to modify the data conversion script and regenerate the files so that I could put the plane information into the pickle files. I checked the data pipeline with the visualizer and everything looked alright.

zhanggefan commented 2 years ago

Oh you're really fast lol. I'm not familiar with the effect of the planes, but I'm responsible for the refactored coordinate system, and your finding is really interesting. Could you elaborate on how you switched w and h, say, by providing the fragment of the modified config?

The configs are dumped into the head of the log files.

yezhen17 commented 2 years ago

Hmmm, sorry, I missed the logs since I was viewing from my phone. I see. However, I remember that using the original anchor config hurt the performance months ago... Please give me some time to look into this.

zhanggefan commented 2 years ago

@THU17cyz, @Tai-Wang

Here are some more discoveries concerning the performance gap compared to OpenPCDet:

  1. The training schedule is different. OpenPCDet trains PointPillars for only 80 epochs, but MMDet3D trains for 160 epochs. What's more, OpenPCDet uses max(lr) == 0.003, but here in MMDet3D max(lr) == 0.01.

  2. The PFE parts have a lot of differences. The input channel count of the first linear layer of the PFE is 9 in MMDet3D but 10 in OpenPCDet. Besides, setting the legacy flag of PillarFeatureNet to False boosts the mAP score by 1.4. The legacy flag is really dumb because it produces redundant input features for the network: when the legacy flag is set to True (the default), the input features always satisfy features[..., [0, 1]] == features[..., [-2, -1]] (see the sketch below).
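To see the redundancy concretely, here is a minimal sketch of the legacy behavior (assuming, as in mmdet3d's PillarFeatureNet legacy path, that the x/y offset channels are computed in place on a view of the raw features; names and shapes are simplified):

import torch

# Toy pillar: 3 points with (x, y, z, reflectance) -> shape (1 pillar, 3 points, 4 feats)
features = torch.tensor([[[10.0, 2.0, -1.0, 0.3],
                          [10.2, 2.1, -0.9, 0.5],
                          [10.1, 1.9, -1.1, 0.4]]])
pillar_center_xy = torch.tensor([10.08, 2.08])  # x/y center of this pillar

# Legacy path: f_center is a *view* of features[..., :2], so the in-place
# subtraction also overwrites the raw x/y channels of `features`.
f_center = features[:, :, :2]
f_center[:, :, 0] -= pillar_center_xy[0]
f_center[:, :, 1] -= pillar_center_xy[1]

decorated = torch.cat([features, f_center], dim=-1)  # (1, 3, 6)
# The first two channels now equal the last two: the absolute x/y are lost.
assert torch.equal(decorated[..., [0, 1]], decorated[..., [-2, -1]])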

Divadi commented 2 years ago

A couple of other points I've noticed before.

Both: For optimization, the Adam betas are slightly different: (0.9, 0.99) for OpenPCDet vs (0.95, 0.99) here. The post-processing is also slightly different: IIRC, OpenPCDet uses 3D rotated NMS while mm3d uses BEV rotated NMS. Further, the NMS pre->post counts are 4096->500 for OpenPCDet; it's 100->50 here. (I have actually tested this myself before, but the results were inconclusive. I think Ped improved but Cyc declined, or something along those lines. I felt it was difficult to do a complete comparison because Ped & Cyc vary quite a bit between runs.)

For reference, I had ported "iou3d_nms" from OpenPCDet into mm3d and modified the top of "mmdet3d/core/post_processing/box3d_nms.py" to be:

import numba
import numpy as np
import torch

from mmdet3d.ops.iou3d.iou3d_utils import nms_gpu, nms_normal_gpu
from mmdet3d.ops.iou3d_nms import pcdet_nms_gpu

def box3d_multiclass_nms(mlvl_bboxes,
                         mlvl_bboxes_for_nms,
                         mlvl_scores,
                         score_thr,
                         max_num,
                         cfg,
                         mlvl_dir_scores=None,
                         mlvl_attr_scores=None,
                         mlvl_bboxes2d=None):
    """Multi-class nms for 3D boxes.
    Args:
        mlvl_bboxes (torch.Tensor): Multi-level boxes with shape (N, M).
            M is the dimensions of boxes.
        mlvl_bboxes_for_nms (torch.Tensor): Multi-level boxes with shape
            (N, 5) ([x1, y1, x2, y2, ry]). N is the number of boxes.
        mlvl_scores (torch.Tensor): Multi-level scores with shape
            (N, C + 1). N is the number of boxes. C is the number of classes.
        score_thr (float): Score threshold to filter boxes with low
            confidence.
        max_num (int): Maximum number of boxes will be kept.
        cfg (dict): Configuration dict of NMS.
        mlvl_dir_scores (torch.Tensor, optional): Multi-level scores
            of direction classifier. Defaults to None.
        mlvl_attr_scores (torch.Tensor, optional): Multi-level scores
            of attribute classifier. Defaults to None.
        mlvl_bboxes2d (torch.Tensor, optional): Multi-level 2D bounding
            boxes. Defaults to None.
    Returns:
        tuple[torch.Tensor]: Return results after nms, including 3D \
            bounding boxes, scores, labels, direction scores, attribute \
            scores (optional) and 2D bounding boxes (optional).
    """
    # added new!
    if cfg.get('use_pcdet_nms_gpu', False):
        # This is 3D, rotated, class-agnostic NMS.
        # Scores are assumed to be softmax (non-sigmoid), so the final
        # background probability column is ignored.
        max_class_labels = mlvl_scores[:, :-1].argmax(dim=1)
        mlvl_max_class_scores = mlvl_scores[torch.arange(mlvl_scores.shape[0]), max_class_labels]
        # Note: the third positional argument of OpenPCDet's nms_gpu is the IoU
        # threshold, so score_thr is reused as the NMS IoU threshold here.
        nms_selected, _ = pcdet_nms_gpu(mlvl_bboxes, mlvl_max_class_scores, score_thr)
        bboxes = mlvl_bboxes[nms_selected]
        scores = mlvl_max_class_scores[nms_selected]
        labels = max_class_labels[nms_selected]

        if mlvl_dir_scores is not None:
            mlvl_dir_scores = mlvl_dir_scores[nms_selected]
        if mlvl_attr_scores is not None:
            mlvl_attr_scores = mlvl_attr_scores[nms_selected]
        if mlvl_bboxes2d is not None:
            mlvl_bboxes2d = mlvl_bboxes2d[nms_selected]

        if bboxes.shape[0] > max_num:
            _, inds = scores.sort(descending=True)
            inds = inds[:max_num]
            bboxes = bboxes[inds, :]
            labels = labels[inds]
            scores = scores[inds]

            if mlvl_dir_scores is not None:
                mlvl_dir_scores = mlvl_dir_scores[inds]
            if mlvl_attr_scores is not None:
                mlvl_attr_scores = mlvl_attr_scores[inds]
            if mlvl_bboxes2d is not None:
                mlvl_bboxes2d = mlvl_bboxes2d[inds]

        results = (bboxes, scores, labels)
        if mlvl_dir_scores is not None:
            results = results + (mlvl_dir_scores, )
        if mlvl_attr_scores is not None:
            results = results + (mlvl_attr_scores, )
        if mlvl_bboxes2d is not None:
            results = results + (mlvl_bboxes2d, )

        return results
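To route inference through this branch, the flag read via cfg.get('use_pcdet_nms_gpu', False) just needs to be present in the model's test-time NMS config, e.g. (hypothetical fragment):

test_cfg = dict(
    use_pcdet_nms_gpu=True,  # picked up by cfg.get('use_pcdet_nms_gpu', False) above
    # ... the usual score_thr / max_num / NMS settings unchanged ...
)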

Other differences:

PointPillars: OpenPCDet uses grad norm clip 10; this repository uses 35.

SECOND: The IoU matching thresholds for Ped & Cyc are different: 0.2 ~ 0.35 here vs 0.45 ~ 0.6 for OpenPCDet.

I'll keep posting as I remember/find things

Divadi commented 2 years ago

@zhanggefan Did you try an experiment without object noise?

zhanggefan commented 2 years ago

@Divadi I tried it today but found no improvement.

But here is the good news

The config for the anchor generator is absolutely wrong!!! The centers of the generated anchors turn out to have the wrong stride! And I believe it is possibly a day-zero bug. @THU17cyz @Tai-Wang

https://github.com/open-mmlab/mmdetection3d/blob/2efdb6b6d94fb7ce944e83f743b98b1563e96fee/configs/_base_/models/hv_pointpillars_secfpn_kitti.py#L40-L42

zhanggefan commented 2 years ago

I just started an experiment to examine how much impact this bug has on the result.

The correct config should be

ranges=[
    [0.16, -39.52, -0.6, 68.96, 39.52, -0.6],
    [0.16, -39.52, -0.6, 68.96, 39.52, -0.6],
    [0.16, -39.52, -1.78, 68.96, 39.52, -1.78],
]

Comparing it with the shipped config shows that the anchors' center bias w.r.t. the feature map positions is up to 1.44 meters (a quick check is sketched below).

This must have introduced some sort of feature-prior misalignment, hurting the feature extractor's performance and influencing the anchor-gt assignment as well.
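The bias is easy to reproduce (a quick sketch, assuming the master-branch generator spreads the centers with a linspace over [xmin, xmax], as Anchor3DRangeGenerator does):

import numpy as np

feat_w = 216          # feature-map width for PointPillars on KITTI
cell = 0.16 * 2       # 0.32 m per feature-map cell (voxel 0.16, stride 2)

# Centers of the feature-map cells along x:
cell_centers = (np.arange(feat_w) + 0.5) * cell      # 0.16, 0.48, ..., 68.96

# Buggy config (x range copied from SECOND: [0, 70.4]):
buggy = np.linspace(0.0, 70.4, feat_w)
print(np.abs(buggy - cell_centers).max())            # -> 1.44

# Corrected config (x range [0.16, 68.96]):
fixed = np.linspace(0.16, 68.96, feat_w)
print(np.abs(fixed - cell_centers).max())            # -> ~0.0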

yezhen17 commented 2 years ago

Wow, great observations @Divadi , @zhanggefan ! I'm not very familiar with the anchors, and now I have enrolled in graduate school and cannot maintain MMDet3D like before : ( But I will still keep an eye on this thread and add my opinions!

zhanggefan commented 2 years ago

Update:

  1. Best score of PointPillars on KITTI yet:

Ped@50.9336 Cyclist@63.5687 Car@76.0103 Overall@63.5042

  2. New discovery:

The next experiment I am planning to do is aligning the behavior of the assigners by setting assign_per_class to true in the anchor head's configuration (a config fragment follows below).
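For reference, the flag lives in the anchor head config; a fragment (assuming the same Anchor3DHead option used by the SECOND configs):

model = dict(
    bbox_head=dict(
        type='Anchor3DHead',
        num_classes=3,
        assign_per_class=True,  # assign targets per class, as OpenPCDet does
        # ... anchor generator, losses, etc. unchanged ...
    ))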

Divadi commented 2 years ago

With reference to the anchor offset - does this issue then exist over the entire repository?

I notice now that OpenPCDet actually had a similar problem, "align_center":
https://github.com/open-mmlab/OpenPCDet/blob/c9d31d393acaae34d74cb03bd6ccff9976d3d1f3/pcdet/models/dense_heads/target_assigner/anchor_generator.py#L28

However, in their configs, for backward compatibility, they leave it as false:
https://github.com/open-mmlab/OpenPCDet/blob/c9d31d393acaae34d74cb03bd6ccff9976d3d1f3/tools/cfgs/kitti_models/pointpillar.yaml#L87
https://github.com/open-mmlab/OpenPCDet/issues/272

So even without this tweak, there is a performance gap?

zhanggefan commented 2 years ago

@Divadi The anchor config error exists only in https://github.com/open-mmlab/mmdetection3d/blob/2efdb6b6d94fb7ce944e83f743b98b1563e96fee/configs/_base_/models/hv_pointpillars_secfpn_kitti.py

The error only affects the results of PointPillars on KITTI, including both the 3-class and car-only results.

https://github.com/open-mmlab/mmdetection3d/blob/2efdb6b6d94fb7ce944e83f743b98b1563e96fee/configs/_base_/models/hv_pointpillars_secfpn_kitti.py#L40-L42

It is not just a center misalignment by half the feature map stride. The value 70.4 is simply wrong. I believe the author just copied and pasted the existing config from SECOND and forgot to adjust some of the values to the different voxelization range used by PointPillars. The voxelization range for SECOND is [0, -40, -3, 70.4, 40, 1], different from the range [0, -39.68, -3, 69.12, 39.68, 1] for PointPillars.

OpenPCDet's config does introduce center misalignment as well, but the maximum misalignment bias is only 0.16 m, which is way smaller than the 1.44 m bias we have here in MMDet3D.

I just finished the "assignment behavior aligning" experiment and the gap is finally gone. The best result of PointPillars on KITTI is now:

Ped@50.3822 Cyclist@64.6297 Car@77.2545 Overall@64.0888

20210912_053007.log

Divadi commented 2 years ago

Ah, I understand - it's the 69.12 vs 70.4 for PointPillars.

Tai-Wang commented 2 years ago

Hi @zhanggefan @Divadi , awesome study! Sorry for the slightly late reply. The overall study has been really exhaustive, so let me conclude and share some experiences.

zhanggefan commented 2 years ago

Hi, @Tai-Wang Any progress? Looking forward to the release of new baseline scores!

Tai-Wang commented 2 years ago

Hi, @zhanggefan We are asking our colleagues to help follow up and update the models and benchmarks (I have been busy with the code release of our latest monocular 3D detector and my own research recently), but it cannot be very quick because we do not have as many resources as in the previous 3 months to deal with all the issues. The good news is that this is one of the issues given the highest priority. We will post the results here ASAP if there are any.

Divadi commented 2 years ago

Another point I just found: for the cut & paste augmentation, the "filter by difficulty" step does nothing for KITTI.

https://github.com/open-mmlab/mmdetection3d/blob/75c5c6cd590386bd1539a686c5fd2cc45c5480d5/tools/data_converter/create_gt_database.py#L251 is where difficulty is stored in the GT database. However, https://github.com/open-mmlab/mmdetection3d/blob/75c5c6cd590386bd1539a686c5fd2cc45c5480d5/mmdet3d/datasets/kitti_dataset.py#L139 in kitti_dataset does not store "difficulty" in the returned dict, so all objects are given difficulty "0", and hence nothing is filtered for cut & paste. This is unlike OpenPCDet, where objects with difficulty "-1" are not used for cut & paste.
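A hypothetical one-line fix sketch (not a merged patch; key names assumed): pass the difficulty from the info pickle through get_ann_info so create_gt_database can store the real values:

# mmdet3d/datasets/kitti_dataset.py, end of get_ann_info (sketch):
anns_results = dict(
    gt_bboxes_3d=gt_bboxes_3d,
    gt_labels_3d=gt_labels_3d,
    bboxes=gt_bboxes,
    labels=gt_labels,
    gt_names=gt_names,
    difficulty=annos['difficulty'])  # currently missing, so difficulty defaults to 0
return anns_results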

yezhen17 commented 2 years ago

In one of the experiments (PointPillars on KITTI, 3 classes) I ran today, I accidentally swapped all of the w and h values in the anchor settings, but it improved the final AP score by around 1%. [...] I guess there is still something wrong with the new branch.

Hi @zhanggefan , I went through the refactoring again and it seems the anchors should be [l, w, h] instead of [w, l, h] logically... Maybe we can ask someone to work on this and figure it out.

zhanggefan commented 2 years ago

I went through the refactoring again and it seems the anchors should be [l, w, h] instead of [w, l, h] logically... Maybe we can ask someone to work on this and figure it out.

Yep, [l, w, h] is absolutely the logically correct order. I think the weird score improvement might be related to the config bug; I am not sure either.

But anyway, I think the problem is solved now that the config bug is fixed. The best score I got so far with PointPillars on KITTI is 64.9 overall.

BTW, taking the behavior of the stride conv into account, the best config for the anchor ranges should be (reasoning sketched after the code):

ranges=[
    [0.08, -39.60, -0.6, 68.88, 39.44, -0.6],
    [0.08, -39.60, -0.6, 68.88, 39.44, -0.6],
    [0.08, -39.60, -1.78, 68.88, 39.44, -1.78],
]
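The reasoning behind the extra half-voxel shift (assuming the first conv of the backbone is kernel 3, padding 1, stride 2): output cell i is centered on input column 2i, whose x coordinate is (2i + 0.5) * 0.16 = 0.08 + 0.32 * i. A quick check:

import numpy as np

voxel = 0.16
feat_w, feat_h = 216, 248

# A k=3, p=1, s=2 conv centers its output cell i on input column 2*i,
# whose coordinate is (2*i + 0.5) * voxel = 0.08 + 0.32 * i.
x_centers = 0.08 + 0.32 * np.arange(feat_w)            # 0.08 ... 68.88
y_centers = -39.68 + 0.08 + 0.32 * np.arange(feat_h)   # -39.60 ... 39.44
print(x_centers[0], x_centers[-1], y_centers[0], y_centers[-1])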
ZCMax commented 2 years ago

@zhanggefan Hello, did you make these modifications on the dev branch or on the master branch?

zhanggefan commented 2 years ago

@ZCMax Hi, I did those on the dev branch.

Xiangxu-0103 commented 2 years ago

@ZCMax @zhanggefan Hi, I think the anchor range is a little different from OpenPCDet's. The following is the anchor generation code from OpenPCDet:

# align_center = False
# self.anchor_range = [0, -39.68, -3, 69.12, 39.68, 1]
# grid_size = [216, 248]
# anchor_height = -1.78  # car for example

if align_center:
    x_stride = (self.anchor_range[3] - self.anchor_range[0]) / grid_size[0]
    y_stride = (self.anchor_range[4] - self.anchor_range[1]) / grid_size[1]
    x_offset, y_offset = x_stride / 2, y_stride / 2
else:
    # note: in the OpenPCDet source the un-aligned branch divides by
    # (grid_size - 1), so the anchor centers span the full range inclusively
    x_stride = (self.anchor_range[3] - self.anchor_range[0]) / (grid_size[0] - 1)
    y_stride = (self.anchor_range[4] - self.anchor_range[1]) / (grid_size[1] - 1)
    x_offset, y_offset = 0, 0
x_shifts = torch.arange(self.anchor_range[0] + x_offset, self.anchor_range[3] + 1e-5, step=x_stride, dtype=torch.float32).cuda()
y_shifts = torch.arange(self.anchor_range[1] + y_offset, self.anchor_range[4] + 1e-5, step=y_stride, dtype=torch.float32).cuda()
z_shifts = x_shifts.new_tensor(anchor_height)
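Plugging the KITTI numbers into the un-aligned branch reproduces the ~0.16 m maximum bias zhanggefan mentioned above (a quick check in numpy; cell centers assume voxel 0.16, stride 2):

import numpy as np

anchor_range = [0, -39.68, -3, 69.12, 39.68, 1]
grid_size = [216, 248]

x_stride = (anchor_range[3] - anchor_range[0]) / (grid_size[0] - 1)      # ~0.3215
x_shifts = np.arange(anchor_range[0], anchor_range[3] + 1e-5, x_stride)  # 0 ... 69.12

cell_centers = 0.16 + 0.32 * np.arange(grid_size[0])                     # 0.16 ... 68.96
print(len(x_shifts), np.abs(x_shifts - cell_centers).max())              # 216, ~0.16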