open-mmlab / mmdetection3d

OpenMMLab's next-generation platform for general 3D object detection.
https://mmdetection3d.readthedocs.io/en/latest/
Apache License 2.0

Large AP difference of KITTI validation set and test set #2121

Closed chenx1e closed 1 year ago

chenx1e commented 1 year ago

Prerequisite

Task

I'm using the official example scripts/configs for the officially supported tasks/models/datasets.

Branch

master branch https://github.com/open-mmlab/mmdetection3d

Environment

Not environment-related; everything runs successfully.

Reproduces the problem - code sample

I've run the SECOND model as a baseline in my experiment, using the official config file with the KITTI 3-class setting, for 160 epochs and a batch size of 2. The config files I'm using are:

  1. Model: configs/_base_/models/hv_second_secfpn_kitti.py
  2. Dataset: configs/_base_/datasets/kitti-3d-3class.py
  3. Schedules: configs/_base_/schedules/cyclic_40e.py
  4. Runtime: configs/_base_/default_runtime.py

To unify the training settings for validation and to compare my improved model against the official SECOND baseline, I made small changes to the config files above, which hardly affected the results. Specifically:

  1. In the dataset config: (1) I changed samples_per_gpu=2; (2) I deleted ObjectNoise from train_pipeline.
  2. In the schedules config: I changed the max epochs to 80: runner = dict(type='EpochBasedRunner', max_epochs=80). Other than that, everything is consistent with the official model.
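The four base files and the two changes above can be combined in a single top-level config via mmdetection3d's `_base_` inheritance mechanism. This is only a sketch — the file name `configs/mymodel/kitti_3class.py` and the relative base paths are assumptions to be checked against your checkout:

```python
# Hypothetical top-level config, e.g. configs/mymodel/kitti_3class.py.
_base_ = [
    '../_base_/models/hv_second_secfpn_kitti.py',
    '../_base_/datasets/kitti-3d-3class.py',
    '../_base_/schedules/cyclic_40e.py',
    '../_base_/default_runtime.py',
]

# Change (1): batch size per GPU.
data = dict(samples_per_gpu=2)

# Change (2): train for 80 epochs instead of the base schedule's default.
runner = dict(type='EpochBasedRunner', max_epochs=80)

# Note: removing ObjectNoise cannot be done by overriding a single list
# entry; the whole train_pipeline has to be redefined here without it.
```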

Now I get validation results on the validation set from mmdetection3D (the 1:1 train/val split follows the official one), and the AP is pretty good for the easy, moderate and hard difficulty levels. From the training log file, the APs for cars and pedestrians are (I assume the AP40 strict metric follows the KITTI official evaluation metric):

KITTI/Car_3D_AP40_easy_strict: 88.7487, 
KITTI/Car_3D_AP40_moderate_strict: 78.8525, 
KITTI/Car_3D_AP40_hard_strict: 76.0954,

KITTI/Pedestrian_3D_AP40_easy_strict: 58.4486, 
KITTI/Pedestrian_3D_AP40_moderate_strict: 51.1867, 
KITTI/Pedestrian_3D_AP40_hard_strict: 45.9806

However, when I use the checkpoint to predict on the KITTI test set and upload the prediction files to the KITTI test server, the 3D AP is much lower than the results above. For Car (3D Detection), it's only 83.44%, 74.68% and 67.83% for the three difficulty levels; for Pedestrian (3D Detection), it's only 47.15%, 36.19% and 33.42%. Could anyone tell me why?

To transform the prediction file from mmdetection3D (a pkl file) into the text files for upload to the KITTI test server, I first run this command to get the prediction pkl file: python tools/test.py ./configs/mymodel/kitti_3class.py ./results/second/epoch_80.pth --format-only --eval-options 'pklfile_prefix=./second_test', where the config file is the modified one described above and the checkpoint comes from training with it.

Second, I wrote a Python script to extract the predictions from the pkl file into text files.

import os
import pickle

import numpy as np

with open('./second_test.pkl', 'rb') as f:
    data = pickle.load(f)

# Write one KITTI-format text file per test frame.
for i, predict in enumerate(data):
    filename = str(i).zfill(6)
    save_filename = os.path.join('D:/dataset/KITTI/object/testing/submit2', filename + '.txt')

    # Reorder object dimensions to KITTI's (height, width, length).
    predict['dimensions'][:, [0, 1, 2]] = predict['dimensions'][:, [1, 2, 0]]

    # Keep only objects with score > 0.3.
    keep_idx = np.where(predict['score'] > 0.3)[0]

    lines = []
    for j in keep_idx:
        values = np.concatenate([
            [predict['alpha'][j]],       # alpha (1)
            predict['bbox'][j],          # 2D bbox (4)
            predict['dimensions'][j],    # h, w, l (3)
            predict['location'][j],      # x, y, z (3)
            [predict['rotation_y'][j]],  # rotation_y (1)
            [predict['score'][j]],       # score (1)
        ])
        # Type first, then truncated and occluded set to -1, then the 13 numbers.
        lines.append(predict['name'][j] + ' -1 -1 ' + ' '.join('%.2f' % v for v in values))

    with open(save_filename, 'w') as txt_file:
        for line in lines:
            txt_file.write(line + '\n')

    print(i, "; ", i / 7518 * 100, "%")
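As a sanity check on files like these: each line of a KITTI result file should contain 16 whitespace-separated fields (type, truncated, occluded, alpha, four 2D-bbox values, three dimensions, three location values, rotation_y, score). A minimal, standalone validator sketch (not part of mmdetection3d):

```python
def check_kitti_line(line):
    """Return True if `line` matches the 16-field KITTI result format."""
    fields = line.split()
    if len(fields) != 16:
        return False
    # Field 0 is the class name; the remaining 15 fields must parse as floats.
    try:
        [float(v) for v in fields[1:]]
    except ValueError:
        return False
    return True

sample = 'Car -1 -1 -1.58 587.01 173.33 614.12 200.12 1.65 1.67 3.64 -0.65 1.71 46.70 -1.59 0.92'
print(check_kitti_line(sample))  # → True
```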

An example of a text file ready to submit to the KITTI test server is shown in the attached image.

I've provided everything I can. Could anyone help me?

Reproduces the problem - command or script

None

Reproduces the problem - error message

None

Additional information

No response

ZCMax commented 1 year ago

We usually train on the train+val set and test on the test set to enhance performance; the checkpoint we provide is only trained on the training set.

chenx1e commented 1 year ago

@ZCMax Thanks for your reply. I'll try to train the whole dataset 👍

chenx1e commented 1 year ago

@ZCMax Excuse me again: I'm not sure whether the AP40 strict metric for the three difficulty levels follows the KITTI official evaluation metric, since I initially guessed the performance gap between the validation and test sets was due to different metrics. If you know, please tell me, thanks.

Tai-Wang commented 1 year ago

AP40 is the latest evaluation metric used in the KITTI official benchmark, so there is no gap there. Besides, there can be other tricks, like a different train-val split for cross-validation, to achieve better performance on the test set, but that depends on the paper authors.
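For reference, the only difference between the older AP11 metric and AP40 is the set of recall positions at which interpolated precision is sampled: AP11 uses 0.0, 0.1, …, 1.0, while AP40 uses 1/40, 2/40, …, 1.0 (skipping recall 0). A toy sketch of the idea, not mmdetection3d's actual evaluation code:

```python
import numpy as np

def interpolated_ap(recall, precision, num_points):
    """Average interpolated precision over evenly spaced recall positions."""
    if num_points == 11:
        sample_points = np.linspace(0.0, 1.0, 11)   # AP11: 0.0, 0.1, ..., 1.0
    else:
        sample_points = np.linspace(1.0 / 40, 1.0, 40)  # AP40 skips recall 0
    ap = 0.0
    for r in sample_points:
        # Interpolated precision: max precision at any recall >= r (0 if none).
        mask = recall >= r
        ap += precision[mask].max() if mask.any() else 0.0
    return ap / num_points

# Toy precision-recall curve.
recall = np.array([0.14, 0.44, 0.72, 0.93])
precision = np.array([1.0, 0.8, 0.6, 0.5])
print(interpolated_ap(recall, precision, 11))  # → 7.2/11 ≈ 0.6545
print(interpolated_ap(recall, precision, 40))  # → 25.7/40 = 0.6425
```

Because the two samplings weight the curve differently, AP11 and AP40 numbers for the same detector are not directly comparable, but neither explains a gap as large as the one above.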