Result not as expected - Githubissues

ShenZheng2000 commented 1 year ago

Hello, authors!

I've used your provided code to train the model, but I'm having trouble reproducing the results on BDD datasets. The achieved average precision (AP) is much lower than what was reported in the paper.

I've already requested a pretrained model in #1. Could you please provide the pretrained model or double-check the current code? Your assistance would be greatly appreciated.

This is my result:

This is the result from the paper:

Update: Even after removing the category 'train' during evaluation, the mAP is still ~3% lower.

mecarill commented 1 year ago

Hello Shen Zheng! Thank you for bringing this up. Our results can be seen below and show the performance as reported.

We have updated the readme to include the trained weights for BDD100K and will update the same for SHIFT soon.

ShenZheng2000 commented 1 year ago

Thanks for your timely response. I have some other questions:

(1) Are you using 3 or 4 GPUs for training? According to the code config file, you are using 4 GPUs. But according to the paper, you are using 3 GPUs. I guess this might cause some discrepencies on the mAP scores.

(2) Could you provide the groundtruth coco label? I wonder if my category ids used for bdd2coco conversion are correct:

        attr_dict["categories"] = [
            {"supercategory": "none", "id": 1, "name": "person"},
            {"supercategory": "none", "id": 2, "name": "rider"},
            {"supercategory": "none", "id": 3, "name": "car"},
            {"supercategory": "none", "id": 4, "name": "truck"},
            {"supercategory": "none", "id": 5, "name": "bus"},
            {"supercategory": "none", "id": 6, "name": "train"},
            {"supercategory": "none", "id": 7, "name": "motor"},
            {"supercategory": "none", "id": 8, "name": "bike"},
            {"supercategory": "none", "id": 9, "name": "traffic light"},
            {"supercategory": "none", "id": 10, "name": "traffic sign"},
        ]

(3) Could you check if my training and validation image numbers for BDD day and night is correct?

train_clear_daytime: 12454
val_clear_daytime: 1764
train_clear_night: 22884
val_clear_night: 3274

Thanks again for your patience, and looking forward to your reply!

mecarill commented 1 year ago

(1) Within our paper we use 3 GPUs mainly due to GPU availability. The start-up command is a placeholder for the user to set values, but we will adjust it for clarity. Thanks for your input on this!

(2) We have updated the readme to include our splits under Dataset Download.

(3) Our numbers are higher across the splits. We did not differentiate between clear and other weather conditions and only split between night/day. This is likely the reason for the discrepancy that we have in the evaluation.

train_night.json: 32998 train_day.json: 36728 val_night.json: 4707

ShenZheng2000 commented 1 year ago

I appreciate your thorough response! Utilizing the provided checkpoints and JSON files, I successfully reproduced the results described in the paper for the BDD dataset.

However, when I attempt to train the model from scratch, I encounter an unexpected issue where the bounding box mAPs drop to zero during the second evaluation. This situation puzzles me, as the same set of images and the same JSON file are being used for both the first and the second evaluations. Could you provide some clarification on this matter?

ShenZheng2000 commented 1 year ago

I experimented with various torch versions, different dataset splits, and smaller learning rates in an attempt to train the model. However, none of these attempts successfully resolved the issue at hand. On certain occasions, the mean Average Precision (mAP) would turn out to be zero, while at other times, it would become NaN.

Could you help me on that?

Thanks!

mecarill commented 1 year ago

Is this during the burn-up stage? (Before 50k iterations).

If so the teacher has not been initialised yet resulting in 0 AP.

ShenZheng2000 commented 1 year ago

Yes. It is during the burn-up stage. The mAP after the burn-up stage is good. I will close this issue now.

mecarill / 2pcnet

Result not as expected #5