Reproducing results in table 1

shyam671 / Mask2Anomaly-Unmasking-Anomalies-in-Road-Scene-Segmentation

[ICCV'23 Oral] Unmasking Anomalies in Road-Scene Segmentation


Reproducing results in table 1 #1

Closed pdejorge closed 11 months ago

pdejorge commented 12 months ago

Dear authors,

Congratulations on this very interesting work! I would like to reproduce the experiments in Table 1 of the paper as a baseline for our work. However, after following the code in this repo to train M2A on Cityscapes and then finetune with COCO images, I get different results on the datasets evaluated in Table 1.

For instance, I get:

Lost and Found -> AUPRC: 51.95 FPR@95: 35.22
RoadAnomaly -> AUPRC: 75.47 FPR@95: 18.47
RoadObstacle21 -> AUPRC: 28.48 FPR@95: 1.19
RoadAnomaly21 -> AUPRC: 87.33 FPR@95: 17.49
Fishyscapes -> AUPRC: 80.02 FPR@95: 28.56

From the code, it looks like there is no fixed seed during training and finetuning. Is that correct? If so, did you ever experiment with different runs? Perhaps there is some "trick" that should be done during evaluation that I missed? I just ran anomaly_utils/anomaly_inference.py as specified in the README. I also downloaded the datasets from the link provided to make sure they were the same.

Thanks again for your help. Pau

shyam671 commented 11 months ago

Dear Pau, Thank you for pointing this out; some details were indeed missing.

I have updated https://github.com/shyam671/Mask2Anomaly-Unmasking-Anomalies-in-Road-Scene-Segmentation/blob/main/configs/cityscapes/semantic-segmentation/anomaly_train.yaml. Could you please try again?

Yes, during training I think we didn't have a fixed seed and mostly followed the Mask2Former way of training. However, at inference we made sure to have a fixed seed for a fair comparison.
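For anyone who wants repeated runs to be comparable, here is a minimal sketch of fixing the seed. The helper name `set_seed` is hypothetical and is not part of this repo; it seeds Python's `random` module and, only if they are installed, NumPy and PyTorch. It does not by itself guarantee bit-identical GPU training.

```python
import random


def set_seed(seed: int = 0) -> None:
    """Seed the common random number generators (hypothetical helper)."""
    random.seed(seed)
    try:
        import numpy as np  # optional dependency

        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch  # optional dependency

        torch.manual_seed(seed)  # seeds the CPU and CUDA generators
    except ImportError:
        pass


# Usage: calling set_seed with the same value replays the same random stream.
set_seed(123)
first = random.random()
set_seed(123)
second = random.random()
print(first == second)
```

Calling it once at the start of a training or evaluation script is usually enough for the Python-level generators.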

Regards, Shyam Nandan Rai

pdejorge commented 11 months ago

Dear Shyam,

Thank you for your response!

1) So I understand the main issue is that the R50 backbone used by default in Mask2Former: detectron2://ImageNetPretrained/torchvision/R-50.pkl is not what you used and instead, you downloaded it from https://dl.fbaipublicfiles.com/barlowtwins/ljng/resnet50.pth?

2) Regarding the random seed during inference, from the code in anomaly_utils/anomaly_inference.py I do not see any random component. Does the model have some randomness during inference that requires fixing the random seed? Maybe I missed something.

3) Finally, I observed that in line 133 of anomaly_utils/anomaly_inference.py you compute an augmentation of the image, img_ud = np.flipud(img), but then you do not use it; instead you use the original image and another augmentation, a left-right flip. I just wanted to make sure this code is the final version used in the paper.
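To make point 3 concrete, this is a rough sketch of test-time augmentation with the original image and a left-right flip. It is not the repo's code: `score_fn` is a hypothetical stand-in for the model, averaging the two predictions is an assumption, and plain nested lists are used so the snippet runs without NumPy (with arrays, np.fliplr would replace the row reversal).

```python
from typing import Callable, List

Grid = List[List[float]]


def flip_lr(grid: Grid) -> Grid:
    """Mirror every row, i.e. a left-right flip of a 2-D grid."""
    return [row[::-1] for row in grid]


def tta_anomaly_score(img: Grid, score_fn: Callable[[Grid], Grid]) -> Grid:
    """Average per-pixel scores of the original and its left-right flip."""
    plain = score_fn(img)
    # Flip the second prediction back so it lines up pixel-for-pixel with `plain`.
    flipped_back = flip_lr(score_fn(flip_lr(img)))
    return [
        [(a + b) / 2 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(plain, flipped_back)
    ]


def toy_score(grid: Grid) -> Grid:
    """Toy, position-dependent scorer used only for this demo."""
    return [[value + col for col, value in enumerate(row)] for row in grid]


print(tta_anomaly_score([[0.0, 0.0, 0.0]], toy_score))
```

An up-down flip (np.flipud) would follow the same pattern if it were added as a third view.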

Thanks again, Pau

shyam671 commented 11 months ago

Dear Pau,

1) Yes, correct.
2) Yes, there is no random component. However, it’s a general practice I follow.
3) You are right, I don’t use that. Thanks for pointing it out.

Please, feel free to ping again!! -Shyam

pdejorge commented 11 months ago

Hello Shyam,

I tried again with the checkpoint you mentioned: https://dl.fbaipublicfiles.com/barlowtwins/ljng/resnet50.pth and actually got worse results in most settings. Would it be possible for you to provide the checkpoint of M2F after Cityscapes training but before OOD finetuning? This would probably help a lot in comparing results.

To clarify, the checkpoint I am asking for is /home/shyam/Mask2Former/output/xl-fl/bt-f-xl.pth, as specified in this line

Thanks in advance.

shyam671 commented 11 months ago

Hi,

I have attached the link for the model along with ood_dataset (processed MS-COCO subset). Link: https://drive.google.com/drive/folders/1RRA_0eU2KZMNFGOTvSAdHaFl5B6L_MO8?usp=share_link

-Shyam

pdejorge commented 11 months ago

Thank you! I'll try to do only the finetuning part from that checkpoint.

zhouhuan-hust commented 7 months ago

Hello Pau and Shyam, have you managed to reproduce results similar to those in the paper? I also have a question about the contrastive loss. l_N is negative in both the normal and the abnormal case, and the paper says that, under ideal circumstances, l_N is -1 for normal and 0 for abnormal. Why, then, does the formula for the contrastive loss seem to push the normal l_N as close to 0 as possible, and the abnormal l_N above the margin m (that is, 0.75)? Since l_N is never positive, that would mean the abnormal l_N is also pushed as close to 0 as possible. Thank you very much!
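The reading in the question above can be checked numerically with a small sketch. It assumes the standard margin-based contrastive form that the question describes (a squared term for normal, a squared hinge with margin m = 0.75 for abnormal); this has not been verified against the repo's loss code, and the function name is hypothetical.

```python
def contrastive_term(l_n: float, is_anomalous: bool, margin: float = 0.75) -> float:
    """Per-location loss under the assumed margin-based contrastive form."""
    if is_anomalous:
        # Hinge: zero only once l_n reaches the margin.
        return 0.5 * max(0.0, margin - l_n) ** 2
    # Normal case: penalise any distance of l_n from 0.
    return 0.5 * l_n ** 2


# Evaluate both branches over the range l_n can take when it is never positive.
for l_n in (-1.0, -0.5, 0.0):
    print(l_n, contrastive_term(l_n, False), contrastive_term(l_n, True))
```

Under this assumed form, both branches are smallest at l_N = 0 over the interval [-1, 0], and the abnormal branch cannot reach zero there because l_N never exceeds the margin. That matches the concern raised in the question.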