isl-org / lang-seg

Language-Driven Semantic Segmentation
MIT License

Hard to reproduce the zero-shot results on COCO dataset. #40

Open · Harry-zzh opened 1 year ago

Harry-zzh commented 1 year ago

Hi, could you please provide the learning-rate range, or the other hyper-parameter settings, for the zero-shot experiments on the COCO-20i dataset? It is difficult to reproduce the results reported in the paper: using ViT-L/16 as the backbone, my results are about 10 points lower than yours.

goodstudent9 commented 7 months ago

Hello, I wonder why the zero-shot setting requires training on the COCO dataset. In my mind, zero-shot means directly taking the model trained on ADE20K and testing it on COCO. I also don't understand why there are so many files with the _zs suffix: it seems like I would need to train the model again, with an architecture that differs from the original model, and that does not look like zero-shot to me.

Do you have any idea about this? Thank you!

Harry-zzh commented 7 months ago

Hi, I think what you describe is one form of zero-shot evaluation. The lang-seg paper uses a different zero-shot setting, in which the labels used at inference time are never seen during training. For example, on the COCO-20i dataset the classes are split into folds: the model is trained only on the seen classes and evaluated on the held-out unseen classes, so the ground-truth categories used in training and inference do not overlap. That is why the repository has separate _zs files for this setting.
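To make the setting concrete, here is a minimal sketch of the standard COCO-20i protocol (not code from this repo): the 80 COCO classes are divided into 4 folds of 20 classes each; for a given fold, those 20 classes are held out as "unseen" test classes and the remaining 60 are the "seen" training classes. The exact class-to-fold assignment varies between papers, so treat the split below as an assumption for illustration.

```python
# Sketch of the COCO-20^i seen/unseen class split (illustrative only; the
# actual class-to-fold assignment used by lang-seg may differ).

NUM_CLASSES = 80          # COCO object classes
NUM_FOLDS = 4
PER_FOLD = NUM_CLASSES // NUM_FOLDS  # 20 classes per fold


def coco20i_split(fold: int):
    """Return (seen_class_ids, unseen_class_ids) for one fold of COCO-20^i."""
    assert 0 <= fold < NUM_FOLDS
    # Contiguous-block convention shown here; some papers interleave the ids
    # instead (fold i takes every 4th class). This is an assumption.
    unseen = list(range(fold * PER_FOLD, (fold + 1) * PER_FOLD))
    seen = [c for c in range(NUM_CLASSES) if c not in unseen]
    return seen, unseen


if __name__ == "__main__":
    seen, unseen = coco20i_split(fold=0)
    print(len(seen), "seen classes used for training")    # 60
    print(len(unseen), "unseen classes held out for testing")  # 20
```

Under this protocol the model is still trained on COCO images, but the categories it is asked to segment at test time were never labeled during training, which is what "zero-shot" means in the paper.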

As for the other details of this repository, it has been a long time since I last used it, so I can't remember them clearly.