isl-org / lang-seg

Language-Driven Semantic Segmentation
MIT License

How to train the zero-shot model? #34

Closed: hwanyu112 closed this 2 years ago

hwanyu112 commented 2 years ago

Hi! Thanks for your interesting work! I have been trying to reproduce the zero-shot experiments in the paper, but, as in https://github.com/isl-org/lang-seg/issues/19#issue-1213501618 , I get an mIoU much lower than yours.

Here are my scripts:

train_lseg_zs.py:

# Zero-shot training entry point: the same driver pattern as the standard
# training script, but built around the zero-shot module LSegModuleZS.
from modules.lseg_module_zs import LSegModuleZS
from utils import do_training, get_default_argument_parser

if __name__ == "__main__":
    # Extend the default argument parser with the model-specific options,
    # then hand everything to the shared training loop.
    parser = LSegModuleZS.add_model_specific_args(get_default_argument_parser())
    args = parser.parse_args()
    do_training(args, LSegModuleZS)

command:

python -u train_lseg_zs.py --backbone clip_resnet101 --exp_name lsegzs_pascal_f0 --dataset pascal \
--widehead --no-scaleinv --arch_option 0 --ignore_index 255 --fold 0 --nshot 0 --batch_size 8

Default arguments: base_lr=0.004, weight_decay=1e-4, momentum=0.9

I wonder where the problem is. Could you please share your training scripts for the zero-shot experiments?

hwanyu112 commented 2 years ago

When I set base_lr=0.09, I get a higher mIoU. Could you please share the full set of hyperparameters and the number of training epochs for each dataset? Thanks a lot.

Boyiliee commented 2 years ago

Hi @hwanyu112 ,

Thanks so much for your interest in LSeg!

We provide the training script for the ADE20K dataset, and you can easily adapt it for the zero-shot experiments. The FSS-1000 results should be very easy to reproduce. For the COCO and PASCAL datasets, which have very few classes, you need early stopping (you should be able to get the optimal results from the models at epochs 0-3) and a hyperparameter sweep to find the best learning rate; the optimal learning rate should be smaller than the one used for FSS-1000.
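For concreteness, such a sweep could look like the minimal sketch below, which reuses the command from the question. It assumes the parser exposes a --base_lr flag (suggested by the defaults quoted earlier); the grid values and experiment names are illustrative, not the authors' actual settings:

# Sketch of a learning-rate sweep for the PASCAL fold-0 zero-shot setting.
# --base_lr is assumed from the defaults quoted above; grid values are illustrative.
for lr in 0.001 0.004 0.01 0.04; do
    python -u train_lseg_zs.py --backbone clip_resnet101 \
        --exp_name lsegzs_pascal_f0_lr${lr} --dataset pascal \
        --widehead --no-scaleinv --arch_option 0 --ignore_index 255 \
        --fold 0 --nshot 0 --batch_size 8 --base_lr ${lr}
done
# Per the note above, evaluate the checkpoints from epochs 0-3 for each run
# and keep the best one.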

Hope this helps!

Best, Boyi