Reproduction issue for Table 5

nowsyn commented 1 year ago

Hi, I am having trouble reproducing Table 5. In the paper, LSeg with the ViT-B/32 backbone achieves 79.7 pixAcc and 37.8 mIoU. With the released code, however, I only get 78.9 pixAcc and 33.7 mIoU, a gap of 0.8 pixAcc and 4.1 mIoU.

Our reproduction command, run on 8 GPUs, is as follows.

python -u train_lseg.py --dataset ade20k --data_path datasets --batch_size 4 --exp_name lseg_ade20k_b32_240e --base_lr 0.004 --weight_decay 1e-4 --no-scaleinv --max_epochs 240 --widehead --accumulate_grad_batches 2 --backbone clip_vitb32_384
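
For reference, the effective batch size implied by this command can be worked out as below. This is only a sketch: it assumes `--batch_size` is a per-GPU value and that gradients are averaged over all 8 GPUs, which the thread does not confirm. The function name `effective_batch_size` is illustrative and not part of the released code.

```python
def effective_batch_size(per_gpu_batch, num_gpus, accumulate_grad_batches):
    """Samples contributing to one optimizer step.

    Assumes per_gpu_batch is the batch size on each GPU (an assumption
    about how --batch_size is interpreted, not confirmed in this thread).
    """
    return per_gpu_batch * num_gpus * accumulate_grad_batches


# Values taken from the command above: --batch_size 4, 8 GPUs,
# --accumulate_grad_batches 2.
print(effective_batch_size(4, 8, 2))  # 64
```

If the paper's runs used a different GPU count or accumulation setting, the effective batch size would differ, and `--base_lr 0.004` might no longer correspond to the same training regime.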

What could be causing the performance gap? I may be missing some detailed settings.

By the way, I encountered a warning when running the released code: [W reducer.cpp:283] Warning: Grad strides do not match bucket view strides. This may indicate grad was not created according to the gradient layout contract, or that the param's strides changed since DDP was constructed. This is not an error, but may impair performance. grad.sizes() = [512, 768, 1, 1], strides() = [768, 1, 768, 768]

Have you ever seen this warning? Could it be causing the performance gap?

Looking forward to your reply.
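
One way to read the sizes and strides in the warning is to compare them with the default row-major (contiguous) strides for the same shape. The sketch below uses plain Python, with no PyTorch dependency, and assumes the bucket view that DDP compares against uses contiguous strides; that assumption is not confirmed in this thread. The helper names are illustrative.

```python
def contiguous_strides(sizes):
    """Row-major strides, in elements, for a tensor of the given sizes."""
    strides = []
    step = 1
    for size in reversed(sizes):
        strides.append(step)
        step *= max(size, 1)
    return list(reversed(strides))


def same_layout(sizes, strides_a, strides_b):
    """True if the strides agree on every dimension of size > 1.

    In a dimension of size 1 the only valid index is 0, so its stride
    never contributes to the memory offset of any element.
    """
    return all(
        a == b for size, a, b in zip(sizes, strides_a, strides_b) if size > 1
    )


sizes = [512, 768, 1, 1]           # grad.sizes() from the warning
grad_strides = [768, 1, 768, 768]  # strides() from the warning

expected = contiguous_strides(sizes)
print(expected)                                    # [768, 1, 1, 1]
print(same_layout(sizes, grad_strides, expected))  # True
```

Under that assumption, the two stride tuples differ only in the two size-1 dimensions, so every element sits at the same memory offset in both. The warning text also states that this is not an error and may only impair performance, which most plausibly refers to runtime speed rather than accuracy.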

hwanyu112 commented 1 year ago

Hi, I also have a reproduction issue, in my case for Table 1, just like https://github.com/isl-org/lang-seg/issues/19#issue-1213501618. After training for 20 epochs, I only get mIoU = 17.92%, versus the 52.8% reported in the paper.

So, I wonder whether you have reproduced the zero-shot experiments successfully.

Boyiliee commented 1 year ago

Hi @nowsyn and @hwanyu112,

Thanks for your interest in LSeg!

Regarding Table 1, we use early stopping and only train the model for 3 or 5 epochs. Training longer makes the results worse because there is very little training data (which is different from the FSS dataset).

Regarding Table 5, I didn't run into this problem; we strictly followed the settings mentioned in the paper.
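
As a rough illustration of early stopping by validation score, the sketch below picks the epoch with the highest validation mIoU from a per-epoch history. The thread does not say how the 3- or 5-epoch stopping point was chosen, so this is one common approach rather than the authors' procedure. The function name and the numbers in `history` are made-up placeholders, not LSeg results.

```python
def best_epoch(val_miou_per_epoch):
    """Return (1-based epoch, score) for the highest validation mIoU.

    On ties, the earliest epoch wins.
    """
    best_idx = max(
        range(len(val_miou_per_epoch)),
        key=val_miou_per_epoch.__getitem__,
    )
    return best_idx + 1, val_miou_per_epoch[best_idx]


# Placeholder values for illustration only: the score rises for a few
# epochs and then degrades as training continues.
history = [0.30, 0.42, 0.45, 0.44, 0.40, 0.33]
epoch, score = best_epoch(history)
print(epoch, score)  # 3 0.45
```

With a curve of this shape, a checkpoint from 20 epochs of training would score well below the one kept by early stopping.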

Hope this helps.

Best, Boyi

isl-org / lang-seg

Reproduction issue for Table 5 #32