raoyongming / DenseCLIP

[CVPR 2022] DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting

Can not reproduce the result of DenseCLIP-R50. #18

Closed by Richardych 2 years ago

Richardych commented 2 years ago

Hi,

I followed your instructions for training DenseCLIP-R50, with a batch size of 4 x 8 GPUs.

However, I cannot reproduce the result (43.5 mIoU) reported in your paper; I only get 42.8 mIoU.

Could you provide the training log file, or more details (e.g., the random seed) needed to reproduce the paper results? Thanks!

raoyongming commented 2 years ago

Hi,

Thanks for your interest in our work. I have uploaded the training log for your reference. Our experiments were conducted with PyTorch 1.10.0, CUDA 11.1, and mmseg 0.18.0. We didn't set a random seed, so there may be some differences between otherwise identical runs.
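If you want to pin the randomness yourself when comparing runs, a minimal seeding sketch would look like the one below. This is illustrative, not part of our released training code, and note that full determinism also depends on cuDNN settings and data-loader workers:

```python
import random

import numpy as np
import torch


def seed_everything(seed: int = 0, deterministic: bool = False) -> None:
    """Fix the common sources of randomness so repeated runs are comparable."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    if deterministic:
        # Trades speed for reproducible convolution algorithm selection.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False


# Call once before building the model and dataloaders.
seed_everything(seed=0, deterministic=False)
```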

Richardych commented 2 years ago

@raoyongming Thanks for your reply. My environment is torch 1.8.1 / CUDA 10.1 / mmseg 0.19.0, and I set seed=0.

I don't think this environment difference alone should cause a drop from 43.5 to 42.8 mIoU; I will try again without setting a seed.

Richardych commented 2 years ago

@raoyongming Hi,

I find that even without a fixed random seed, I still cannot reproduce the paper results. Have you checked how sensitive DenseCLIP is to the random seed, e.g., by training multiple models and measuring the variance?

[Screenshot: 2022-04-22, 1:19 PM]

Looking forward to your reply, thanks! DenseCLIP is great work, and I want to use it as a baseline.

raoyongming commented 2 years ago

I just checked the logs of our experiments. DenseCLIP-R50, with the setting reported in our paper, generally achieves >43.0 mIoU across multiple runs in our environment. Over different context lengths (4-32), DenseCLIP-R50 achieved 42.2-43.5 mIoU on ADE20K. I also notice that the best result on ADE20K can depend on the last few iterations, since the dataset is relatively small. So ~43 mIoU seems reasonable, considering it still largely outperforms the baseline (39.6 with CLIP+FPN).
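If it helps, you can check how much the last few evaluations fluctuate by reading the `.log.json` file written next to the training log. The snippet below is a rough sketch that assumes the usual mmseg 0.x format (one JSON object per line, validation entries carrying "mode": "val" and an "mIoU" key); the path in the example is hypothetical.

```python
import json


def tail_val_miou(log_json_path: str, last_n: int = 5):
    """Return the (iter, mIoU) pairs of the last few validation entries
    in an mmseg-style .log.json file; adjust the keys if your log differs."""
    vals = []
    with open(log_json_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            entry = json.loads(line)
            if entry.get("mode") == "val" and "mIoU" in entry:
                vals.append((entry.get("iter"), entry["mIoU"]))
    return vals[-last_n:]


# Example (hypothetical path):
# print(tail_val_miou("work_dirs/denseclip_r50/20220422.log.json"))
```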

Richardych commented 2 years ago

@raoyongming Thanks for the quick reply! Do you mean you got the 43.5 with a context_length of 32?

And according to the following calculation: "context_length = self.text_encoder.context_length - self.context_length"

we would get 32 as "32 = 37 - 5".

Am I right? Looking forward to your reply.

raoyongming commented 2 years ago

Sorry for the confusion. The 43.5 mIoU is achieved by the model with a context length of 8, as reported in our paper. I meant that the performance across different context lengths falls in the range of 42.2-43.5. A longer context does not necessarily lead to better performance.
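To make the relation concrete, the sketch below shows how the learnable prompt length comes out of the two config values. The numbers here are illustrative assumptions only, so please read the actual values from the released segmentation config:

```python
# Illustrative numbers only; check the released config for the actual values.
text_encoder_context_length = 13  # total token budget of the text encoder (assumed)
class_token_context_length = 5    # tokens reserved for the class names (assumed)

# Learnable prompt length, matching the context length of 8 used for the 43.5 mIoU run.
learnable_context_length = text_encoder_context_length - class_token_context_length
print(learnable_context_length)  # 8
```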

Richardych commented 2 years ago

@raoyongming Thanks for the reply. I tried four more random seeds and got 42.6-43.1 mIoU with DenseCLIP-R50.

raoyongming commented 2 years ago

I am not sure what causes the slightly lower performance in your experiments. It may be related to the environment (hardware, CUDA/PyTorch versions, etc.). Since we tuned the hyper-parameters on DenseCLIP-R50, it is also possible that our DenseCLIP-R50 runs are somewhat above average. Maybe you can use your reproduced result as the baseline, so that your method and the baseline are evaluated in the same environment.