zwq456 / CLIP-VIS

[IEEE TCSVT] Official PyTorch Implementation of CLIP-VIS: Adapting CLIP for Open-Vocabulary Video Instance Segmentation.
Apache License 2.0

The reproduced result is very low #3

Closed: shihao1895 closed this issue 4 months ago

shihao1895 commented 4 months ago

Thank you for your work. When I train on the LVIS dataset and then test on the lvvis_val dataset, the results on lvvis_val are very low, with an AP of only 0.04. I haven't modified any code, so why is that? Looking forward to your early reply.

SCYF123 commented 4 months ago

Hi, shihao. I'm glad you're interested in my project. Could you provide the training and testing logs?

shihao1895 commented 4 months ago

Hello, I am very glad you could reply to me so quickly. The logs, model, and test results are available here: https://cloud.tsinghua.edu.cn/d/ba50137812d64e64bd5c/

Thank you.

SCYF123 commented 4 months ago

Hi, I found that you set DEC_LAYERS to 10 when testing on the lvvis dataset, but it was set to 7 during training. Please append MODEL.MASK_FORMER.DEC_LAYERS 7 to the command line so the test-time decoder matches the trained one. If the results remain low, I will continue to look for other issues.
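For readers unfamiliar with this style of override: Detectron2-based training scripts accept trailing KEY VALUE pairs on the command line that patch the loaded YAML config before the model is built, which is why the fix is a flag rather than a code change. Below is a simplified, illustrative sketch of that override mechanism (not the actual CLIP-VIS or Detectron2 code), using a plain nested dict as the config:

```python
# Illustrative sketch of Detectron2-style "KEY VALUE" command-line
# overrides, e.g. MODEL.MASK_FORMER.DEC_LAYERS 7.
# Names and structure are hypothetical, for explanation only.
import ast

def apply_opts(cfg: dict, opts: list) -> dict:
    """Apply alternating KEY VALUE pairs to a nested dict config."""
    for key, raw in zip(opts[0::2], opts[1::2]):
        *parents, leaf = key.split(".")   # "A.B.C" -> ["A", "B"], "C"
        node = cfg
        for p in parents:
            node = node.setdefault(p, {})  # walk/create nested sections
        try:
            node[leaf] = ast.literal_eval(raw)  # parse ints/floats/lists
        except (ValueError, SyntaxError):
            node[leaf] = raw                    # fall back to plain string
    return cfg

# Config as trained (DEC_LAYERS = 7) vs. the mismatched test value (10):
cfg = {"MODEL": {"MASK_FORMER": {"DEC_LAYERS": 10}}}
apply_opts(cfg, ["MODEL.MASK_FORMER.DEC_LAYERS", "7"])
print(cfg["MODEL"]["MASK_FORMER"]["DEC_LAYERS"])  # 7
```

The takeaway is that evaluation must rebuild the decoder with the same number of layers the checkpoint was trained with; otherwise the weights load into a differently shaped model and the AP collapses.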

shihao1895 commented 4 months ago

Hi, my problem has been solved, thank you very much!