mc-lan / ClearCLIP

[ECCV2024] ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference

About reproducing CLIP result #2

Closed vanmeruso closed 2 months ago

vanmeruso commented 2 months ago

Hello, thanks for your great work!

I have a question about the vanilla CLIP result on PascalVOC20 (with background). In the paper, the reported result is 16.2 mIoU. However, my reproduced result is 10.2 mIoU with the configuration below.

Could you please share the experiment settings for the vanilla CLIP result on PascalVOC20 with background?

base configurations

model = dict(
    type='ClearCLIPSegmentation',
    clip_type='CLIP',        # 'CLIP', 'BLIP', 'OpenCLIP', 'MetaCLIP', 'ALIP'
    vit_type='ViT-B/16',     # 'ViT-B/16', 'ViT-L-14'
    model_type='vanilla',    # 'vanilla', 'MaskCLIP', 'GEM', 'SCLIP', 'ClearCLIP'
    ignore_residual=False,
)

mc-lan commented 2 months ago

For the experiments on datasets with a background class, you may additionally need to adjust the parameter prob_thd. For CLIP, we set prob_thd=0.2.
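For readers unfamiliar with this kind of setting, here is a rough sketch of how a probability threshold is commonly applied on datasets with a background class: any pixel whose highest class probability falls below the threshold is relabeled as background. This is an illustration only, not the repository's code. The function name `apply_background_threshold`, the parameter name `prob_thd` as used here, and the assumption that background is class index 0 are all hypothetical.

```python
def apply_background_threshold(probs, prob_thd=0.2, background_index=0):
    """Return one class index per pixel.

    probs: list of per-pixel probability lists (one value per class).
    Pixels whose best probability is below prob_thd become background.
    Assumption: background is class index 0 (hypothetical, for illustration).
    """
    labels = []
    for pixel_probs in probs:
        # Index of the most probable class for this pixel.
        best_class = max(range(len(pixel_probs)), key=pixel_probs.__getitem__)
        # Low-confidence predictions are reassigned to background.
        if pixel_probs[best_class] < prob_thd:
            best_class = background_index
        labels.append(best_class)
    return labels


# Three toy pixels over six classes (made-up numbers).
pixels = [
    [0.02, 0.90, 0.02, 0.02, 0.02, 0.02],  # confident: class 1
    [0.05, 0.05, 0.05, 0.70, 0.05, 0.10],  # confident: class 3
    [0.15, 0.18, 0.17, 0.17, 0.16, 0.17],  # uncertain: max 0.18 < 0.2
]

with_threshold = apply_background_threshold(pixels, prob_thd=0.2)
without_threshold = apply_background_threshold(pixels, prob_thd=0.0)
print(with_threshold)     # [1, 3, 0]
print(without_threshold)  # [1, 3, 1]
```

With the threshold disabled, the uncertain pixel keeps its weak foreground label, which can lower mIoU on benchmarks that score the background class. That would be consistent with the gap reported above, though the exact cause depends on the repository's actual post-processing.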

vanmeruso commented 2 months ago

Thanks for your quick reply!

The reproduced results are okay now!