silicx / LoRS_Distill

Code for our ICML'24 paper on multimodal dataset distillation
BSD 3-Clause "New" or "Revised" License

The learnable similarity matrix #2

Closed by zhangxin-xd 2 months ago

zhangxin-xd commented 2 months ago

Hi, thanks for sharing this amazing work.

I have a question regarding the learnable similarity matrix S. I'm curious about the decision to make it learnable, considering that we can easily compute the similarity between cross-modal items after generating the synthetic images and text.

Looking forward to your reply.

zhangxin-xd commented 2 months ago

I noticed the ablation study on the similarity matrix. Could you provide some insight into why the learnable matrix performs better than the post-calculated one?

Thanks.

silicx commented 2 months ago

Hi! That's a good question. Besides the reasons mentioned in the ablation study, I think the learnable similarity also helps the distillation of the images and text themselves. This "soft" similarity is a more flexible and precise metric than the identity similarity, and it can guide the alignment of image and text more accurately.

(Plus, I'm not a fan of using a pretrained model in DD, as it doesn't seem very fair.)
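
To make the contrast between "identity" and "soft" similarity concrete, here is a minimal sketch in plain Python. It is an illustration only, not code from this repository: the function names, the logits, and the values in `soft` are all hypothetical. It treats the similarity matrix as the target of a row-wise cross-entropy over image-text logits, so the identity matrix gives the usual one-positive-per-row contrastive target, while a soft matrix lets a pair share credit with related pairs. In a learnable setup, the entries of `soft` would be parameters updated together with the synthetic data; that optimization step is omitted here.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def soft_target_loss(logits, targets):
    """Mean cross-entropy between softmax(logits) and row-normalized targets.

    With identity targets this reduces to the standard cross-entropy
    where the matching pair is the only positive in each row.
    """
    loss = 0.0
    for logit_row, target_row in zip(logits, targets):
        probs = softmax(logit_row)
        norm = sum(target_row)
        loss -= sum((t / norm) * math.log(p) for t, p in zip(target_row, probs))
    return loss / len(logits)


# Hypothetical image-text logits for three synthetic pairs.
# Pairs 0 and 1 are assumed to be semantically close.
logits = [
    [2.0, 1.5, -1.0],
    [1.4, 2.0, -0.5],
    [-1.0, -0.5, 2.0],
]

# Identity similarity: each image matches only its own text.
identity = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

# Soft similarity (assumed values): pairs 0 and 1 partially match each other.
soft = [
    [1.0, 0.6, 0.0],
    [0.6, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

loss_identity = soft_target_loss(logits, identity)
loss_soft = soft_target_loss(logits, soft)
print(f"identity target loss: {loss_identity:.4f}")
print(f"soft target loss:     {loss_soft:.4f}")
```

The only difference between the two calls is the target matrix, which is the point of the comparison: a post-computed similarity would fill `soft` from some fixed similarity measure, whereas a learnable one lets the optimization decide those entries.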

zhangxin-xd commented 2 months ago

That makes sense, thank you for the response!