wyhlaowang / LDFusion

[LDFusion] Official implementation for "Infrared and visible Image Fusion with Language-driven Loss in CLIP Embedding Space"

Figure 1 #1

Open songwenhao123 opened 3 months ago

songwenhao123 commented 3 months ago

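Thank you for your work. Could you explain how CLIP's perception (classification) results for the text descriptions in Figure 1 were obtained? I would appreciate it if you could share the details.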

wyhlaowang commented 3 months ago

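Hello! We use CLIP as a zero-shot classifier to obtain the results in Figure 1. Specifically, we compute similarity scores between each multimodal image (infrared, visible) and a set of text descriptions (e.g., "an infrared image", "a visible image"), then apply softmax to the scores to get the final classification result. The Zero-Shot Prediction section of the CLIP repository (https://github.com/openai/CLIP) provides one possible implementation for reference.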
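The scoring step described above (similarity scores followed by softmax) can be sketched in plain Python. This is only an illustration under stated assumptions, not the code used for Figure 1: the embeddings are toy vectors rather than real CLIP outputs, the helper names `l2_normalize` and `zero_shot_probs` are hypothetical, and the scale factor of 100 is taken from the Zero-Shot Prediction example in the openai/CLIP README. In practice the embeddings would come from `model.encode_image` and `model.encode_text` in that package.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def zero_shot_probs(image_feature, text_features, scale=100.0):
    """Softmax over scaled cosine similarities between one image and N prompts.

    `scale=100.0` mirrors the openai/CLIP README example (an assumption here).
    """
    img = l2_normalize(image_feature)
    logits = []
    for text_feature in text_features:
        txt = l2_normalize(text_feature)
        logits.append(scale * sum(a * b for a, b in zip(img, txt)))
    # Subtract the max logit before exponentiating for numerical stability.
    max_logit = max(logits)
    exps = [math.exp(z - max_logit) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

prompts = ["an infrared image", "a visible image"]

# Toy embeddings (assumption): stand-ins for CLIP text and image features.
toy_text_features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
toy_image_feature = [0.9, 0.1, 0.0]

probs = zero_shot_probs(toy_image_feature, toy_text_features)
predicted = prompts[probs.index(max(probs))]
print(predicted, probs)
```

With real CLIP features, the same function would be called once per image (infrared or visible), and the prompt with the highest probability would be reported as the predicted class.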

songwenhao123 commented 3 months ago

Thank you for your answer!