Figure 1 - Githubissues

wyhlaowang / LDFusion

[LDFusion] Official implementation for "Infrared and Visible Image Fusion with Language-driven Loss in CLIP Embedding Space"

16 stars 2 forks

Figure 1 #1

Open songwenhao123 opened 3 months ago

songwenhao123 commented 3 months ago

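Thank you for this work. In Figure 1, how were CLIP's classification results for the text descriptions obtained? I would appreciate it if you could share the details.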

wyhlaowang commented 3 months ago

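Hello! We use CLIP as a zero-shot classifier to produce the results in Figure 1. Specifically, we compute similarity scores between each multimodal image (infrared and visible) and a set of text descriptions (for example, "an infrared image", "a visible image", etc.), then apply softmax to obtain the final classification result. The Zero-Shot Prediction section of the CLIP repository (https://github.com/openai/CLIP) provides one possible implementation, for reference.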
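
For readers following along, the scoring step described above can be sketched roughly as follows. This is a minimal illustration and not the authors' code: it assumes the image and text embeddings have already been obtained (in the openai/CLIP package these would come from `model.encode_image` and `model.encode_text`), and it uses toy 3-dimensional vectors in place of real CLIP features. The function name `zero_shot_probs` is hypothetical, and the scale of 100.0 follows the Zero-Shot Prediction example in the CLIP README.

```python
import math


def zero_shot_probs(image_emb, text_embs, scale=100.0):
    """Softmax over scaled cosine similarities between one image
    embedding and several text embeddings (hypothetical helper)."""

    def normalize(v):
        # L2-normalize so the dot product equals cosine similarity.
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    img = normalize(image_emb)
    logits = [
        scale * sum(a * b for a, b in zip(img, normalize(t)))
        for t in text_embs
    ]

    # Numerically stable softmax: subtract the max logit first.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Text prompts quoted in the reply above.
prompts = ["an infrared image", "a visible image"]

# Toy stand-ins for CLIP features (assumption: real features would be
# much higher-dimensional and produced by the CLIP encoders).
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
image_emb = [0.9, 0.1, 0.0]

probs = zero_shot_probs(image_emb, text_embs)
predicted = prompts[probs.index(max(probs))]
print(predicted, probs)
```

The prompt with the highest probability is taken as the predicted class; repeating this for the infrared and the visible input gives one classification result per modality.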

songwenhao123 commented 3 months ago

Thank you for your answer!