On whether this can be used for classification tasks

shenyunhang / APE

[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception

https://arxiv.org/abs/2312.02153

Apache License 2.0

459 stars · 28 forks

Open · easonchan2023 opened this issue 6 months ago

easonchan2023 commented 6 months ago

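Hello, and thank you for your work. As a deep learning beginner, I have a few questions:

1. How are the text features extracted? Could you explain briefly?

2. Can I fuse the image features with the text features for a multimodal classification task?

3. Could you point out the code that extracts the image features and the text features? Many thanks!

Thanks for answering!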
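For question 2, one simple and common option (not necessarily what APE does) is late fusion: concatenate a pooled image feature vector and a pooled text feature vector, then feed the result to a linear classifier. The sketch below assumes both features are already fixed-size vectors and that the classifier weights were trained elsewhere; the function names `fuse_concat` and `linear_classify` and all numbers are hypothetical.

```python
from typing import List, Sequence


def fuse_concat(image_feat: Sequence[float], text_feat: Sequence[float]) -> List[float]:
    """Late fusion by concatenation: [image features ; text features]."""
    return list(image_feat) + list(text_feat)


def linear_classify(features: Sequence[float],
                    weights: Sequence[Sequence[float]],
                    biases: Sequence[float]) -> int:
    """Score each class with a linear layer and return the best class index.

    weights has one row per class; every row must be as long as `features`.
    """
    if any(len(row) != len(features) for row in weights):
        raise ValueError("each weight row must match the fused feature size")
    scores = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy 2-d image feature and 2-d text feature, 2 classes (weights are made up).
image_feat = [0.5, 1.0]
text_feat = [2.0, -1.0]
fused = fuse_concat(image_feat, text_feat)  # [0.5, 1.0, 2.0, -1.0]
weights = [[1.0, 0.0, 0.0, 0.0],  # class 0 only looks at the first image dim
           [0.0, 0.0, 1.0, 0.0]]  # class 1 only looks at the first text dim
biases = [0.0, 0.0]
print(linear_classify(fused, weights, biases))  # prints: 1
```

Concatenation keeps both feature sets intact and lets the classifier learn how to weigh them; other fusion schemes (element-wise products, cross-attention) are also possible but need the two features projected to compatible sizes first.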

shenyunhang commented 6 months ago

easonchan2023 commented 6 months ago

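Thanks for the reply! I have a few more questions:

In this paper, were only CLIP and OpenCLIP used to encode and extract the text features?

I ask because I noticed the code uses large language models such as Llama 2. Did you use them in your experiments?

Nearly all large language models are decoder-only, and I'm not sure whether they can serve as feature encoders. How did you use them?

I'm really looking forward to your answer. Sorry to bother you!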

shenyunhang commented 6 months ago

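We tried different text encoders, including BERT, T5, and Llama; the results are reported in the ablation study section of the paper. For decoder-only models, we directly use the output of the last hidden layer as the text features.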
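A minimal sketch of what "the output of the last hidden layer" refers to, assuming the hidden states of every layer are available as a list ordered from the embedding output up to the final layer (this mirrors the layout of the `hidden_states` tuple that Hugging Face `transformers` returns with `output_hidden_states=True`; the function name `last_layer_features` and the toy values are hypothetical). Each layer yields one vector per token, so the selected output is still a per-token sequence of shape (seq_len, hidden_dim) and must be pooled before it can serve as a single sentence-level feature.

```python
from typing import List

# One layer's output: a list of token vectors, shape (seq_len, hidden_dim).
TokenVectors = List[List[float]]


def last_layer_features(all_layers: List[TokenVectors]) -> TokenVectors:
    """Return the per-token hidden states of the final layer.

    all_layers[0] is assumed to be the embedding output and
    all_layers[-1] the output of the last decoder layer.
    """
    if not all_layers:
        raise ValueError("expected at least one layer of hidden states")
    return all_layers[-1]


# Toy example: 3 "layers", 2 tokens, hidden size 4
# (Llama 2 7B, for comparison, has a hidden size of 4096).
layers = [
    [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]],
    [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]],
    [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]],
]
features = last_layer_features(layers)
print(len(features), len(features[0]))  # prints: 2 4
```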

easonchan2023 commented 6 months ago

Thank you very much!

easonchan2023 commented 6 months ago

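I have now extracted text embeddings with a large language model, and their dimension is 4096. Because classification performance was very poor after I reduced their dimensionality, I'd like to ask how you handle such high-dimensional features. Thanks!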

shenyunhang commented 5 months ago

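We have not tried reducing the dimensionality of the embeddings; we only average the embeddings along the length (token) dimension.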
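Averaging along the length dimension turns a variable-length sequence of token embeddings into one fixed-size vector while leaving the feature dimension unchanged (4096 stays 4096). A minimal sketch, assuming a 0/1 attention mask marks real tokens versus padding so that padded positions do not dilute the average; the function name `mean_pool` is hypothetical and this is not taken from the APE code.

```python
from typing import List, Sequence


def mean_pool(token_embeddings: Sequence[Sequence[float]],
              attention_mask: Sequence[int]) -> List[float]:
    """Average token vectors over the sequence length, skipping padding.

    token_embeddings: seq_len vectors, each of size hidden_dim.
    attention_mask:   seq_len entries, 1 for real tokens and 0 for padding.
    """
    if not token_embeddings:
        raise ValueError("no token embeddings given")
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("mask length must match the number of tokens")
    hidden_dim = len(token_embeddings[0])
    summed = [0.0] * hidden_dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, value in enumerate(vector):
                summed[i] += value
            count += 1
    if count == 0:
        raise ValueError("attention mask has no real tokens")
    return [s / count for s in summed]


# Two real tokens followed by one padding token (hidden size 3 for brevity).
embeddings = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 0]
print(mean_pool(embeddings, mask))  # prints: [2.0, 3.0, 4.0]
```

Only the length axis is collapsed; the output keeps the full hidden size, which is consistent with the reply that no dimensionality reduction was applied.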