ziqipang / LM4VisualEncoding

[ICLR 2024 (Spotlight)] "Frozen Transformers in Language Models are Effective Visual Encoder Layers"
https://arxiv.org/abs/2310.12973
MIT License
210 stars · 6 forks

Are there any ablation studies on the number of LLM layers inserted between the visual encoder and the classifier? #2

Closed · valencebond closed 9 months ago

ziqipang commented 9 months ago

@valencebond Thanks for the question! We didn't investigate this in the paper because using multiple LLM transformer layers to handle visual tokens was beyond the resources we had.
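
For anyone who wants to try this ablation themselves, here is a minimal PyTorch sketch of what inserting a configurable number of frozen transformer blocks between a visual encoder and a classifier could look like. The class and argument names (`FrozenLLMVisualHead`, `visual_encoder`, `llm_blocks`, `num_llm_layers`) are illustrative assumptions, not the repo's actual API, and the `__main__` block uses stand-in modules only to check shapes.

```python
import torch
import torch.nn as nn


class FrozenLLMVisualHead(nn.Module):
    """Visual encoder -> N frozen transformer blocks -> linear classifier.

    `visual_encoder` is assumed to return token features of shape
    (B, T, vis_dim); each element of `llm_blocks` is assumed to map
    (B, T, llm_dim) -> (B, T, llm_dim).
    """

    def __init__(self, visual_encoder, llm_blocks, vis_dim, llm_dim,
                 num_classes, num_llm_layers=1):
        super().__init__()
        self.visual_encoder = visual_encoder
        self.proj_in = nn.Linear(vis_dim, llm_dim)    # adapter into the LLM width
        self.proj_out = nn.Linear(llm_dim, vis_dim)   # adapter back out
        self.llm_blocks = nn.ModuleList(llm_blocks[:num_llm_layers])
        for p in self.llm_blocks.parameters():        # keep the LLM blocks frozen
            p.requires_grad = False
        self.classifier = nn.Linear(vis_dim, num_classes)

    def forward(self, images):
        x = self.proj_in(self.visual_encoder(images))
        for block in self.llm_blocks:
            x = block(x)
        return self.classifier(self.proj_out(x).mean(dim=1))  # mean-pool tokens


if __name__ == "__main__":
    # Stand-in modules purely for a shape check; a real ablation would load
    # frozen blocks from a pretrained LLM instead.
    class ToyPatchEmbed(nn.Module):
        def __init__(self):
            super().__init__()
            self.proj = nn.Conv2d(3, 768, kernel_size=16, stride=16)

        def forward(self, images):
            return self.proj(images).flatten(2).transpose(1, 2)  # (B, 196, 768)

    blocks = [nn.TransformerEncoderLayer(d_model=1024, nhead=8, batch_first=True)
              for _ in range(4)]
    model = FrozenLLMVisualHead(ToyPatchEmbed(), blocks, vis_dim=768,
                                llm_dim=1024, num_classes=1000, num_llm_layers=2)
    print(model(torch.randn(2, 3, 224, 224)).shape)  # torch.Size([2, 1000])
```

Varying `num_llm_layers` (e.g. 1, 2, 4) while keeping everything else fixed would be one way to run this ablation under a limited compute budget.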