gordonhu608 / MQT-LLaVA

Matryoshka Query Transformer for Large Vision-Language Models
Apache License 2.0

How does the model performance compare with a Resampler? #1

Open lucasjinreal opened 3 weeks ago

lucasjinreal commented 3 weeks ago

Hi, it looks like MQT is still trained with a maximum token number, say 256, and then at inference any number of tokens can be chosen. But how does it compare with a Resampler that is trained & run at inference with only, say, 64 tokens?
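For context, a minimal sketch of the mechanism being compared, assuming a standard cross-attention resampler; the class and argument names (`QueryResampler`, `max_queries`, `num_tokens`) are illustrative and not the repo's actual API:

```python
import torch
import torch.nn as nn

class QueryResampler(nn.Module):
    """Cross-attention resampler: learned queries attend to vision features."""
    def __init__(self, dim=1024, max_queries=256, num_heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(max_queries, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, vision_feats, num_tokens=None):
        # Matryoshka-style: keep only the first `num_tokens` learned queries.
        # A fixed Resampler would always use all of self.queries (e.g. 64).
        q = self.queries if num_tokens is None else self.queries[:num_tokens]
        q = q.unsqueeze(0).expand(vision_feats.size(0), -1, -1)
        out, _ = self.attn(q, vision_feats, vision_feats)
        return out  # (batch, num_tokens, dim) visual tokens fed to the LLM

# Training draws num_tokens up to max_queries; inference can pick any budget:
feats = torch.randn(2, 576, 1024)              # e.g. CLIP patch features
tokens_64 = QueryResampler()(feats, num_tokens=64)
```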

gordonhu608 commented 2 weeks ago

Thank you for your interest in our work. In Figure 1 of our paper, we have a baseline model that is both trained and evaluated with 64 tokens. Given that both the query transformer and the Resampler employ similar cross-attention modules, I believe you can use that number as a rough reference.