mzamini92 opened 1 day ago
Thank you for your interest in our work and your feedback ^_^
CCA is technically compatible with LVLMs other than the LLaVA model in this repo, such as LLaVA-UHD and EAGLE. Some modifications are needed to accommodate the change in visual feature resolution.
Our method was implemented on top of LLaVA, where the length of the visual token sequence is 576 (manually set here). The 2-D visual features from this sequence are arranged as a 24 by 24 grid (set here). Since the length of the visual token sequence depends on each LVLM's design, slight modifications are needed when adapting CCA to other models.
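To make the 576 → 24 by 24 relationship concrete, here is a minimal sketch of how a flat visual-token sequence maps onto a 2-D grid. It assumes row-major ordering, and the helper name to_grid is hypothetical, not a function from the CCA repo; token indices stand in for the actual visual features.

```python
# Hypothetical sketch (not the CCA implementation): map a flat visual-token
# sequence onto a 2-D grid. Assumes LLaVA-style settings and row-major order.
IMG_TOKEN_LEN = 576
H, W = 24, 24


def to_grid(tokens, h, w):
    """Reshape a flat, row-major list of visual tokens into h rows of w tokens."""
    if len(tokens) != h * w:
        # A model that emits a different number of visual tokens ends up here.
        raise ValueError(
            f"expected {h * w} visual tokens for a {h}x{w} grid, got {len(tokens)}"
        )
    return [tokens[r * w:(r + 1) * w] for r in range(h)]


# Stand-in for visual features: token indices 0..575.
tokens = list(range(IMG_TOKEN_LEN))
grid = to_grid(tokens, H, W)
print(len(grid), len(grid[0]))  # 24 24
print(grid[1][0])               # 24, the first token of the second row
```

If another LVLM produces a visual token sequence whose length is not 576, keeping H = W = 24 no longer describes it, which is the kind of mismatch suspected below.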
We suspect this could be the reason. If so, you may adjust IMG_TOKEN_LEN, H and W according to the visual feature size in your experimental setup.
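As a rough aid for picking consistent values, the sketch below derives (IMG_TOKEN_LEN, H, W) from a visual-token count. The helper grid_shape is hypothetical. It assumes a square patch grid unless H and W are passed explicitly. The actual token counts for LLaVA-UHD or EAGLE are not stated in this thread and should be read from each model's own configuration. If a model concatenates tokens from several image slices or encoders, a single H by W grid may not describe the sequence at all.

```python
import math


def grid_shape(img_token_len, h=None, w=None):
    """Return (IMG_TOKEN_LEN, H, W) for a given visual-token count.

    Hypothetical helper: if h and w are given, check they match the token
    count; otherwise assume a square grid and use the integer square root.
    """
    if (h is None) != (w is None):
        raise ValueError("pass both H and W, or neither")
    if h is not None:
        if h * w != img_token_len:
            raise ValueError(
                f"H*W = {h * w} does not match IMG_TOKEN_LEN = {img_token_len}"
            )
        return img_token_len, h, w
    side = math.isqrt(img_token_len)
    if side * side != img_token_len:
        raise ValueError(
            f"{img_token_len} tokens do not form a square grid; pass H and W explicitly"
        )
    return img_token_len, side, side


print(grid_shape(576))              # (576, 24, 24), the LLaVA setting
print(grid_shape(1024))             # (1024, 32, 32), an illustrative square grid
print(grid_shape(288, h=12, w=24))  # (288, 12, 24), an illustrative non-square grid
```

Failing loudly on a non-square count, rather than silently rounding, keeps a wrong H or W from reaching the attention code unnoticed.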
Let us know if you have further questions on this. Thanks.
Hi. Thanks for the great work. I tried to prepend and just add the
to LLaVA-UHD or EAGLE, and I get:
When I also modify the llava_llama.py file the same way you did, I get:
Did I miss anything?