issues
search
hasanar1f
/
HiRED
HiRED strategically drops visual tokens in the image encoding stage to improve inference efficiency for High-Resolution Vision-Language Models (e.g., LLaVA-Next) under a fixed token budget.
https://www.arxiv.org/abs/2408.10945
MIT License
9
stars
3
forks
source link
issues
Newest
Newest
Most commented
Recently updated
Oldest
Least commented
Least recently updated
pre-trained models.
#1
Han-jiaxin
opened
1 week ago
1