yuyq96 / TextHawk

Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models
51 stars, 3 forks