paperswithlove / papers-we-read

InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD #22

Open runhani opened 7 months ago

runhani commented 7 months ago

https://arxiv.org/pdf/2404.06512.pdf
https://github.com/InternLM/InternLM-XComposer

hjeun commented 7 months ago

1. InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition (arXiv)

2. InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Models (arXiv)

3. InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD (arXiv)
