mlfoundations / MINT-1T

MINT-1T: A one trillion token multimodal interleaved dataset.
770 stars · 20 forks

This is truly a huge amount of data. #7

Closed · limhasic closed this issue 3 months ago

limhasic commented 3 months ago

But what can I do with this?

Are there any examples of its use?

anas-awadalla commented 3 months ago

Hello! The main use case we focus on in the paper is training large multimodal models that reason over both image and text inputs. A couple of good examples of models trained on this type of interleaved data are MM1 and Idefics2.
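For a concrete sense of what "interleaved" means here, the following is a minimal sketch in plain Python of how one image/text document might be flattened into a training sequence. The document schema and the `<image>` placeholder token are illustrative assumptions, not MINT-1T's actual format; real pipelines replace the placeholder with visual features from an image encoder.

```python
# Sketch: flattening one interleaved image/text document into a single
# text sequence plus a parallel list of images, preserving original order.
# The chunk schema and "<image>" placeholder are illustrative assumptions.

IMAGE_TOKEN = "<image>"

def flatten_document(doc):
    """Turn an ordered list of text/image chunks into (text, images)."""
    parts, images = [], []
    for chunk in doc:
        if chunk["type"] == "text":
            parts.append(chunk["content"])
        elif chunk["type"] == "image":
            parts.append(IMAGE_TOKEN)        # stand-in the model swaps for visual features
            images.append(chunk["content"])  # e.g. a file path or raw bytes
    return " ".join(parts), images

# A toy interleaved document: text and images in their original order.
doc = [
    {"type": "text",  "content": "A diagram of the water cycle:"},
    {"type": "image", "content": "water_cycle.png"},
    {"type": "text",  "content": "Evaporation feeds the clouds shown above."},
]

text, images = flatten_document(doc)
print(text)    # A diagram of the water cycle: <image> Evaporation feeds the clouds shown above.
print(images)  # ['water_cycle.png']
```

Keeping images in their surrounding textual context (rather than as isolated caption pairs) is what lets models like MM1 and Idefics2 learn cross-modal reasoning over multi-image, multi-sentence inputs.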

limhasic commented 3 months ago

thank you