kyutai-labs / moshi

The data for training Mimi #147

Open zruiii opened 2 weeks ago

zruiii commented 2 weeks ago

Topic

The paper

Question

It seems that Mimi was trained independently of Moshi, but I couldn’t find the dataset used to train Mimi. Did I miss something?

soyyosusu commented 2 days ago

Hi, thanks for the great work, Moshi team! I have the same question: is the data used to train Mimi similar to the 7 million hours used to pre-train Moshi, or is it closer to the data used to train EnCodec or SoundStream? @LaurentMazare @adefossez

Thanks!