jishengpeng / WavTokenizer

SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling
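For context on the headline figure, 40 tokens per second keeps downstream sequence lengths short for audio language modeling. A back-of-the-envelope sketch (the helper below is illustrative only and not part of the repo; only the 40 tokens/s rate comes from the project description):

```python
# Illustrative arithmetic only: how many discrete codec tokens a clip produces
# at the advertised rate of 40 tokens per second.

def num_tokens(duration_s: float, tokens_per_second: int = 40) -> int:
    """Number of codec tokens for a clip of the given duration."""
    return int(duration_s * tokens_per_second)

if __name__ == "__main__":
    for secs in (1, 10, 60):
        print(f"{secs:>3} s of audio -> {num_tokens(secs)} tokens")
    # A 60 s clip is only 2400 tokens, comfortably within a typical
    # language-model context window.
```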
MIT License

Release of the bigger models :) #1

Open · christophschuhmann opened this issue 2 weeks ago

christophschuhmann commented 2 weeks ago

Hey, I am Christoph, one of the co-founders of LAION.

We are working on open-source models like GPT-4o and are looking for a better audio codec than SNAC, which has some problems with very expressive speech and with out-of-distribution sounds and music.

Therefore we are very happy and hopeful about your release! :)

Can you give us a timeline for the release of the bigger models?

If you like, we could even collaborate on training better models on more data; we have lots of data. :)

jishengpeng commented 2 weeks ago

Thank you very much for your interest!

Prior to our submission on October 2nd, we will release both the medium and large checkpoints. Due to limited computational resources, we are constrained to training each version on only 8 A800 GPUs, which slows the training process.

Our training code is open source, and we invite you to train on larger datasets; we highly encourage the community to release more powerful checkpoints trained on more data.

Should you encounter any issues during training or inference, please feel free to reach out. We are committed to responding promptly and resolving any problems to the best of our ability.

Best regards.