suno-ai / bark

🔊 Text-Prompted Generative Audio Model
MIT License
35.52k stars 4.18k forks

difference between bark and audiolm #414

Open jidanhuang opened 1 year ago

jidanhuang commented 1 year ago

Hi, I found that Bark uses a hybrid vocabulary, but AudioLM uses cross-attention to inject information into the transformer. I wonder what the difference between these two methods is. Are there any other differences between Bark and AudioLM?
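To make the "hybrid vocab" idea in the question concrete, here is a minimal sketch of how text tokens and audio tokens can share one ID space by shifting one of them by an offset. The constants and the helper names (`to_shared_ids`, `is_text_token`) are hypothetical and chosen for illustration; they are not taken from Bark's source. With a shared vocabulary, the conditioning text is simply a prefix of the same sequence and is handled by ordinary self-attention, whereas a cross-attention design would keep the conditioning in a separate stream.

```python
# Illustrative constants only; NOT Bark's actual values.
AUDIO_VOCAB_SIZE = 1000          # hypothetical number of audio-token IDs
TEXT_VOCAB_SIZE = 500            # hypothetical number of text-token IDs
TEXT_OFFSET = AUDIO_VOCAB_SIZE   # text IDs are shifted past the audio range
SHARED_VOCAB_SIZE = AUDIO_VOCAB_SIZE + TEXT_VOCAB_SIZE


def to_shared_ids(text_ids, audio_ids):
    """Map text and audio token IDs into one ID space and concatenate them."""
    for t in text_ids:
        if not 0 <= t < TEXT_VOCAB_SIZE:
            raise ValueError(f"text id out of range: {t}")
    for a in audio_ids:
        if not 0 <= a < AUDIO_VOCAB_SIZE:
            raise ValueError(f"audio id out of range: {a}")
    # Shift text IDs so they cannot collide with audio IDs.
    shifted_text = [t + TEXT_OFFSET for t in text_ids]
    # The text prefix and the audio tokens form a single input sequence.
    return shifted_text + list(audio_ids)


def is_text_token(shared_id):
    """Return True if a shared-vocabulary ID came from the text range."""
    return shared_id >= TEXT_OFFSET


sequence = to_shared_ids([3, 7], [42, 999])
print(sequence)  # [1003, 1007, 42, 999]
```

A single embedding table of size `SHARED_VOCAB_SIZE` would then cover both token types, so no extra cross-attention layers are needed for conditioning.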

applehawk commented 4 months ago

Hi Jidanhuang,

I have reviewed the papers on AudioLM and noticed many similarities between the models. I also checked the differences between Bark and nanoGPT in the model.py file: https://www.diffchecker.com/fHJG3qfK/. I am not sure if the architecture of Bark was published; maybe only the trained models are available, such as these: https://huggingface.co/suno/bark/tree/main.

Bark consists of three attention-based transformer models: text, coarse, and fine (each available in small and large variants), which is similar to the staged approach used by AudioLM / MusicLM. It looks like they use nanoGPT as the base model, but with modifications.

AudioLM Paper: https://arxiv.org/abs/2209.03143 [Submitted on 7 Sep 2022]

MusicLM Paper: https://arxiv.org/abs/2301.11325 [Submitted on 26 Jan 2023]

Bark was published on Apr 9, 2023 (first commit)

Can we discuss all the details via Messenger?