jidanhuang opened 1 year ago
Hi Jidanhuang,
I have reviewed the papers on AudioLM and noticed many similarities between the models. I also checked the differences between Bark and nanoGPT in the model.py file: https://www.diffchecker.com/fHJG3qfK/. I am not sure if the architecture of Bark was published; maybe only the trained models are available, such as these: https://huggingface.co/suno/bark/tree/main.
Bark consists of three transformer models with attention: text, coarse, and fine (small & large), the same structure AudioLM / MusicLM use. It looks like they use nanoGPT as the base model, but with modifications.
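The three-stage cascade described above (text model, then coarse model, then fine model) can be sketched roughly like this. This is an illustrative mock-up, not Bark's actual code: the function names, vocab sizes, and codebook counts are assumptions chosen to show the data flow between the stages.

```python
# Sketch of a Bark/AudioLM-style three-stage cascade.
# All sizes below (10_000 semantic vocab, 1024 codec vocab, 2 coarse
# codebooks, 8 total codebooks) are illustrative assumptions.

def text_to_semantic(text_tokens):
    # Stage 1: "text" transformer maps text tokens to semantic tokens.
    # (Placeholder arithmetic stands in for an autoregressive GPT-style decoder.)
    return [t % 10_000 for t in text_tokens]

def semantic_to_coarse(semantic_tokens):
    # Stage 2: "coarse" transformer maps semantic tokens to the first
    # few codec codebooks (2 assumed here).
    return [[t % 1024, (t * 7) % 1024] for t in semantic_tokens]

def coarse_to_fine(coarse_tokens):
    # Stage 3: "fine" transformer fills in the remaining codec codebooks
    # (6 more assumed here, for 8 total).
    return [row + [sum(row) % 1024] * 6 for row in coarse_tokens]

def bark_like_pipeline(text_tokens):
    # Chain the three stages; the output would go to a neural codec
    # decoder (e.g. an EnCodec-style model) to produce a waveform.
    return coarse_to_fine(semantic_to_coarse(text_to_semantic(text_tokens)))
```

Each stage is a separate transformer checkpoint, which matches the separate text/coarse/fine model files in the Hugging Face repo linked above.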
AudioLM paper: https://arxiv.org/abs/2209.03143 (submitted 7 Sep 2022)
MusicLM paper: https://arxiv.org/abs/2301.11325 (submitted 26 Jan 2023)
Bark was published on Apr 9, 2023 (first commit).
Can we discuss all the details via Messenger?
Hi, I found that Bark uses a hybrid vocab, whereas AudioLM uses cross-attention to inject conditioning information into the transformer. I wonder what the difference between these two methods is. And are there any other differences between Bark and AudioLM?
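To make the contrast in the question concrete, here is a minimal sketch of the two conditioning mechanisms. This is not code from either model: the vocab sizes are assumptions, and the cross-attention function is a generic toy implementation, shown only to illustrate the structural difference between sharing one token space and attending across two separate streams.

```python
import math

TEXT_VOCAB = 50_000      # assumed size, for illustration only
SEMANTIC_VOCAB = 10_000  # assumed size, for illustration only

def hybrid_vocab_sequence(text_ids, semantic_ids):
    # "Hybrid vocab" approach: both token types live in one embedding
    # table. Audio/semantic ids are offset past the text range, and a
    # decoder-only transformer sees a single concatenated sequence.
    return text_ids + [TEXT_VOCAB + s for s in semantic_ids]

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def cross_attention(queries, keys, values):
    # Cross-attention approach: the decoder keeps its own vocabulary and
    # embeddings, and injects conditioning by letting its queries attend
    # over the encoder states (keys/values) of the conditioning sequence.
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, values))
                    for j in range(len(values[0]))])
    return out
```

With the hybrid vocab, conditioning is free once the ids are offset (the model is just a plain decoder over a bigger vocabulary); with cross-attention, the two streams stay in separate token spaces but the model needs extra attention layers to connect them.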