hubertsiuzdak / snac

Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate
https://hubertsiuzdak.github.io/snac/
MIT License
443 stars 26 forks source link

About VQ and the datasets #14

Open jiaweiru opened 6 months ago

jiaweiru commented 6 months ago

Hi, thanks a lot for the great work, the use of tokens with different resolutions is in line with the intuitive understanding of the audio signal (like a representation of steady state features and transient features). Here I have a question: doesn't the use of coarse tokens lead to longer latency, because lower sampling frequency tokens need to read in more buffer information.

Also, can I know on which datasets the open pre-training models are trained? Much appreciated.