jishengpeng / WavTokenizer

SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling
MIT License

speech medium v2 #37

theodorblackbird closed this issue 2 months ago

theodorblackbird commented 2 months ago

Hi, thank you for sharing this work. I've seen a recently uploaded "v2" version of the medium speech model. It does not seem to be backward compatible. How does it differ from the first version?

jishengpeng commented 2 months ago

v1 and v2 differ only in the number of training steps, with v2 being trained for a longer duration.