jishengpeng / WavTokenizer

SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling
MIT License

Comparison with Whisper #27

Open isruihu opened 2 months ago

isruihu commented 2 months ago

Hi there, great work! I was wondering whether WavTokenizer can be compared with Whisper. Since the Qwen-Audio series uses Whisper as its audio encoder, can WavTokenizer be used as an alternative, and what are its advantages and disadvantages?

Thanks

jishengpeng commented 2 months ago

WavTokenizer can be applied to the Qwen-Audio series, as well as to the recently introduced Mini-Omni and LLaMA-Omni series. For a comparison with Whisper, please refer to our previous response.

It is worth noting that, in contrast to Whisper, we believe codec-based approaches hold greater potential for the future. The current challenge appears to lie in WavTokenizer's encoder, which is not yet powerful enough; this is a limitation we are actively working to address.
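
One concrete point of comparison is sequence length. The sketch below is only back-of-the-envelope arithmetic, not code from this repository: it assumes the 40 tokens per second figure from the project description and the fact that Whisper's encoder emits 1500 frames for its 30-second input window (50 frames per second). The helper name `sequence_length` is hypothetical. Keep in mind that the two outputs differ in kind: Whisper frames are continuous feature vectors, while codec tokens are discrete ids.

```python
def sequence_length(duration_s: float, rate_hz: float) -> int:
    """Number of positions a clip occupies at a given frame/token rate."""
    return round(duration_s * rate_hz)


# Assumption: rate taken from the repository description (40 tokens per second).
WAVTOKENIZER_RATE_HZ = 40
# Whisper's encoder produces 1500 frames per 30 s window, i.e. 50 frames per second.
WHISPER_ENCODER_RATE_HZ = 50

clip_s = 30.0  # one full Whisper input window
codec_tokens = sequence_length(clip_s, WAVTOKENIZER_RATE_HZ)
whisper_frames = sequence_length(clip_s, WHISPER_ENCODER_RATE_HZ)

print(f"WavTokenizer: {codec_tokens} discrete tokens")
print(f"Whisper encoder: {whisper_frames} continuous frames")
```

This compares raw encoder output rates only; a downstream model may pool or subsample either stream before it reaches the language model.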