jishengpeng / WavTokenizer

SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling
MIT License
658 stars 37 forks

Maximum duration supported during inference? #31

Open LiuShixing opened 4 days ago

LiuShixing commented 4 days ago

The training duration is 3 seconds. Without a significant drop in performance, what is the maximum duration supported during inference?

jishengpeng commented 3 days ago

It can be quite long. You can try running inference on 30 seconds of audio.
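For a rough sense of scale, here is a minimal sketch of the arithmetic, assuming the 40 tokens per second figure from the repository description applies to the checkpoint in use. The helper names `token_count` and `split_into_windows` are hypothetical and are not part of WavTokenizer. Splitting longer audio into 30-second windows is an assumption made for illustration, not something the author recommends in this thread.

```python
import math

# Token rate taken from the repository description; the actual rate
# may differ depending on the checkpoint (assumption).
TOKENS_PER_SECOND = 40


def token_count(duration_s: float, tokens_per_second: int = TOKENS_PER_SECOND) -> int:
    """Approximate number of discrete tokens for a clip of the given length."""
    return math.ceil(duration_s * tokens_per_second)


def split_into_windows(total_s: float, window_s: float = 30.0) -> list[tuple[float, float]]:
    """Split [0, total_s] into consecutive, non-overlapping (start, end) windows.

    Hypothetical helper: the 30 s default mirrors the duration suggested
    in the reply above, not a documented limit.
    """
    windows = []
    start = 0.0
    while start < total_s:
        end = min(start + window_s, total_s)
        windows.append((start, end))
        start = end
    return windows


print(token_count(3))    # training-length clip
print(token_count(30))   # suggested inference-length clip
print(split_into_windows(75.0))
```

Under these assumptions, a 3-second training clip corresponds to about 120 tokens and a 30-second clip to about 1200 tokens, so the suggested inference length is roughly ten times the training sequence length. Non-overlapping windows may introduce artifacts at the boundaries, so treat the split as a starting point only.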