Maximum duration supported during inference?

jishengpeng / WavTokenizer

SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling

MIT License

Open LiuShixing opened 4 days ago

LiuShixing commented 4 days ago

The training duration is 3 seconds. Without a significant drop in performance, what is the maximum duration supported during inference?

jishengpeng commented 3 days ago

The supported duration is fairly long; you can try inference on 30 seconds of audio.
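
For audio longer than the suggested 30 seconds, one option is to split it into chunks of at most 30 seconds and run inference per chunk. The sketch below only does the bookkeeping: it computes chunk boundaries in samples and estimates the token count from the 40 tokens per second stated in the repo description. The function names `chunk_bounds` and `estimated_tokens` are hypothetical helpers, not part of the WavTokenizer API, and the 24 kHz sample rate is an assumption; use whatever rate your checkpoint expects.

```python
def chunk_bounds(num_samples, sample_rate=24_000, max_seconds=30.0):
    """Return (start, end) sample indices for consecutive chunks.

    Each chunk covers at most `max_seconds` of audio; the last one may be shorter.
    sample_rate=24_000 is an assumed default, not confirmed by the thread.
    """
    chunk_len = int(sample_rate * max_seconds)
    if num_samples < 0 or chunk_len < 1:
        raise ValueError("num_samples must be >= 0 and chunk length >= 1 sample")
    return [
        (start, min(start + chunk_len, num_samples))
        for start in range(0, num_samples, chunk_len)
    ]


def estimated_tokens(num_samples, sample_rate=24_000, tokens_per_second=40):
    """Estimate the token count at a fixed token rate, rounding up."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-num_samples * tokens_per_second // sample_rate)


if __name__ == "__main__":
    total = 70 * 24_000  # 70 seconds of audio at the assumed 24 kHz
    for start, end in chunk_bounds(total):
        print(start, end, estimated_tokens(end - start))
```

Each `(start, end)` pair can be used to slice the waveform before passing it to the model. Whether chunk boundaries cause audible artifacts after decoding is not addressed in this thread, so it is worth checking on your own data.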