LiuShixing opened this issue 4 days ago (status: Open)
The training clips are 3 seconds long. What is the maximum audio duration supported at inference without a significant drop in performance?
It can handle fairly long audio; you can try 30 seconds of audio for inference.
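If your recordings are longer than that, one option is to split them into windows of at most 30 seconds and run inference on each window. Below is a minimal sketch, assuming the audio is already decoded into a flat sequence of samples. The helper `split_into_chunks` and the 16 kHz sample rate are hypothetical and not taken from this repository, and whether per-window results can simply be combined depends on the model.

```python
def split_into_chunks(samples, sample_rate, max_seconds=30.0):
    """Split a 1-D sequence of audio samples into windows of at most max_seconds.

    Hypothetical helper for illustration; not part of this repository.
    """
    if sample_rate <= 0 or max_seconds <= 0:
        raise ValueError("sample_rate and max_seconds must be positive")
    chunk_len = int(sample_rate * max_seconds)
    if chunk_len == 0:
        raise ValueError("max_seconds is too short for this sample_rate")
    # The last window may be shorter than max_seconds.
    return [samples[i:i + chunk_len] for i in range(0, len(samples), chunk_len)]


if __name__ == "__main__":
    sample_rate = 16000  # assumed sample rate, adjust to your data
    audio = [0.0] * (sample_rate * 70)  # 70 seconds of silence as dummy input
    chunks = split_into_chunks(audio, sample_rate)
    print([len(c) / sample_rate for c in chunks])  # [30.0, 30.0, 10.0]
```

The 30-second limit here only mirrors the suggestion above; it is worth checking quality on your own data before settling on a window length.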