Closed (susenyang closed 1 year ago)
Unfortunately, we can't release that part, in order to prevent unauthorized voice cloning. There are some forks with approximations, like the one above (not maintained by Suno), that you can have a look at. However, in most cases the cloned voice probably won't sound particularly similar to the prompt.
I want to use my own audio as the prompt, which means I need to extract the corresponding semantic, coarse, and fine features. To stay aligned with the pre-trained models, can you tell me which versions of the models (hubert_kmeans, Encodec) you use to extract these features?
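For anyone reading along, here is a rough sketch of how the coarse and fine features might relate to each other once the codec tokens exist. It is not confirmed anywhere in this thread: it assumes the codec step yields 8 codebooks per frame, that the coarse features are the first 2 codebooks, and that the fine features are all 8. The constants, the function name `split_codec_tokens`, and the dictionary keys are hypothetical names chosen for illustration. Semantic features are not covered, since that extractor is the part that is not released.

```python
from typing import Dict, List

# Assumptions for illustration only (not confirmed in this thread):
# the codec produces 8 codebooks per frame, coarse = first 2, fine = all 8.
N_COARSE_CODEBOOKS = 2
N_FINE_CODEBOOKS = 8


def split_codec_tokens(codes: List[List[int]]) -> Dict[str, List[List[int]]]:
    """Split a [n_codebooks][n_frames] token grid into coarse and fine parts."""
    if len(codes) != N_FINE_CODEBOOKS:
        raise ValueError(
            f"expected {N_FINE_CODEBOOKS} codebooks, got {len(codes)}"
        )
    n_frames = len(codes[0])
    if any(len(row) != n_frames for row in codes):
        raise ValueError("all codebooks must have the same number of frames")
    return {
        # Hypothetical key names; copy the rows so the input is not aliased.
        "coarse_prompt": [list(row) for row in codes[:N_COARSE_CODEBOOKS]],
        "fine_prompt": [list(row) for row in codes],
    }


# Usage with dummy tokens: 8 codebooks, 3 frames each.
dummy_codes = [[book * 10 + frame for frame in range(3)] for book in range(8)]
features = split_codec_tokens(dummy_codes)
print(len(features["coarse_prompt"]), len(features["fine_prompt"]))
```

If the real models use a different number of codebooks or a different split, only the two constants would need to change.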