Closed by jiusansan222 10 months ago
Hi, thank you for your question. The pretrained checkpoint for the strong learner uses the wav2vec 2.0 base model.
However, you can swap in a different SSL model, as described in https://github.com/sarulab-speech/UTMOS22/issues/13.
Thank you very much. I have one more question: is the published model the same as the model mentioned in Table 2 of the paper?
If you mean the model published on Hugging Face, it is UTMOS strong without the phoneme encoder. It is the same as the "w/o phoneme encoder" model in Table 2.
The pretrained models published here are the same as "UTMOS strong" in Table 2.
Is the UTMOS strong learner model published in this project based on wav2vec 2.0? Thank you!