sarulab-speech / UTMOS22

UT-Sarulab MOS prediction system using SSL models

MIT License

Is the UTMOS strong learner model published in this project based on wav2vec 2.0? #14

Closed: jiusansan222 closed this issue 10 months ago

jiusansan222 commented 10 months ago

Is the UTMOS strong learner model published in this project based on wav2vec 2.0? Thank you!

Takaaki-Saeki commented 10 months ago

Hi, thank you for your question. The pretrained checkpoint for the strong learner uses the wav2vec 2.0 base model.
However, you can swap in a different SSL model, as described in https://github.com/sarulab-speech/UTMOS22/issues/13.

jiusansan222 commented 10 months ago

Thank you very much! I have one more question: is this published model the same as the model in Table 2 of the paper?

Takaaki-Saeki commented 10 months ago

If you mean the model published on Hugging Face, it is UTMOS strong without the phoneme encoder, i.e. the same as the "w/o phoneme encoder" model in Table 2.

The pretrained models published in this repository are the same as "UTMOS strong" in Table 2.