Open ssolito opened 1 year ago
Hi,
The pretrained models have only seen English data and haven't been verified to work well out of the box on other languages (and even for English, it's still experimental and not a well-established metric). We've tried it on some data from other languages: the correlations were OK, but the errors were high. So you can certainly try it and see, but I would recommend interpreting the results with caution.
Hi,
I am linking back to the same question. I would like to know if it is possible to use your code to perform inference on a different dataset than the one you used for the VoiceMOS Challenge. I tried to run the code, but it refers to your val_mos_list.txt.
In case it is possible, what would be the steps to follow? For the moment I have run: python run_inference_for_challenge.py --datadir /mydata/
and the error I have is this:
RuntimeError: Error loading audio file: failed to open file /home/aholab/sarah/IMS-Toucan/audios/Mono/Spanish_Aintzane/sys64e2f-uttad5f41e.wav
sys64e2f-uttad5f41e is one of the audio files listed in val_mos_list.txt, so it does not exist in my dataset.
Hi,
We don't have straightforward inference scripts set up just yet, but we are in the process of adding some. In the meantime, please try the following:
First, download the pretrained models, which it sounds like you have probably done already (see steps 1 and 2 in run_inference_for_challenge.py).
Then you can look at predict.py for running inference. The data directory that you point it to is expected to have a subdirectory called wav, as well as a file called sets/val_mos_list.txt that is just a list of wav files and their MOS ratings, e.g.:
sys64e2f-utt8c3d2b2.wav,4.0
sys64e2f-utt3a1aedf.wav,3.625
sys64e2f-utt549b7c4.wav,4.125
sys64e2f-utt0c4d719.wav,3.75
sys64e2f-utt4eddf90.wav,3.625
Replace this with a list of your own wav files. You can put dummy MOS numbers there: they are only used to compute MSE, correlations, etc. when evaluating the trained MOS prediction model, and they don't affect the predictions themselves.
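As a rough illustration of the step above, here is a minimal Python sketch that writes sets/val_mos_list.txt for your own data, assuming the layout described (a wav subdirectory inside the data directory, one "filename,MOS" entry per line). The function name write_dummy_mos_list and the dummy value 3.0 are my own choices for illustration and are not part of the repository.

```python
from pathlib import Path


def write_dummy_mos_list(datadir, dummy_mos=3.0):
    """Write <datadir>/sets/val_mos_list.txt with one 'name.wav,MOS' line
    per file found in <datadir>/wav.

    The MOS value is a placeholder (hypothetical default of 3.0); per the
    thread it is only used for computing evaluation metrics.
    """
    datadir = Path(datadir)
    wav_dir = datadir / "wav"
    sets_dir = datadir / "sets"
    sets_dir.mkdir(parents=True, exist_ok=True)

    # Sort so the list is stable across runs.
    wav_names = sorted(p.name for p in wav_dir.glob("*.wav"))

    list_path = sets_dir / "val_mos_list.txt"
    with open(list_path, "w") as f:
        for name in wav_names:
            f.write(f"{name},{dummy_mos}\n")
    return list_path


if __name__ == "__main__":
    # Example: the data directory used earlier in this thread.
    print(write_dummy_mos_list("/mydata/"))
```

Because the ratings are placeholders, any MSE or correlation numbers computed against this list are meaningless and can be ignored.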
There is also expected to be a file called mydata_system.csv which has system-level averaged MOS values. If you don't have one, you can just comment out the code that uses it (the section that starts with ### SYSTEM, up until the part that says ## generate answer.txt for codalab).
MOS predictions for each wav file will be written to an output file called answer.txt.
By the way, the model was trained for MOS prediction on audio which was downsampled to 16kHz and normalized using sv56, so it's best if your input matches this and is also at a 16kHz sampling rate and has been sv56 normalized.
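If it helps, the sampling-rate part of this can be checked before running inference. The sketch below uses only Python's standard wave module, so it assumes uncompressed PCM wav files; the helper names are hypothetical and not from the repository. It does not check or apply sv56 normalization, which needs a separate tool.

```python
import wave
from pathlib import Path


def wav_sample_rate(path):
    """Return the sampling rate (Hz) stored in a PCM wav file's header."""
    with wave.open(str(path), "rb") as w:
        return w.getframerate()


def find_wrong_rate(wav_dir, expected=16000):
    """Return (filename, rate) pairs for wav files whose sampling rate
    differs from `expected`, sorted by filename."""
    mismatches = []
    for p in sorted(Path(wav_dir).glob("*.wav")):
        rate = wav_sample_rate(p)
        if rate != expected:
            mismatches.append((p.name, rate))
    return mismatches


if __name__ == "__main__":
    # Example: list files under /mydata/wav that are not at 16 kHz.
    for name, rate in find_wrong_rate("/mydata/wav"):
        print(f"{name}: {rate} Hz")
```

Any files reported here would need to be resampled to 16 kHz first.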
Hello,
I am considering using your pretrained model to perform an objective evaluation of some Spanish and Basque language models. Could you tell me if it is possible to use the checkpoint on these languages as well, or if the model is language-dependent? Thank you.