openvpi / DiffSinger

An advanced singing voice synthesis system with high fidelity, expressiveness, controllability and flexibility based on DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism
Apache License 2.0
2.62k stars 275 forks

Batch evaluation inference scripts #155

Open yqzhishen opened 7 months ago

yqzhishen commented 7 months ago

This issue (and its corresponding PR) will add evaluate.py, which applies a model to all items in an existing dataset.

Motivation

Sometimes people just want to hear what one person's songs would sound like when sung in another person's voice, which is essentially what most SVC pipelines do.

This may also be useful for finding edge cases or labeling mistakes (so that the most severe mistakes can be fixed first), especially with the help of SVC pipelines.

A possible workflow to fully evaluate a dataset:

  1. Run evaluate.py on the target dataset with a model trained on accurately labeled datasets (the reference model) to obtain the reference audio samples.
  2. (Optional but recommended) Train an SVC model on the target dataset and apply it to the evaluation results to convert their timbre to that of the target dataset.
  3. Calculate the mel spectrogram likelihood between the reference samples and the original recordings in the target dataset.
  4. Sort the likelihood values in ascending order and look into the items with the lowest likelihood.
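Steps 3 and 4 could be sketched roughly as follows. The issue does not define how the mel spectrogram likelihood is computed, so this sketch assumes an isotropic Gaussian with a fixed `sigma` centred on the reference mel, averaged over all time-frequency bins. `gaussian_log_likelihood` and `rank_items` are hypothetical helpers, not part of DiffSinger, and extracting the mels from the audio files (with the same settings as the dataset) is left out.

```python
from __future__ import annotations

import math

import numpy as np


def gaussian_log_likelihood(reference_mel: np.ndarray, recorded_mel: np.ndarray,
                            sigma: float = 1.0) -> float:
    """Mean per-bin log-likelihood of the recorded mel under a Gaussian centred
    on the reference mel. ASSUMPTION: fixed, isotropic sigma.

    Both inputs are shaped [frames, mel_bins]. If the frame counts differ
    slightly, both are trimmed to the shorter length (ASSUMPTION: the reference
    timing follows the dataset labels, so only a few frames can differ).
    """
    frames = min(reference_mel.shape[0], recorded_mel.shape[0])
    ref = reference_mel[:frames]
    rec = recorded_mel[:frames]
    sq_err = (rec - ref) ** 2
    # log N(x | mu, sigma^2) = -0.5 * ((x - mu) / sigma)^2 - log(sigma * sqrt(2 * pi))
    log_prob = -0.5 * sq_err / sigma ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))
    return float(log_prob.mean())


def rank_items(pairs: dict[str, tuple[np.ndarray, np.ndarray]],
               sigma: float = 1.0) -> list[tuple[str, float]]:
    """Score every (reference, recording) pair and sort ascending, so the least
    likely (most suspicious) items come first."""
    scores = [(name, gaussian_log_likelihood(ref, rec, sigma))
              for name, (ref, rec) in pairs.items()]
    scores.sort(key=lambda item: item[1])
    return scores


if __name__ == "__main__":
    # Synthetic stand-ins for real mels: one close match, one unrelated recording.
    rng = np.random.default_rng(0)
    ref = rng.normal(size=(100, 128))
    pairs = {
        "clean_item": (ref, ref + 0.05 * rng.normal(size=ref.shape)),
        "mislabeled_item": (ref, rng.normal(size=(98, 128))),
    }
    for name, score in rank_items(pairs):
        print(f"{name}: {score:.3f}")
```

In the synthetic example, `mislabeled_item` is listed first because its recording is unrelated to the reference. A fixed `sigma` makes the score a monotone function of the mean squared error, so the ranking is the same as sorting by MSE; a learned or per-bin variance would be needed for the likelihood to carry more information than that.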

TODO

blueyred commented 1 month ago

This would be amazing - being able to get an evaluation on a dataset would save so much time