openvpi / DiffSinger

An advanced singing voice synthesis system with high fidelity, expressiveness, controllability and flexibility based on DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism
Apache License 2.0

Training variance models from DS files #131

Closed yqzhishen closed 11 months ago

yqzhishen commented 11 months ago

This proposed feature adds support for training variance models from DS files, even when no recordings are available.

The core of this feature is to let the binarizer load the required attributes from DS files instead of reading them from transcriptions.csv or extracting them from recordings. Given a piece of data named my_item in transcriptions.csv, if DS file binarization is enabled, the binarizer tries the following sources in order:

  1. Load from the first segment in the DS file at /ds/my_item.ds (full name matching).
  2. Load from the k-th segment in the DS file at /ds/my_item.ds if the item name follows the my_item#k pattern.
  3. Load from transcriptions.csv or extract features from the waveform (fallback logic).
  4. Raise an error if none of the resources above are available.
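
The lookup order above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the binarizer's actual code: the function names `load_from_ds` and `require_attr` are hypothetical, a DS file is assumed to be JSON holding either one segment object or a list of them, the index k is assumed to be 0-based, and feature extraction from the waveform is left out of the fallback step.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional


def load_from_ds(ds_dir: Path, name: str, attr: str, idx: int = 0) -> Optional[Any]:
    """Return `attr` from segment `idx` of <ds_dir>/<name>.ds, or None if unavailable.

    Hypothetical helper: assumes the DS file is JSON (one segment or a list of segments).
    """
    ds_path = ds_dir / f"{name}.ds"
    if not ds_path.is_file():
        return None
    with open(ds_path, encoding="utf-8") as f:
        segments = json.load(f)
    if not isinstance(segments, list):
        segments = [segments]
    if not 0 <= idx < len(segments):
        return None
    return segments[idx].get(attr)


def require_attr(ds_dir: Path, csv_row: Dict[str, str], attr: str) -> Any:
    """Resolve `attr` for one row of transcriptions.csv using the order described above."""
    name = csv_row["name"]

    # 1. Full name matching: first segment of <name>.ds.
    value = load_from_ds(ds_dir, name, attr, idx=0)

    # 2. name#k pattern: k-th segment of <name>.ds (0-based k is an assumption).
    if value is None and "#" in name:
        base, _, k = name.rpartition("#")
        if k.isdigit():
            value = load_from_ds(ds_dir, base, attr, idx=int(k))

    # 3. Fallback: the CSV column (waveform feature extraction is omitted in this sketch).
    if value is None:
        value = csv_row.get(attr) or None

    # 4. Nothing available: raise an error.
    if value is None:
        raise ValueError(f"Attribute '{attr}' is missing for item '{name}'.")
    return value
```

With a DS file my_item.ds containing two segments, `require_attr(ds_dir, {"name": "my_item#1"}, "ph_seq")` would read the second segment under the 0-based assumption, while a row whose DS file does not exist falls back to its own CSV columns.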

Please note that you still need a transcriptions.csv to declare all data pieces included in binarization. However, if the DS files contain all required attributes, the CSV file only needs a single column (the name column).
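
As a small illustration of such a name-only CSV, the sketch below writes one from a list of item names. The helper `write_name_only_csv` is hypothetical, and the column header `name` is assumed from the "name column" mentioned above.

```python
import csv
from typing import Iterable, TextIO


def write_name_only_csv(names: Iterable[str], fileobj: TextIO) -> None:
    """Write a transcriptions.csv-style table that declares items by name only."""
    writer = csv.writer(fileobj, lineterminator="\n")
    writer.writerow(["name"])  # assumed header of the single required column
    for name in names:
        writer.writerow([name])
```

For example, passing `["my_item", "my_item#1"]` and a file opened as `open("transcriptions.csv", "w", newline="", encoding="utf-8")` produces a header line followed by one line per item.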

DS files can be exported from OpenUtau for DiffSinger. To convert segmented DS files to transcriptions.csv, see https://github.com/openvpi/MakeDiffSinger/pull/9.