openvpi / DiffSinger

An advanced singing voice synthesis system with high fidelity, expressiveness, controllability and flexibility, based on the paper DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism
Apache License 2.0

Tracing: deep refactoring of codebase and migration to new PyTorch & Lightning frameworks #74

Closed yqzhishen closed 1 year ago

yqzhishen commented 1 year ago

Task list

yqzhishen commented 1 year ago

Currently deprecated or backward-incompatible stuff:

yqzhishen commented 1 year ago

Bug fixes compared to old branch:

yqzhishen commented 1 year ago

hparams keys renamed:

hrukalive commented 1 year ago

After modularizing the optimizer and LR scheduler, hparam key changes are the following:

Migrating from the old configuration files usually looks like this:

Old:

```yaml
lr: 0.0004
lr_decay_steps: 50000
lr_decay_gamma: 0.5
```

New:

```yaml
optimizer_args:
  lr: 0.0004
lr_scheduler_args:
  step_size: 50000
  gamma: 0.5
```

For those who want to test other optimizers and schedulers:

Note that optimizer_args and lr_scheduler_args are filtered down to the parameters the constructor actually needs and passed to __init__ as kwargs when the optimizer and scheduler are built. Therefore, you can specify everything you need in the configuration file to directly control the behavior of optimization and LR scheduling. Parameters that exist in the config but are not needed by __init__ are tolerated and simply ignored.
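As a minimal sketch of such a configuration (assuming optimizer_cls and lr_scheduler_args-level scheduler_cls are the keys used to select the classes on this branch; the weight_decay value is purely illustrative):

```yaml
optimizer_args:
  optimizer_cls: torch.optim.AdamW  # assumed key for selecting the optimizer class
  lr: 0.0004
  weight_decay: 0.01  # illustrative; dropped automatically if __init__ does not accept it
lr_scheduler_args:
  scheduler_cls: torch.optim.lr_scheduler.StepLR  # assumed key for selecting the scheduler class
  step_size: 50000
  gamma: 0.5
```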

Also, note that the LR scheduler performs scheduling at the granularity of steps, not epochs.

A special case applies when __init__ expects a tuple: beta1 and beta2 are treated separately and combined into a tuple in the code. You could also try passing an array instead (as an experiment, AdamW does accept [beta1, beta2]). If another special treatment is required, please submit an issue.
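For example, both of the following forms should work with AdamW (the values shown are illustrative):

```yaml
optimizer_args:
  beta1: 0.9   # combined with beta2 into the betas tuple in the code
  beta2: 0.98
  # or, experimentally, as an array that AdamW accepts directly:
  # betas: [0.9, 0.98]
```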

yqzhishen commented 1 year ago

Augmentations can now (and must) be enabled/disabled via enabled config keys since https://github.com/openvpi/DiffSinger/commit/34fe5399ffcc1c2341dbbc210708b76d5eeaae5b and https://github.com/openvpi/DiffSinger/commit/4b0d95ad650791dd6dc11435bd8fd511a7bd4406.

Examples:

```yaml
augmentation_args:
  random_pitch_shifting:
    enabled: false  # <-- control this option
    range: [-5., 5.]
    scale: 1.0
```
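Presumably the same switch governs the other augmentation types as well; a sketch assuming the random_time_stretching key, with illustrative values:

```yaml
augmentation_args:
  random_time_stretching:
    enabled: true  # same enabled switch, assumed to follow the pattern above
    range: [0.5, 2.]
    scale: 0.75  # illustrative value
```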
yqzhishen commented 1 year ago

In https://github.com/openvpi/DiffSinger/commit/99f10793aaa7af18228e201194027c69496845bc, the number of padding tokens is set to 1 by default. Datasets should be re-binarized after this commit; otherwise, the following line must be added to the config file to keep the value that previously binarized datasets were built with:

```yaml
num_pad_tokens: 3
```
yqzhishen commented 1 year ago

Compatibility with the old DS format has been removed in https://github.com/openvpi/DiffSinger/commit/af4d8ec8e64e686de69f5437f0d475896ef6ba18. Old DS files should be re-exported in the new format before running inference.

Support for Python 3.8 has been restored in https://github.com/openvpi/DiffSinger/commit/51acdde6758fb6b9a57815e7c0154442b59fe52b and https://github.com/openvpi/DiffSinger/commit/3b48c0ba75071de6b5c3ca7b1291c406ed68fdc1.

Since https://github.com/openvpi/DiffSinger/commit/94c0b9f240b57b626ae1c73c5960fa25dad64b8c, binarize.py and train.py no longer require manually setting the PYTHONPATH environment variable.

yqzhishen commented 1 year ago

https://github.com/openvpi/DiffSinger/commit/224fd33f39b7796d0b1c47db5cfd46e68e03fce9 introduces a new config key: spk_ids.

Users can now customize the arrangement of speaker IDs, which are generated as $0,1,2,...,N-1$ by default. Speaker IDs may be duplicated or discontinuous, so users can merge or re-organize their multi-speaker datasets by modifying spk_ids alone, without changing anything else.

For example, if one dataset contains 3 styles, merging the first two styles into one requires only a single line in the config file, by giving them the same spk_id, instead of merging their recordings and transcriptions:

```yaml
spk_ids: [0, 0, 1]
```
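As a fuller sketch, assuming a speakers key that lists the datasets in order (the speaker names here are hypothetical):

```yaml
speakers: [alto_a, alto_b, tenor]  # hypothetical dataset/speaker names
spk_ids: [0, 0, 1]                 # alto_a and alto_b are merged under speaker ID 0
```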
yqzhishen commented 1 year ago

Since https://github.com/openvpi/DiffSinger/commit/b2f9aaffef97dd1f9683aa115bd4bd057b5ab4a6, variances_prediction_args.repeat_bins has been replaced by variances_prediction_args.total_repeat_bins. This is because the number of predicted variance parameters may vary, and total_repeat_bins describes the neural network more directly. For backward compatibility, users must calculate the new config value manually before loading former checkpoints, as follows:

Old:

```yaml
predict_energy: true
predict_breathiness: true
variances_prediction_args:
  repeat_bins: 24  # 24 bins for each parameter
```

New:

```yaml
predict_energy: true
predict_breathiness: true
variances_prediction_args:
  total_repeat_bins: 48  # 2 * 24 = 48 bins in total
```
yqzhishen commented 1 year ago

The retaking mechanism of variance parameters has been refactored in https://github.com/openvpi/DiffSinger/commit/782f004a571445eacfbaea5f7542d6deae853618, so all variance models involving pitch and variance prediction should be re-trained.