Currently deprecated or backward-incompatible stuff:
- `g2p_dictionary` -> `dictionary`
- `num_pad_tokens` in config file

Bug fixes compared to old branch:

hparams keys renamed:
After modularizing the optimizer and LR scheduler, the hparam key changes are as follows:
- `optimizer_args.optimizer_cls`: optimizer class name
- `lr_scheduler_args.scheduler_cls`: scheduler class name
- `lr` -> `optimizer_args.lr`
- `optimizer_adam_beta1` -> `optimizer_args.beta1`
- `optimizer_adam_beta2` -> `optimizer_args.beta2`
- `weight_decay` -> `optimizer_args.weight_decay`
- `warmup_updates` -> `lr_scheduler_args.warmup_steps`
- `lr_decay_steps` -> `lr_scheduler_args.step_size`
- `lr_decay_gamma` -> `lr_scheduler_args.gamma`
Old:

```yaml
lr: 0.0004
lr_decay_steps: 50000
lr_decay_gamma: 0.5
```

New:

```yaml
optimizer_args:
  lr: 0.0004
lr_scheduler_args:
  step_size: 50000
  gamma: 0.5
```
Note that `optimizer_args` and `lr_scheduler_args` will be filtered down to the parameters that `__init__` needs and passed in as kwargs when constructing the optimizer and scheduler. Therefore, you can specify everything you need in the configuration file to directly control the behavior of optimization and LR scheduling. Parameters that exist in the config but are not needed by `__init__` are also tolerated.
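A minimal sketch of how such signature-based filtering could work (an assumption for illustration, not the repository's actual code; `filter_kwargs` is a hypothetical helper):

```python
import inspect

import torch


def filter_kwargs(cls, config_args: dict) -> dict:
    """Keep only the keys that cls.__init__ actually declares."""
    accepted = inspect.signature(cls.__init__).parameters
    return {k: v for k, v in config_args.items() if k in accepted}


model = torch.nn.Linear(4, 1)
# 'momentum' is not an AdamW parameter, so the filter silently drops it.
optimizer_args = {'lr': 0.0004, 'weight_decay': 0.01, 'momentum': 0.9}
optimizer = torch.optim.AdamW(
    model.parameters(), **filter_kwargs(torch.optim.AdamW, optimizer_args)
)
```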
Also, note that the LR scheduler performs scheduling on the granularity of steps, not epochs.
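For instance (a self-contained sketch of a generic training loop, not the project's actual one), stepping a `StepLR` scheduler once per training step means `step_size` is counted in steps:

```python
import torch

model = torch.nn.Linear(4, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=0.0004)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50000, gamma=0.5)

for step in range(100):  # stand-in for the real training loop
    loss = model(torch.randn(8, 4)).pow(2).mean()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    scheduler.step()  # per training step: the LR halves every 50000 steps, not epochs
```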
A special case applies when a tuple is needed in `__init__`: `beta1` and `beta2` are treated separately and combined into a tuple in the code. You could also try passing an array instead (as an experiment, AdamW does accept `[beta1, beta2]`). If another special treatment is required, please submit an issue.
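A sketch of the assumed beta handling (the exact code in the repository may differ):

```python
import torch

model = torch.nn.Linear(4, 1)
optimizer_args = {'lr': 0.0004, 'beta1': 0.9, 'beta2': 0.98}
# beta1/beta2 are popped out and combined into the `betas` tuple AdamW expects.
betas = (optimizer_args.pop('beta1'), optimizer_args.pop('beta2'))
optimizer = torch.optim.AdamW(model.parameters(), betas=betas, **optimizer_args)
# As noted above, AdamW also happens to accept a list: betas=[0.9, 0.98]
```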
Augmentations can now be (and must be) enabled/disabled via `enabled` config keys since https://github.com/openvpi/DiffSinger/commit/34fe5399ffcc1c2341dbbc210708b76d5eeaae5b and https://github.com/openvpi/DiffSinger/commit/4b0d95ad650791dd6dc11435bd8fd511a7bd4406.
Example:

```yaml
augmentation_args:
  random_pitch_shifting:
    enabled: false  # <-- control this option
    range: [-5., 5.]
    scale: 1.0
```
In https://github.com/openvpi/DiffSinger/commit/99f10793aaa7af18228e201194027c69496845bc, the number of padding tokens is set to 1 by default. Datasets should be re-binarized after this commit; otherwise, the following line must be added to the config file:

```yaml
num_pad_tokens: 3
```
Compatibility with the old DS file format has been removed in https://github.com/openvpi/DiffSinger/commit/af4d8ec8e64e686de69f5437f0d475896ef6ba18. Old DS files should be re-exported in the new format before running inference.
Support for Python 3.8 has been restored in https://github.com/openvpi/DiffSinger/commit/51acdde6758fb6b9a57815e7c0154442b59fe52b and https://github.com/openvpi/DiffSinger/commit/3b48c0ba75071de6b5c3ca7b1291c406ed68fdc1.
Since https://github.com/openvpi/DiffSinger/commit/94c0b9f240b57b626ae1c73c5960fa25dad64b8c, binarize.py and train.py no longer require manually setting the `PYTHONPATH` environment variable.
https://github.com/openvpi/DiffSinger/commit/224fd33f39b7796d0b1c47db5cfd46e68e03fce9 introduces a new config key: `spk_ids`.
Users can now customize the arrangement of speaker IDs, which are generated as $0, 1, 2, \dots, N-1$ by default. Speaker IDs can be duplicated or discontinuous, so users can merge or re-organize their multi-speaker datasets by modifying `spk_ids` alone, without changing anything else.
For example, if a dataset contains 3 styles, merging the first two styles into one requires only a single line in the config file, by giving them the same speaker ID, instead of merging their recordings and transcriptions:

```yaml
spk_ids: [0, 0, 1]
```
Since https://github.com/openvpi/DiffSinger/commit/b2f9aaffef97dd1f9683aa115bd4bd057b5ab4a6, `variances_prediction_args.repeat_bins` has been replaced by `variances_prediction_args.total_repeat_bins`. This is because the number of predicted variance parameters may vary, and `total_repeat_bins` describes the neural networks more directly. For backward compatibility, users must calculate the new config value manually before loading former checkpoints, as follows:
Old:

```yaml
predict_energy: true
predict_breathiness: true
variances_prediction_args:
  repeat_bins: 24  # 24 bins for each parameter
```

New:

```yaml
predict_energy: true
predict_breathiness: true
variances_prediction_args:
  total_repeat_bins: 48  # 2 * 24 = 48 bins in total
```
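The migration arithmetic, spelled out (variable names here are illustrative only, not actual config keys):

```python
num_predicted_params = 2  # e.g. energy + breathiness, as in the example above
old_repeat_bins = 24      # former per-parameter `repeat_bins` value
total_repeat_bins = num_predicted_params * old_repeat_bins  # -> 48
```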
The retaking mechanism of variance parameters has been refactored in https://github.com/openvpi/DiffSinger/commit/782f004a571445eacfbaea5f7542d6deae853618; thus, all variance models involving pitch and variance predictions should be re-trained.