Tracing: deep refactoring of codebase and migration to new PyTorch & Lightning frameworks

yqzhishen commented 1 year ago

Task list

[x] Clean up useless codes
[x] Re-organize the whole repository
[x] Migrate to PyTorch 2.0 and Lightning 2.0: #72
[x] Performance tuning and troubleshooting
[x] Re-implement ONNX exporting scripts: #75
[x] Update docs and README: #77, #86, #106
[x] Move dataset making pipelines to new place

yqzhishen commented 1 year ago

Currently deprecated or backward-incompatible stuff:

MIDI-A/B, PE, 24kHz vocoder
g2p_dictionary -> dictionary
dependencies need to be re-installed
checkpoints need to be migrated
to resume training from old checkpoints, must specify num_pad_tokens in config file
scripts of infer, binarize, train, etc. are moved into scripts/
datasets from old branch need to be re-binarized before training

yqzhishen commented 1 year ago

Bug fixes compared to old branch:

fixed a bug causing the epoch count to be only 1/1000 of the actual epoch
fixed a bug which may cause potential file handle racing when reading/seeking dataset
fixed a bug causing inconsistency between joint augmentation formula and implementation
fixed hparams failed to render colors in some terminal and some Python versions
fixed messed up code backup directory structure

yqzhishen commented 1 year ago

hparams keys renamed:

max_tokens -> max_batch_frames
max_sentences -> max_batch_size
max_eval_tokens -> max_val_batch_frames
max_eval_sentences -> max_val_batch_size
decay_steps -> lr_decay_steps
gamma -> lr_decay_gamma

hrukalive commented 1 year ago

After modularizing the optimizer and LR scheduler, hparam key changes are the following:

New keys:
- optimizer_args.optimizer_cls: Optimizer class name
- lr_scheduler_args.scheduler_cls: Scheduler class name
Renamed/moved keys:
- lr -> optimizer_args.lr
- optimizer_adam_beta1 -> optimizer_args.beta1
- optimizer_adam_beta2 -> optimizer_args.beta2
- weight_decay -> optimizer_args.weight_decay
- warmup_updates -> lr_scheduler_args.warmup_steps
- lr_decay_steps -> lr_scheduler_args.step_size
- lr_decay_gamma -> lr_scheduler_args.gamma

Migrating from the old configuration files usually looks like

Old	New
```yaml lr: 0.0004 lr_decay_steps: 50000 lr_decay_gamma: 0.5 ```	```yaml optimizer_args: lr: 0.0004 lr_scheduler_args: step_size: 50000 gamma: 0.5 ```

For those who want to test other optimizer and scheduler

Note that optimizer_args and lr_scheduler_args will be filtered by needed parameters and passed to __init__ as kwargs when constructing the optimizer and scheduler. Therefore, you could specify all you need in the configuration file to directly control the behavior of the optimization and LR scheduling. It will also tolerate parameters existing in config but not needed in __init__.

Also, note that the LR scheduler performs scheduling on the granularity of steps, not epochs.

The special case applies when a tuple is needed in __init__; beta1 and beta2 are treated separately and form a tuple in the code. You could try to pass in an array instead. (And as an experiment, AdamW does accept [beta1, beta2]). If there is another special treatment required, please submit an issue.

yqzhishen commented 1 year ago

Augmentations now can be (and must be) enabled/disabled via enabled config keys since https://github.com/openvpi/DiffSinger/commit/34fe5399ffcc1c2341dbbc210708b76d5eeaae5b and https://github.com/openvpi/DiffSinger/commit/4b0d95ad650791dd6dc11435bd8fd511a7bd4406.

Examples:

augmentation_args:
  random_pitch_shifting:
    enabled: false  # <-- control this option
    range: [-5., 5.]
    scale: 1.0

yqzhishen commented 1 year ago

In https://github.com/openvpi/DiffSinger/commit/99f10793aaa7af18228e201194027c69496845bc, number of padding tokens is set to 1 by default. Datasets should be re-binarized after this commit; otherwise, the following line must be added to the config file:

num_pad_tokens: 3

yqzhishen commented 1 year ago

Compatibility for old DS format has been removed in https://github.com/openvpi/DiffSinger/commit/af4d8ec8e64e686de69f5437f0d475896ef6ba18. They should be re-exported in the new format for running inference.

Support for Python 3.8 has been restored in https://github.com/openvpi/DiffSinger/commit/51acdde6758fb6b9a57815e7c0154442b59fe52b and https://github.com/openvpi/DiffSinger/commit/3b48c0ba75071de6b5c3ca7b1291c406ed68fdc1.

Since https://github.com/openvpi/DiffSinger/commit/94c0b9f240b57b626ae1c73c5960fa25dad64b8c, binarize.py and train.py does not require manually setting the PYTHONPATH environment variable anymore.

yqzhishen commented 1 year ago

https://github.com/openvpi/DiffSinger/commit/224fd33f39b7796d0b1c47db5cfd46e68e03fce9 introduces a new config key: spk_ids.

Users can now customize the arrangements of speaker IDs, which are generated as $0,1,2,...,N-1$ by default. Speaker IDs can be duplicate or discontinuous, thus users can now merge or re-organize their multi-speaker datasets by modifying spk_ids and without changing anything else.

For example, if one dataset contains 3 styles, it requires only one single line in the config file to merge the first two styles into one, by giving them the same spk_id, instead of merging their recordings and transcriptions:

spk_ids: [0, 0, 1]

yqzhishen commented 1 year ago

Since https://github.com/openvpi/DiffSinger/commit/b2f9aaffef97dd1f9683aa115bd4bd057b5ab4a6, variances_prediction_args.repeat_bins has been replaced by variances_prediction_args.total_repeat_bins. This is because the number of predicted variance parameters may vary, and using total_repeat_bins can describe the neural networks more directly. For backward compatibility, user must calculate this new config value manually before loading former checkpoints like follows:

Old	New
```yaml predict_energy: true predict_breathiness: true variances_prediction_args: repeat_bins: 24 # 24 bins for each parameter ```	```yaml predict_energy: true predict_breathiness: true variances_prediction_args: total_repeat_bins: 48 # 2 * 24 = 48 bins in total ```

yqzhishen commented 1 year ago

Retaking mechanism of variance parameters have been refactored in https://github.com/openvpi/DiffSinger/commit/782f004a571445eacfbaea5f7542d6deae853618, thus all variance models involving pitch and variance predictions should be re-trained.

openvpi / DiffSinger