What version of Lightning are you using?
> What version of Lightning are you using?
On WSL2:

```
> pip list | rg lightning
lightning-utilities 0.5.0
pytorch-lightning 1.8.6
```
On Docker Container:

```
root@7f696c563553:/app# pip list | grep lightning
pytorch-lightning 0.7.1
```
Please strictly follow the version defined in requirements.txt. By the way, which checkpoint are you fine-tuning? Have you tried PyTorch 1.13, or training from scratch?
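If it helps, here is a quick way to verify the installed versions against the pins (a minimal sketch of mine, assuming requirements.txt uses `==` pins):

```python
# Minimal sketch: report any installed package that differs from the
# "name==version" pins in requirements.txt.
from importlib.metadata import version, PackageNotFoundError

def check_requirements(path="requirements.txt"):
    for line in open(path):
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue  # skip comments and unpinned entries
        name, expected = line.split("==", 1)
        try:
            installed = version(name)
        except PackageNotFoundError:
            print(f"{name}: not installed (expected {expected})")
            continue
        if installed != expected:
            print(f"{name}: installed {installed}, expected {expected}")

check_requirements()
```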
> Please strictly follow the version defined in requirements.txt.
In the `refactor` branch, it seems that the version defined is exactly 0.7.1 (source). That's why I wonder if I'm using the wrong branch. Should I try out `refactor-v2`?
> By the way, which checkpoint are you fine-tuning?
Here's a list of checkpoints I have tried to fine-tune:
> Have you tried PyTorch 1.13?
No, as it's not mentioned in the README or the tutorials (in the `refactor` branch). I will try it now.
> Have you tried training from scratch?
No. I don't have enough data (~12 min). I understand that I could use OpenCPoP together with it to train from scratch, but I would prefer to try out low-cost solutions first before actually renting A100/H100 servers.
Are you using the same hyper-parameters when fine-tuning from a checkpoint? If not, the state dict and optimizer states may not match and cause the above error.

However, fine-tuning on a small dataset from a checkpoint may not be as useful as you think. Training multi-speaker models together with other large datasets is still the recommended solution.
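A quick way to see such a mismatch is to load the checkpoint non-strictly and print the differing keys (a rough sketch; `build_model` and `hparams` are placeholders for however you construct the model, and the `state_dict` key follows the usual Lightning checkpoint layout):

```python
import torch

# Load the checkpoint on CPU and compare it against a model built with the
# *current* hyper-parameters.
ckpt = torch.load("checkpoints/0703_name_ds1000/model_ckpt_steps_360000.ckpt",
                  map_location="cpu")
model = build_model(hparams)  # placeholder for your model constructor

# strict=False reports the differences instead of raising
missing, unexpected = model.load_state_dict(ckpt["state_dict"], strict=False)
print("in the model but not in the checkpoint:", missing)
print("in the checkpoint but not in the model:", unexpected)
```

If either list is non-empty, the optimizer state saved in the checkpoint will not line up with the new parameter groups either, which produces exactly this kind of restore error.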
> Are you using the same hyper-parameters when fine-tuning from a checkpoint? If not, the state dict and optimizer states may not match and cause the above error.
That's very helpful. Here's what difftool shows me:
```diff
@@ -174,7 +184,7 @@ use_midi: false
 use_nsf: true
 use_pitch_embed: true
 use_pos_embed: true
-use_speed_embed: false
+use_speed_embed: true
 use_spk_embed: false
 use_spk_id: false
 use_split_spk_id: false
```
After changing `use_speed_embed` to `true`, the model could be loaded correctly:
```
07/08 05:06:24 AM model and trainer restored from checkpoint: checkpoints/0703_name_ds1000/model_ckpt_steps_360000.ckpt
Validation sanity check: 0%| | 0/1 [00:00<?, ?batch/s]
```
Unfortunately, there's no "speed" information in my dataset, so it crashed. But as long as the checkpoint loads, the optimizer issue has been solved. Thank you again for pointing this out.
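For reference, this is roughly how I now compare configs before fine-tuning, instead of eyeballing a diff (a quick sketch of mine; the checkpoint config path below is a placeholder):

```python
import yaml

def diff_configs(path_a, path_b):
    """Print every top-level key whose value differs between two config files."""
    a = yaml.safe_load(open(path_a))
    b = yaml.safe_load(open(path_b))
    for key in sorted(set(a) | set(b)):
        va, vb = a.get(key, "<missing>"), b.get(key, "<missing>")
        if va != vb:
            print(f"{key}: {va} -> {vb}")

# the config shipped with the checkpoint vs. the one I generated
diff_configs("checkpoints/pretrained/config.yaml", "data/name/config.yaml")
```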
> However, fine-tuning on a small dataset from a checkpoint may not be as useful as you think. Training multi-speaker models together with other large datasets is still the recommended solution.
I wonder if there's some key difference between DiffSinger and SVC systems, as most SVC systems seem to work well when it comes to fine-tuning.
Nevertheless, that's helpful advice; I appreciate it a lot.
`use_speed_embed` is an option related to time-stretching augmentation; if you didn't turn that on, do not change it to `true`.
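To illustrate why that flag matters for checkpoint loading (a generic sketch, not DiffSinger's actual code): the extra module only exists when the flag is on, so the parameter set, and therefore the optimizer state, differs between the two configs.

```python
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, hidden=256, use_speed_embed=False):
        super().__init__()
        self.proj = nn.Linear(hidden, hidden)
        # only created when time-stretching augmentation is enabled
        self.speed_embed = nn.Linear(1, hidden) if use_speed_embed else None

    def forward(self, x, speed=None):
        h = self.proj(x)
        if self.speed_embed is not None and speed is not None:
            # add a per-sample speed embedding to every timestep
            h = h + self.speed_embed(speed[:, None])[:, None, :]
        return h

print(len(list(Encoder(use_speed_embed=False).parameters())))  # 2 tensors
print(len(list(Encoder(use_speed_embed=True).parameters())))   # 4 tensors
```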
Fine-tuning is always a workaround for cases where the original training data cannot be accessed. SVC systems use pre-trained models trained on large corpora because they do not need labeling, they do not have dictionaries, and for ease of use.
Another point is that DiffSinger is far more flexible in model architecture than most SVC systems (their flexibility mostly comes from tricks at inference time).
In addition, fine-tuning may cause leakage of timbre or style from the original checkpoint. SVS users are far more sensitive in these respects than SVC users, because without fine-tuning, SVS systems have no timbre leakage at all. Fine-tuning may be very suitable for very large, generic models (like LLMs), but speech models are relatively small and task-specific.
> In addition, fine-tuning may cause leakage of timbre or style from the original checkpoint.
That's the main reason why I want to try out SVS systems: SVC systems cannot capture speaker styles, which is crucial when similarity matters. Thanks for pointing this out.
I will try to train from scratch then. Thank you again for the detailed insights!
Attempt on WSL2
I'm using Python 3.10.6, CUDA 11.8, torch 2.0.1, and Ubuntu 22.04.2 LTS on Windows 11 x86_64. I'm using code from the `refactor` branch and trying to use the pretrained models listed in the release section (I tried most of them and none of them works). When I try to fine-tune on a custom dataset, it reports the following error:

NOTE: I have added some `print` statements in `pl_utils.py` for more information, so the line numbers might differ from the original code.

Here's what I have done so far:
1. Run `pipelines/no_midi_preparation.ipynb` to generate the config.
2. Run `$ CUDA_VISIBLE_DEVICES=0 python run.py --config data/name/config.yaml --exp_name 0703_name_ds1000` and immediately kill it once the `tqdm` bar of the training process shows up. This creates an experiment folder with all the required information under `checkpoints/`.
3. Copy the downloaded `.ckpt` file, such as `model_ckpt_steps_360000.ckpt`, into that newly created experiment folder.

The debugger shows the following information:
It seems that it's caused by a difference in `"params"`. Here's the information I got from pdb:

It seems that the saved params are very different from what was just initialized in the new `AdamW` instance.
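For anyone who wants to reproduce this comparison outside of pdb, this is roughly what I inspected (`build_model` and `hparams` are placeholders for the repo's model construction; the `optimizer_states` key follows the Lightning checkpoint layout):

```python
import torch

ckpt = torch.load("checkpoints/0703_name_ds1000/model_ckpt_steps_360000.ckpt",
                  map_location="cpu")
# Lightning stores one state dict per optimizer
saved = ckpt["optimizer_states"][0]
print("saved param groups:", [len(g["params"]) for g in saved["param_groups"]])

# compare against a freshly constructed optimizer
model = build_model(hparams)  # placeholder for the repo's model constructor
fresh = torch.optim.AdamW(model.parameters())
print("fresh param groups:",
      [len(g["params"]) for g in fresh.state_dict()["param_groups"]])
```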
Attempt on Docker container

I would like to fully eliminate possible issues caused by an incorrect torch version, CUDA version, or even Python version. I don't want to pollute the environment on my host machine, as I have other projects in development. Thus, I created a Dockerfile to test this:
Build and run it:
My questions are:
Thanks.