haoheliu / AudioLDM

AudioLDM: Generate speech, sound effects, music and beyond, with text.
https://audioldm.github.io/

Error: RuntimeError: Error(s) in loading state_dict for LatentDiffusion: #110

Closed buscon closed 1 year ago

buscon commented 1 year ago

hi,

I just installed audioldm on a Linux machine with pip:

$ pip3 install audioldm

So far so good.

When I try to run either the text-to-audio or the audio-to-audio feature, I get the following error. Any idea what the problem is?

$ audioldm -t "A hammer is hitting a wooden surface"
Load AudioLDM: %s audioldm-m-full
DiffusionWrapper has 415.95 M params.
/home/marcello/.local/lib/python3.10/site-packages/torch/nn/utils/weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
/home/marcello/.local/lib/python3.10/site-packages/torchlibrosa/stft.py:193: FutureWarning: Pass size=1024 as keyword args. From version 0.10 passing these as positional arguments will result in an error
  fft_window = librosa.util.pad_center(fft_window, n_fft)
/home/marcello/.local/lib/python3.10/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3526.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Some weights of RobertaModel were not initialized from the model checkpoint at roberta-base and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Traceback (most recent call last):
  File "/home/marcello/.local/bin/audioldm", line 152, in <module>
    audioldm = build_model(model_name=args.model_name)
  File "/home/marcello/.local/lib/python3.10/site-packages/audioldm/pipeline.py", line 86, in build_model
    latent_diffusion.load_state_dict(checkpoint["state_dict"])
  File "/home/marcello/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2152, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LatentDiffusion:
    Unexpected key(s) in state_dict: "cond_stage_model.model.text_branch.embeddings.position_ids".

$ audioldm --file_path '/home/marcello/Music/Samples/Drum/ch_beat_100_boomclack.wav'
Load AudioLDM: %s audioldm-m-full
DiffusionWrapper has 415.95 M params.
/home/marcello/.local/lib/python3.10/site-packages/torch/nn/utils/weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
/home/marcello/.local/lib/python3.10/site-packages/torchlibrosa/stft.py:193: FutureWarning: Pass size=1024 as keyword args. From version 0.10 passing these as positional arguments will result in an error
  fft_window = librosa.util.pad_center(fft_window, n_fft)
/home/marcello/.local/lib/python3.10/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3526.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Some weights of RobertaModel were not initialized from the model checkpoint at roberta-base and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Traceback (most recent call last):
  File "/home/marcello/.local/bin/audioldm", line 152, in <module>
    audioldm = build_model(model_name=args.model_name)
  File "/home/marcello/.local/lib/python3.10/site-packages/audioldm/pipeline.py", line 86, in build_model
    latent_diffusion.load_state_dict(checkpoint["state_dict"])
  File "/home/marcello/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2152, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LatentDiffusion:
    Unexpected key(s) in state_dict: "cond_stage_model.model.text_branch.embeddings.position_ids".

buscon commented 1 year ago

Duplicate of https://github.com/haoheliu/AudioLDM/issues/95

Solution: pip install --upgrade transformers==4.29.0
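
For anyone who cannot pin transformers, here is a rough sketch of a different workaround: dropping the one key that the traceback reports as unexpected before the state dict is loaded. This is not the fix confirmed in this thread, and it is untested against the real checkpoint. The helper name `drop_keys_by_suffix` and the toy dictionary are made up for illustration, and the idea that the key is only stale (left over from a checkpoint saved with an older transformers version) is an assumption.

```python
from typing import Any, Dict, Iterable

# Suffix of the key named in the "Unexpected key(s)" message in the traceback.
UNEXPECTED_SUFFIXES = ("text_branch.embeddings.position_ids",)


def drop_keys_by_suffix(
    state_dict: Dict[str, Any],
    suffixes: Iterable[str] = UNEXPECTED_SUFFIXES,
) -> Dict[str, Any]:
    """Return a copy of state_dict without keys ending in any given suffix.

    Hypothetical helper for illustration; it is not part of audioldm.
    """
    suffix_tuple = tuple(suffixes)  # str.endswith accepts a tuple of suffixes
    return {k: v for k, v in state_dict.items() if not k.endswith(suffix_tuple)}


# Toy stand-in for checkpoint["state_dict"]; the real values are tensors.
toy_state_dict = {
    "cond_stage_model.model.text_branch.embeddings.position_ids": [0, 1, 2],
    "cond_stage_model.model.text_branch.embeddings.word_embeddings.weight": [0.1, 0.2],
}

filtered = drop_keys_by_suffix(toy_state_dict)
print(sorted(filtered))
```

With a real checkpoint, the filtered dictionary would be what gets passed to `latent_diffusion.load_state_dict(...)`, the call shown at `pipeline.py` line 86 in the traceback. That means editing the installed package, so the version pin above is the simpler route. PyTorch's `load_state_dict` also accepts `strict=False`, which ignores unexpected keys, but it silently ignores missing keys too.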