I just installed audioldm on a Linux machine with pip:
$ pip3 install audioldm
So far so good.
When I try to run either the text-to-audio or the audio-to-audio feature, I get the following error. Any idea what the problem is?
$ audioldm -t "A hammer is hitting a wooden surface"
Load AudioLDM: %s audioldm-m-full
DiffusionWrapper has 415.95 M params.
/home/marcello/.local/lib/python3.10/site-packages/torch/nn/utils/weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
/home/marcello/.local/lib/python3.10/site-packages/torchlibrosa/stft.py:193: FutureWarning: Pass size=1024 as keyword args. From version 0.10 passing these as positional arguments will result in an error
fft_window = librosa.util.pad_center(fft_window, n_fft)
/home/marcello/.local/lib/python3.10/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3526.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
Some weights of RobertaModel were not initialized from the model checkpoint at roberta-base and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Traceback (most recent call last):
File "/home/marcello/.local/bin/audioldm", line 152, in <module>
audioldm = build_model(model_name=args.model_name)
File "/home/marcello/.local/lib/python3.10/site-packages/audioldm/pipeline.py", line 86, in build_model
latent_diffusion.load_state_dict(checkpoint["state_dict"])
File "/home/marcello/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2152, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LatentDiffusion:
Unexpected key(s) in state_dict: "cond_stage_model.model.text_branch.embeddings.position_ids".
$ audioldm --file_path '/home/marcello/Music/Samples/Drum/ch_beat_100_boomclack.wav'
Load AudioLDM: %s audioldm-m-full
DiffusionWrapper has 415.95 M params.
/home/marcello/.local/lib/python3.10/site-packages/torch/nn/utils/weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
/home/marcello/.local/lib/python3.10/site-packages/torchlibrosa/stft.py:193: FutureWarning: Pass size=1024 as keyword args. From version 0.10 passing these as positional arguments will result in an error
fft_window = librosa.util.pad_center(fft_window, n_fft)
/home/marcello/.local/lib/python3.10/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3526.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
Some weights of RobertaModel were not initialized from the model checkpoint at roberta-base and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Traceback (most recent call last):
File "/home/marcello/.local/bin/audioldm", line 152, in <module>
audioldm = build_model(model_name=args.model_name)
File "/home/marcello/.local/lib/python3.10/site-packages/audioldm/pipeline.py", line 86, in build_model
latent_diffusion.load_state_dict(checkpoint["state_dict"])
File "/home/marcello/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2152, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LatentDiffusion:
Unexpected key(s) in state_dict: "cond_stage_model.model.text_branch.embeddings.position_ids".
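Both runs fail at the same point: load_state_dict finds a key in the checkpoint that the freshly built model does not expect. Below is a minimal sketch of one possible workaround. It assumes the extra position_ids entry is a buffer that the installed library versions no longer store, so it can be dropped without affecting the weights; that is an assumption, not something the logs confirm. The helper drop_keys and the placeholder key "first_stage_model.weight" are hypothetical, and plain floats stand in for tensors so the snippet runs without torch.

```python
from collections import OrderedDict

# Key reported as unexpected in the traceback above.
UNEXPECTED_KEY = "cond_stage_model.model.text_branch.embeddings.position_ids"


def drop_keys(state_dict, keys_to_drop):
    """Return a copy of state_dict without the given keys (hypothetical helper)."""
    drop = set(keys_to_drop)
    return OrderedDict((k, v) for k, v in state_dict.items() if k not in drop)


# Stand-in for checkpoint["state_dict"]; real values would be tensors,
# and "first_stage_model.weight" is only a placeholder name.
example = OrderedDict([
    ("first_stage_model.weight", 1.0),
    (UNEXPECTED_KEY, 2.0),
])

cleaned = drop_keys(example, [UNEXPECTED_KEY])
print(list(cleaned))
```

In pipeline.py the cleaned mapping would then be passed to latent_diffusion.load_state_dict(...) in place of checkpoint["state_dict"]. PyTorch's load_state_dict also accepts strict=False, which skips unexpected keys, but that would hide any other mismatches as well.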