Audio-to-audio - Githubissues

Hi, trying to run audio-to-audio, but getting:
audioldm -t "A hammer is hitting a wooden surface"

Load AudioLDM: %s audioldm-m-full
DiffusionWrapper has 415.95 M params.
/opt/conda/envs/audioldm/lib/python3.8/site-packages/torch/nn/utils/weight_norm.py:134: FutureWarning: `torch.nn.utils.weight_norm` is deprecated in favor of `torch.nn.utils.parametrizations.weight_norm`.
  WeightNorm.apply(module, name, dim)
/opt/conda/envs/audioldm/lib/python3.8/site-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
  warnings.warn(
/opt/conda/envs/audioldm/lib/python3.8/site-packages/torchlibrosa/stft.py:193: FutureWarning: Pass size=1024 as keyword args. From version 0.10 passing these as positional arguments will result in an error
  fft_window = librosa.util.pad_center(fft_window, n_fft)
/opt/conda/envs/audioldm/lib/python3.8/site-packages/torch/functional.py:513: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3609.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Some weights of RobertaModel were not initialized from the model checkpoint at roberta-base and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
/opt/conda/envs/audioldm/lib/python3.8/site-packages/audioldm/pipeline.py:85: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  checkpoint = torch.load(resume_from_checkpoint, map_location=device)
Traceback (most recent call last):
  File "/opt/conda/envs/audioldm/bin/audioldm", line 152, in <module>
    audioldm = build_model(model_name=args.model_name)
  File "/opt/conda/envs/audioldm/lib/python3.8/site-packages/audioldm/pipeline.py", line 86, in build_model
    latent_diffusion.load_state_dict(checkpoint["state_dict"])
  File "/opt/conda/envs/audioldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2215, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LatentDiffusion:
        Unexpected key(s) in state_dict: "cond_stage_model.model.text_branch.embeddings.position_ids". ```

  I would be happy for some help.
  tnx!
haoheliu / AudioLDM

Audio-to-audio #127