yangdongchao / Text-to-sound-Synthesis

The source code of our paper "Diffsound: discrete diffusion model for text-to-sound generation"
http://dongchaoyang.top/text-to-sound-synthesis-demo/

KeyError: 'visual.layer1.0.conv1.weight' #3

Closed dto closed 2 years ago

dto commented 2 years ago

```
(specvqgan) dto@thexder:/apdcephfs/share_1316500/donchaoyang/code3/Text-to-sound-Synthesis/Diffsound$ python3 evaluation/generate_samples_batch.py
/home/dto/miniconda3/envs/specvqgan/lib/python3.8/site-packages/torch/cuda/__init__.py:83: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 10010). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:109.)
  return torch._C._cuda_getDeviceCount() > 0
/home/dto/miniconda3/envs/specvqgan/lib/python3.8/site-packages/torch/amp/autocast_mode.py:198: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling
  warnings.warn('User provided device_type of \'cuda\', but CUDA is not available. Disabling')
Restored from /apdcephfs/share_1316500/donchaoyang/code3/SpecVQGAN/logs/2022-04-24T23-17-27_audioset_codebook256/checkpoints/last.ckpt
Traceback (most recent call last):
  File "evaluation/generate_samples_batch.py", line 204, in <module>
    Diffsound = Diffsound(config=config_path, path=pretrained_model_path, ckpt_vocoder=ckpt_vocoder)
  File "evaluation/generate_samples_batch.py", line 44, in __init__
    self.info = self.get_model(ema=True, model_path=path, config_path=config)
  File "evaluation/generate_samples_batch.py", line 64, in get_model
    model = build_model(config)  # load the dalle model
  File "evaluation/../sound_synthesis/modeling/build.py", line 5, in build_model
    return instantiate_from_config(config['model'])
  File "evaluation/../sound_synthesis/utils/misc.py", line 132, in instantiate_from_config
    return cls(**config.get("params", dict()))
  File "evaluation/../sound_synthesis/modeling/models/dalle_spec.py", line 40, in __init__
    self.transformer = instantiate_from_config(diffusion_config)
  File "evaluation/../sound_synthesis/utils/misc.py", line 132, in instantiate_from_config
    return cls(**config.get("params", dict()))
  File "evaluation/../sound_synthesis/modeling/transformers/diffusion_transformer.py", line 172, in __init__
    self.condition_emb = instantiate_from_config(condition_emb_config)  # load the model that produces the condition embedding
  File "evaluation/../sound_synthesis/utils/misc.py", line 132, in instantiate_from_config
    return cls(**config.get("params", dict()))
  File "evaluation/../sound_synthesis/modeling/embeddings/clip_textembedding.py", line 25, in __init__
    model, _ = clip.load(clip_name, device='cpu', jit=False)
  File "evaluation/../sound_synthesis/modeling/modules/clip/clip.py", line 114, in load
    model = build_model(state_dict or model.state_dict()).to(device)
  File "evaluation/../sound_synthesis/modeling/modules/clip/model.py", line 409, in build_model
    vision_width = state_dict["visual.layer1.0.conv1.weight"].shape[0]
KeyError: 'visual.layer1.0.conv1.weight'
```
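For context, the KeyError comes from the backbone-detection branch in CLIP's `build_model`: a ViT checkpoint carries `visual.proj`, while the ResNet path expects `visual.layer1.0.conv1.weight`. A state dict containing neither key (typically a truncated or corrupted download, or a file that is not a CLIP checkpoint at all) fails exactly here. A minimal sketch of that check, useful for inspecting a suspect checkpoint (the cache path in the comment is an assumption based on `clip.load`'s default download location):

```python
def detect_clip_backbone(state_dict):
    """Mirror the branching in CLIP's build_model(): ViT checkpoints carry
    'visual.proj'; the ResNet path reads 'visual.layer1.0.conv1.weight'.
    Returning None means the dict is not a usable CLIP state dict."""
    if "visual.proj" in state_dict:
        return "ViT"
    if "visual.layer1.0.conv1.weight" in state_dict:
        return "ResNet"
    return None

# Hypothetical usage on a cached checkpoint (clip.load() downloads weights
# into ~/.cache/clip by default):
#   import os, torch
#   sd = torch.jit.load(os.path.expanduser("~/.cache/clip/ViT-B-32.pt"),
#                       map_location="cpu").state_dict()
#   print(detect_clip_backbone(sd))  # None would reproduce the KeyError
```

If this returns `None` for the cached file, the checkpoint itself is the problem rather than the surrounding code.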

yangdongchao commented 2 years ago

It seems like your environment does not match the one we used.
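One quick way to see whether the environment matches is to check that the packages the evaluation script imports are actually importable. A minimal sketch; the package list below is an assumption drawn from the traceback, so adjust it to the repo's environment file:

```python
import importlib.util

def missing_packages(names):
    """Return the subset of `names` that cannot be imported in this env."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Hypothetical package list based on the modules seen in the traceback.
print(missing_packages(["torch", "einops", "omegaconf"]))
```

An empty list means the basic dependencies are present; anything printed needs to be installed before the script can run.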

dto commented 2 years ago

Ok, thank you. I will try making a fresh environment.

dto commented 2 years ago

I am still getting this KeyError even with a freshly created environment (created per your instructions with `conda env create -f`). However, I first needed to install the `einops` package from conda-forge. Even after that, the error persists. Can you help?

```
(specvqgan) dto@thexder:/apdcephfs/share_1316500/donchaoyang/code3/Text-to-sound-Synthesis/Diffsound$ conda install einops -c conda-forge
Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/dto/miniconda3/envs/specvqgan

  added / updated specs:
    - einops

The following NEW packages will be INSTALLED:

  einops             conda-forge/noarch::einops-0.4.1-pyhd8ed1ab_0

The following packages will be SUPERSEDED by a higher-priority channel:

  ca-certificates    pkgs/main::ca-certificates-2022.07.19~ --> conda-forge::ca-certificates-2022.6.15-ha878542_0
  certifi            pkgs/main::certifi-2022.6.15-py38h06a~ --> conda-forge::certifi-2022.6.15-py38h578d9bd_0

Proceed ([y]/n)?

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
(specvqgan) dto@thexder:/apdcephfs/share_1316500/donchaoyang/code3/Text-to-sound-Synthesis/Diffsound$ USE_CUDA=0 python3 evaluation/generate_samples_batch.py
Restored from /apdcephfs/share_1316500/donchaoyang/code3/SpecVQGAN/logs/2022-04-24T23-17-27_audioset_codebook256/checkpoints/last.ckpt
Traceback (most recent call last):
  File "evaluation/generate_samples_batch.py", line 204, in <module>
    Diffsound = Diffsound(config=config_path, path=pretrained_model_path, ckpt_vocoder=ckpt_vocoder)
  File "evaluation/generate_samples_batch.py", line 44, in __init__
    self.info = self.get_model(ema=True, model_path=path, config_path=config)
  File "evaluation/generate_samples_batch.py", line 64, in get_model
    model = build_model(config)  # load the dalle model
  File "evaluation/../sound_synthesis/modeling/build.py", line 5, in build_model
    return instantiate_from_config(config['model'])
  File "evaluation/../sound_synthesis/utils/misc.py", line 132, in instantiate_from_config
    return cls(**config.get("params", dict()))
  File "evaluation/../sound_synthesis/modeling/models/dalle_spec.py", line 40, in __init__
    self.transformer = instantiate_from_config(diffusion_config)
  File "evaluation/../sound_synthesis/utils/misc.py", line 132, in instantiate_from_config
    return cls(**config.get("params", dict()))
  File "evaluation/../sound_synthesis/modeling/transformers/diffusion_transformer.py", line 172, in __init__
    self.condition_emb = instantiate_from_config(condition_emb_config)  # load the model that produces the condition embedding
  File "evaluation/../sound_synthesis/utils/misc.py", line 132, in instantiate_from_config
    return cls(**config.get("params", dict()))
  File "evaluation/../sound_synthesis/modeling/embeddings/clip_textembedding.py", line 25, in __init__
    model, _ = clip.load(clip_name, device='cpu', jit=False)
  File "evaluation/../sound_synthesis/modeling/modules/clip/clip.py", line 114, in load
    model = build_model(state_dict or model.state_dict()).to(device)
  File "evaluation/../sound_synthesis/modeling/modules/clip/model.py", line 409, in build_model
    vision_width = state_dict["visual.layer1.0.conv1.weight"].shape[0]
KeyError: 'visual.layer1.0.conv1.weight'
(specvqgan) dto@thexder:/apdcephfs/share_1316500/donchaoyang/code3/Text-to-sound-Synthesis/Diffsound$
```
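Since the error persists in a freshly created environment, one plausible culprit is a corrupted cached copy of the CLIP weights rather than the conda setup itself: `clip.load()` caches its downloads, and a truncated file there fails identically on every run. A hedged sketch of clearing that cache to force a clean re-download, assuming the repo's bundled clip module follows the standard `~/.cache/clip` location:

```shell
# clip.load() caches downloaded weights (by default under ~/.cache/clip);
# a truncated file there raises the same KeyError on every run. Removing
# the cache forces the next invocation to re-download the checkpoint.
CLIP_CACHE="${HOME}/.cache/clip"
ls -l "$CLIP_CACHE" 2>/dev/null || echo "no cached CLIP weights"
rm -rf "$CLIP_CACHE"
```

After clearing the cache, rerun `python3 evaluation/generate_samples_batch.py` and watch for the download progress bar to confirm a fresh checkpoint is fetched.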