Open j-min opened 3 years ago
It also seems that the current auto-downloaded checkpoint is not compatible with the current Colab code. I originally thought this was due to the version mismatch above, but there may be another issue. Would you please check? Below is the error log from running this cell.
2021-08-25 22:36:53.220519: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
Working with z of shape (1, 256, 16, 16) = 65536 dimensions.
Downloading: "https://download.pytorch.org/models/vgg16-397923af.pth" to /root/.cache/torch/hub/checkpoints/vgg16-397923af.pth
100% 528M/528M [00:03<00:00, 177MB/s]
Downloading vgg_lpips model from https://heibox.uni-heidelberg.de/f/607503859c864bc1b30b/?dl=1 to taming/modules/autoencoder/lpips/vgg.pth
8.19kB [00:00, 354kB/s]
loaded pretrained LPIPS loss from taming/modules/autoencoder/lpips/vgg.pth
VQLPIPSWithDiscriminator running with hinge loss.
Loaded VQGAN from /root/.cache/dalle/vqgan.1024.model.ckpt and /root/.cache/dalle/vqgan.1024.config.yml
Traceback (most recent call last):
File "/content/dalle-pytorch-pretrained/dalle-pytorch-pretrained/DALLE-pytorch/generate.py", line 96, in <module>
dalle.load_state_dict(weights)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1407, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for DALLE:
Missing key(s) in state_dict: "transformer.pos_emb", "transformer.layers.blocks.0.f.net.fn.fn.fn.to_qkv.weight", "transformer.layers.blocks.0.f.net.fn.fn.fn.to_out.0.weight", "transformer.layers.blocks.0.f.net.fn.fn.fn.to_out.0.bias", "transformer.layers.blocks.0.f.net.fn.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.0.g.net.fn.fn.fn.net.0.weight", "transformer.layers.blocks.0.g.net.fn.fn.fn.net.0.bias", "transformer.layers.blocks.0.g.net.fn.fn.fn.net.3.weight", "transformer.layers.blocks.0.g.net.fn.fn.fn.net.3.bias", "transformer.layers.blocks.1.f.net.fn.fn.fn.to_qkv.weight", "transformer.layers.blocks.1.f.net.fn.fn.fn.to_out.0.weight", "transformer.layers.blocks.1.f.net.fn.fn.fn.to_out.0.bias", "transformer.layers.blocks.1.f.net.fn.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.1.g.net.fn.fn.fn.net.0.weight", "transformer.layers.blocks.1.g.net.fn.fn.fn.net.0.bias", "transformer.layers.blocks.1.g.net.fn.fn.fn.net.3.weight", "transformer.layers.blocks.1.g.net.fn.fn.fn.net.3.bias", "transformer.layers.blocks.2.f.net.fn.fn.fn.to_qkv.weight", "transformer.layers.blocks.2.f.net.fn.fn.fn.to_out.0.weight", "transformer.layers.blocks.2.f.net.fn.fn.fn.to_out.0.bias", "transformer.layers.blocks.2.f.net.fn.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.2.g.net.fn.fn.fn.net.0.weight", "transformer.layers.blocks.2.g.net.fn.fn.fn.net.0.bias", "transformer.layers.blocks.2.g.net.fn.fn.fn.net.3.weight", "transformer.layers.blocks.2.g.net.fn.fn.fn.net.3.bias", "transformer.layers.blocks.3.f.net.fn.fn.fn.to_qkv.weight", "transformer.layers.blocks.3.f.net.fn.fn.fn.to_out.0.weight", "transformer.layers.blocks.3.f.net.fn.fn.fn.to_out.0.bias", "transformer.layers.blocks.3.f.net.fn.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.3.g.net.fn.fn.fn.net.0.weight", "transformer.layers.blocks.3.g.net.fn.fn.fn.net.0.bias", "transformer.layers.blocks.3.g.net.fn.fn.fn.net.3.weight", "transformer.layers.blocks.3.g.net.fn.fn.fn.net.3.bias", 
"transformer.layers.blocks.4.f.net.fn.fn.fn.to_qkv.weight", "transformer.layers.blocks.4.f.net.fn.fn.fn.to_out.0.weight", "transformer.layers.blocks.4.f.net.fn.fn.fn.to_out.0.bias", "transformer.layers.blocks.4.f.net.fn.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.4.g.net.fn.fn.fn.net.0.weight", "transformer.layers.blocks.4.g.net.fn.fn.fn.net.0.bias", "transformer.layers.blocks.4.g.net.fn.fn.fn.net.3.weight", "transformer.layers.blocks.4.g.net.fn.fn.fn.net.3.bias", "transformer.layers.blocks.5.f.net.fn.fn.fn.to_qkv.weight", "transformer.layers.blocks.5.f.net.fn.fn.fn.to_out.0.weight", "transformer.layers.blocks.5.f.net.fn.fn.fn.to_out.0.bias", "transformer.layers.blocks.5.f.net.fn.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.5.g.net.fn.fn.fn.net.0.weight", "transformer.layers.blocks.5.g.net.fn.fn.fn.net.0.bias", "transformer.layers.blocks.5.g.net.fn.fn.fn.net.3.weight", "transformer.layers.blocks.5.g.net.fn.fn.fn.net.3.bias", "transformer.layers.blocks.6.f.net.fn.fn.fn.to_qkv.weight", "transformer.layers.blocks.6.f.net.fn.fn.fn.to_out.0.weight", "transformer.layers.blocks.6.f.net.fn.fn.fn.to_out.0.bias", "transformer.layers.blocks.6.f.net.fn.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.6.g.net.fn.fn.fn.net.0.weight", "transformer.layers.blocks.6.g.net.fn.fn.fn.net.0.bias", "transformer.layers.blocks.6.g.net.fn.fn.fn.net.3.weight", "transformer.layers.blocks.6.g.net.fn.fn.fn.net.3.bias", "transformer.layers.blocks.7.f.net.fn.fn.fn.to_qkv.weight", "transformer.layers.blocks.7.f.net.fn.fn.fn.to_out.0.weight", "transformer.layers.blocks.7.f.net.fn.fn.fn.to_out.0.bias", "transformer.layers.blocks.7.f.net.fn.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.7.g.net.fn.fn.fn.net.0.weight", "transformer.layers.blocks.7.g.net.fn.fn.fn.net.0.bias", "transformer.layers.blocks.7.g.net.fn.fn.fn.net.3.weight", "transformer.layers.blocks.7.g.net.fn.fn.fn.net.3.bias", "transformer.layers.blocks.8.f.net.fn.fn.fn.to_qkv.weight", 
"transformer.layers.blocks.8.f.net.fn.fn.fn.to_out.0.weight", "transformer.layers.blocks.8.f.net.fn.fn.fn.to_out.0.bias", "transformer.layers.blocks.8.f.net.fn.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.8.g.net.fn.fn.fn.net.0.weight", "transformer.layers.blocks.8.g.net.fn.fn.fn.net.0.bias", "transformer.layers.blocks.8.g.net.fn.fn.fn.net.3.weight", "transformer.layers.blocks.8.g.net.fn.fn.fn.net.3.bias", "transformer.layers.blocks.9.f.net.fn.fn.fn.to_qkv.weight", "transformer.layers.blocks.9.f.net.fn.fn.fn.to_out.0.weight", "transformer.layers.blocks.9.f.net.fn.fn.fn.to_out.0.bias", "transformer.layers.blocks.9.f.net.fn.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.9.g.net.fn.fn.fn.net.0.weight", "transformer.layers.blocks.9.g.net.fn.fn.fn.net.0.bias", "transformer.layers.blocks.9.g.net.fn.fn.fn.net.3.weight", "transformer.layers.blocks.9.g.net.fn.fn.fn.net.3.bias", "transformer.layers.blocks.10.f.net.fn.fn.fn.to_qkv.weight", "transformer.layers.blocks.10.f.net.fn.fn.fn.to_out.0.weight", "transformer.layers.blocks.10.f.net.fn.fn.fn.to_out.0.bias", "transformer.layers.blocks.10.f.net.fn.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.10.g.net.fn.fn.fn.net.0.weight", "transformer.layers.blocks.10.g.net.fn.fn.fn.net.0.bias", "transformer.layers.blocks.10.g.net.fn.fn.fn.net.3.weight", "transformer.layers.blocks.10.g.net.fn.fn.fn.net.3.bias", "transformer.layers.blocks.11.f.net.fn.fn.fn.to_qkv.weight", "transformer.layers.blocks.11.f.net.fn.fn.fn.to_out.0.weight", "transformer.layers.blocks.11.f.net.fn.fn.fn.to_out.0.bias", "transformer.layers.blocks.11.f.net.fn.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.11.g.net.fn.fn.fn.net.0.weight", "transformer.layers.blocks.11.g.net.fn.fn.fn.net.0.bias", "transformer.layers.blocks.11.g.net.fn.fn.fn.net.3.weight", "transformer.layers.blocks.11.g.net.fn.fn.fn.net.3.bias", "transformer.layers.blocks.12.f.net.fn.fn.fn.to_qkv.weight", 
"transformer.layers.blocks.12.f.net.fn.fn.fn.to_out.0.weight", "transformer.layers.blocks.12.f.net.fn.fn.fn.to_out.0.bias", "transformer.layers.blocks.12.f.net.fn.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.12.g.net.fn.fn.fn.net.0.weight", "transformer.layers.blocks.12.g.net.fn.fn.fn.net.0.bias", "transformer.layers.blocks.12.g.net.fn.fn.fn.net.3.weight", "transformer.layers.blocks.12.g.net.fn.fn.fn.net.3.bias", "transformer.layers.blocks.13.f.net.fn.fn.fn.to_qkv.weight", "transformer.layers.blocks.13.f.net.fn.fn.fn.to_out.0.weight", "transformer.layers.blocks.13.f.net.fn.fn.fn.to_out.0.bias", "transformer.layers.blocks.13.f.net.fn.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.13.g.net.fn.fn.fn.net.0.weight", "transformer.layers.blocks.13.g.net.fn.fn.fn.net.0.bias", "transformer.layers.blocks.13.g.net.fn.fn.fn.net.3.weight", "transformer.layers.blocks.13.g.net.fn.fn.fn.net.3.bias", "transformer.layers.blocks.14.f.net.fn.fn.fn.to_qkv.weight", "transformer.layers.blocks.14.f.net.fn.fn.fn.to_out.0.weight", "transformer.layers.blocks.14.f.net.fn.fn.fn.to_out.0.bias", "transformer.layers.blocks.14.f.net.fn.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.14.g.net.fn.fn.fn.net.0.weight", "transformer.layers.blocks.14.g.net.fn.fn.fn.net.0.bias", "transformer.layers.blocks.14.g.net.fn.fn.fn.net.3.weight", "transformer.layers.blocks.14.g.net.fn.fn.fn.net.3.bias", "transformer.layers.blocks.15.f.net.fn.fn.fn.to_qkv.weight", "transformer.layers.blocks.15.f.net.fn.fn.fn.to_out.0.weight", "transformer.layers.blocks.15.f.net.fn.fn.fn.to_out.0.bias", "transformer.layers.blocks.15.f.net.fn.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.15.g.net.fn.fn.fn.net.0.weight", "transformer.layers.blocks.15.g.net.fn.fn.fn.net.0.bias", "transformer.layers.blocks.15.g.net.fn.fn.fn.net.3.weight", "transformer.layers.blocks.15.g.net.fn.fn.fn.net.3.bias".
Unexpected key(s) in state_dict: "text_pos_emb.weight", "image_pos_emb.weights_0", "image_pos_emb.weights_1", "transformer.layers.blocks.0.f.net.fn.fn.to_qkv.weight", "transformer.layers.blocks.0.f.net.fn.fn.to_out.0.weight", "transformer.layers.blocks.0.f.net.fn.fn.to_out.0.bias", "transformer.layers.blocks.0.f.net.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.0.g.net.fn.fn.net.0.weight", "transformer.layers.blocks.0.g.net.fn.fn.net.0.bias", "transformer.layers.blocks.0.g.net.fn.fn.net.3.weight", "transformer.layers.blocks.0.g.net.fn.fn.net.3.bias", "transformer.layers.blocks.1.f.net.fn.fn.to_qkv.weight", "transformer.layers.blocks.1.f.net.fn.fn.to_out.0.weight", "transformer.layers.blocks.1.f.net.fn.fn.to_out.0.bias", "transformer.layers.blocks.1.f.net.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.1.g.net.fn.fn.net.0.weight", "transformer.layers.blocks.1.g.net.fn.fn.net.0.bias", "transformer.layers.blocks.1.g.net.fn.fn.net.3.weight", "transformer.layers.blocks.1.g.net.fn.fn.net.3.bias", "transformer.layers.blocks.2.f.net.fn.fn.to_qkv.weight", "transformer.layers.blocks.2.f.net.fn.fn.to_out.0.weight", "transformer.layers.blocks.2.f.net.fn.fn.to_out.0.bias", "transformer.layers.blocks.2.f.net.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.2.g.net.fn.fn.net.0.weight", "transformer.layers.blocks.2.g.net.fn.fn.net.0.bias", "transformer.layers.blocks.2.g.net.fn.fn.net.3.weight", "transformer.layers.blocks.2.g.net.fn.fn.net.3.bias", "transformer.layers.blocks.3.f.net.fn.fn.to_qkv.weight", "transformer.layers.blocks.3.f.net.fn.fn.to_out.0.weight", "transformer.layers.blocks.3.f.net.fn.fn.to_out.0.bias", "transformer.layers.blocks.3.f.net.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.3.g.net.fn.fn.net.0.weight", "transformer.layers.blocks.3.g.net.fn.fn.net.0.bias", "transformer.layers.blocks.3.g.net.fn.fn.net.3.weight", "transformer.layers.blocks.3.g.net.fn.fn.net.3.bias", "transformer.layers.blocks.4.f.net.fn.fn.to_qkv.weight", 
"transformer.layers.blocks.4.f.net.fn.fn.to_out.0.weight", "transformer.layers.blocks.4.f.net.fn.fn.to_out.0.bias", "transformer.layers.blocks.4.f.net.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.4.g.net.fn.fn.net.0.weight", "transformer.layers.blocks.4.g.net.fn.fn.net.0.bias", "transformer.layers.blocks.4.g.net.fn.fn.net.3.weight", "transformer.layers.blocks.4.g.net.fn.fn.net.3.bias", "transformer.layers.blocks.5.f.net.fn.fn.to_qkv.weight", "transformer.layers.blocks.5.f.net.fn.fn.to_out.0.weight", "transformer.layers.blocks.5.f.net.fn.fn.to_out.0.bias", "transformer.layers.blocks.5.f.net.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.5.g.net.fn.fn.net.0.weight", "transformer.layers.blocks.5.g.net.fn.fn.net.0.bias", "transformer.layers.blocks.5.g.net.fn.fn.net.3.weight", "transformer.layers.blocks.5.g.net.fn.fn.net.3.bias", "transformer.layers.blocks.6.f.net.fn.fn.to_qkv.weight", "transformer.layers.blocks.6.f.net.fn.fn.to_out.0.weight", "transformer.layers.blocks.6.f.net.fn.fn.to_out.0.bias", "transformer.layers.blocks.6.f.net.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.6.g.net.fn.fn.net.0.weight", "transformer.layers.blocks.6.g.net.fn.fn.net.0.bias", "transformer.layers.blocks.6.g.net.fn.fn.net.3.weight", "transformer.layers.blocks.6.g.net.fn.fn.net.3.bias", "transformer.layers.blocks.7.f.net.fn.fn.to_qkv.weight", "transformer.layers.blocks.7.f.net.fn.fn.to_out.0.weight", "transformer.layers.blocks.7.f.net.fn.fn.to_out.0.bias", "transformer.layers.blocks.7.f.net.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.7.g.net.fn.fn.net.0.weight", "transformer.layers.blocks.7.g.net.fn.fn.net.0.bias", "transformer.layers.blocks.7.g.net.fn.fn.net.3.weight", "transformer.layers.blocks.7.g.net.fn.fn.net.3.bias", "transformer.layers.blocks.8.f.net.fn.fn.to_qkv.weight", "transformer.layers.blocks.8.f.net.fn.fn.to_out.0.weight", "transformer.layers.blocks.8.f.net.fn.fn.to_out.0.bias", 
"transformer.layers.blocks.8.f.net.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.8.g.net.fn.fn.net.0.weight", "transformer.layers.blocks.8.g.net.fn.fn.net.0.bias", "transformer.layers.blocks.8.g.net.fn.fn.net.3.weight", "transformer.layers.blocks.8.g.net.fn.fn.net.3.bias", "transformer.layers.blocks.9.f.net.fn.fn.to_qkv.weight", "transformer.layers.blocks.9.f.net.fn.fn.to_out.0.weight", "transformer.layers.blocks.9.f.net.fn.fn.to_out.0.bias", "transformer.layers.blocks.9.f.net.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.9.g.net.fn.fn.net.0.weight", "transformer.layers.blocks.9.g.net.fn.fn.net.0.bias", "transformer.layers.blocks.9.g.net.fn.fn.net.3.weight", "transformer.layers.blocks.9.g.net.fn.fn.net.3.bias", "transformer.layers.blocks.10.f.net.fn.fn.to_qkv.weight", "transformer.layers.blocks.10.f.net.fn.fn.to_out.0.weight", "transformer.layers.blocks.10.f.net.fn.fn.to_out.0.bias", "transformer.layers.blocks.10.f.net.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.10.g.net.fn.fn.net.0.weight", "transformer.layers.blocks.10.g.net.fn.fn.net.0.bias", "transformer.layers.blocks.10.g.net.fn.fn.net.3.weight", "transformer.layers.blocks.10.g.net.fn.fn.net.3.bias", "transformer.layers.blocks.11.f.net.fn.fn.to_qkv.weight", "transformer.layers.blocks.11.f.net.fn.fn.to_out.0.weight", "transformer.layers.blocks.11.f.net.fn.fn.to_out.0.bias", "transformer.layers.blocks.11.f.net.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.11.g.net.fn.fn.net.0.weight", "transformer.layers.blocks.11.g.net.fn.fn.net.0.bias", "transformer.layers.blocks.11.g.net.fn.fn.net.3.weight", "transformer.layers.blocks.11.g.net.fn.fn.net.3.bias", "transformer.layers.blocks.12.f.net.fn.fn.to_qkv.weight", "transformer.layers.blocks.12.f.net.fn.fn.to_out.0.weight", "transformer.layers.blocks.12.f.net.fn.fn.to_out.0.bias", "transformer.layers.blocks.12.f.net.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.12.g.net.fn.fn.net.0.weight", 
"transformer.layers.blocks.12.g.net.fn.fn.net.0.bias", "transformer.layers.blocks.12.g.net.fn.fn.net.3.weight", "transformer.layers.blocks.12.g.net.fn.fn.net.3.bias", "transformer.layers.blocks.13.f.net.fn.fn.to_qkv.weight", "transformer.layers.blocks.13.f.net.fn.fn.to_out.0.weight", "transformer.layers.blocks.13.f.net.fn.fn.to_out.0.bias", "transformer.layers.blocks.13.f.net.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.13.g.net.fn.fn.net.0.weight", "transformer.layers.blocks.13.g.net.fn.fn.net.0.bias", "transformer.layers.blocks.13.g.net.fn.fn.net.3.weight", "transformer.layers.blocks.13.g.net.fn.fn.net.3.bias", "transformer.layers.blocks.14.f.net.fn.fn.to_qkv.weight", "transformer.layers.blocks.14.f.net.fn.fn.to_out.0.weight", "transformer.layers.blocks.14.f.net.fn.fn.to_out.0.bias", "transformer.layers.blocks.14.f.net.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.14.g.net.fn.fn.net.0.weight", "transformer.layers.blocks.14.g.net.fn.fn.net.0.bias", "transformer.layers.blocks.14.g.net.fn.fn.net.3.weight", "transformer.layers.blocks.14.g.net.fn.fn.net.3.bias", "transformer.layers.blocks.15.f.net.fn.fn.to_qkv.weight", "transformer.layers.blocks.15.f.net.fn.fn.to_out.0.weight", "transformer.layers.blocks.15.f.net.fn.fn.to_out.0.bias", "transformer.layers.blocks.15.f.net.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.15.g.net.fn.fn.net.0.weight", "transformer.layers.blocks.15.g.net.fn.fn.net.0.bias", "transformer.layers.blocks.15.g.net.fn.fn.net.3.weight", "transformer.layers.blocks.15.g.net.fn.fn.net.3.bias".
Thank you! The Colab inference has actually been fixed now; the versions needed to be changed.
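For anyone who hits the same load_state_dict error, here is a minimal sketch of how the two key lists in the log can be compared. It uses plain lists of key names so it runs without the model; in the real script the inputs would presumably be dalle.state_dict().keys() and weights.keys() (the names dalle and weights are taken from the traceback). The helper names diff_state_dict_keys and wrapper_depth are made up for illustration.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Split key names into (missing, unexpected), as load_state_dict reports them."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)     # expected by the model, absent from the file
    unexpected = sorted(checkpoint_keys - model_keys)  # present in the file, unknown to the model
    return missing, unexpected


def wrapper_depth(key):
    """Count the 'fn' components in a dotted parameter name."""
    return key.split(".").count("fn")


# Two keys per side, copied from the error log above.
model_keys = [
    "transformer.pos_emb",
    "transformer.layers.blocks.0.f.net.fn.fn.fn.to_qkv.weight",
]
checkpoint_keys = [
    "text_pos_emb.weight",
    "transformer.layers.blocks.0.f.net.fn.fn.to_qkv.weight",
]

missing, unexpected = diff_state_dict_keys(model_keys, checkpoint_keys)
for key in missing:
    print("missing   ", wrapper_depth(key), key)
for key in unexpected:
    print("unexpected", wrapper_depth(key), key)
```

In the log, every missing transformer key has three nested fn components while the matching unexpected key has two, and the positional-embedding keys are named differently. That pattern suggests the checkpoint was saved with a different dalle-pytorch version than the one installed, rather than the file being corrupted.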
It seems that 0.14.3.zip is downloaded to the wrong path (/content/dalle-pytorch-pretrained/ instead of /content/), so the DALLE-pytorch directory is not created by !unzip /content/0.14.3.zip -d /content/dalle-pytorch-pretrained
You have to add -O /content/0.14.3.zip at the end of the wget command (-O expects a file name, not a directory), as follows:
!wget "https://github.com/lucidrains/DALLE-pytorch/archive/refs/tags/0.14.3.zip" -O /content/0.14.3.zip
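The same download-then-unzip step can also be sketched in Python, which makes both paths explicit and avoids depending on the notebook's current working directory. This is only a sketch: the function names fetch_and_unpack and unpack are hypothetical, and the paths in the usage comment mirror the ones quoted above.

```python
import urllib.request
import zipfile
from pathlib import Path


def unpack(zip_path, dest_dir):
    """Extract zip_path into dest_dir and return the top-level names created there."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
    return sorted(p.name for p in dest_dir.iterdir())


def fetch_and_unpack(url, zip_path, dest_dir):
    """Download url to an explicit file path, then extract it into dest_dir."""
    zip_path = Path(zip_path)
    zip_path.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, str(zip_path))  # writes exactly to zip_path
    return unpack(zip_path, dest_dir)


# Usage in the notebook (needs network access, so it is left commented out):
# fetch_and_unpack(
#     "https://github.com/lucidrains/DALLE-pytorch/archive/refs/tags/0.14.3.zip",
#     "/content/0.14.3.zip",
#     "/content/dalle-pytorch-pretrained",
# )
```

Returning the extracted top-level names makes it easy to confirm which directory the archive actually created before later cells reference it.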
Thank you! This has actually been fixed now.
Thanks for sharing the model! It seems the dalle-pytorch pip version needs to be fixed in the Colab example:
dalle-pytorch==1.14.3 -> dalle-pytorch==0.14.3
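A small guard cell can catch this kind of pin mismatch before the checkpoint is loaded. The sketch below assumes Python 3.8 or newer for importlib.metadata (the log above shows Python 3.7, where the importlib_metadata backport would be needed instead); the helper names installed_version and matches_pin are hypothetical.

```python
from importlib import metadata


def installed_version(package="dalle-pytorch"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(installed, expected="0.14.3"):
    """True only when the installed version is exactly the pinned one."""
    return installed == expected


found = installed_version()
if not matches_pin(found):
    print(f"dalle-pytorch is {found!r}, but the notebook pins 0.14.3")
```

An exact string comparison is deliberate here: the checkpoint only needs to match one specific release, so no version-range logic is attempted.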