robvanvolt / DALLE-models

A collection of checkpoints for DALLE-pytorch models, from which you can continue training or start generating images.
MIT License

Can't run inference with Colab (dalle-pytorch version / state dict keys mismatch) #12

Open j-min opened 3 years ago

j-min commented 3 years ago

Thanks for sharing the models! It seems the dalle-pytorch pip version needs to be fixed in the Colab example:

dalle-pytorch==1.14.3 -> dalle-pytorch==0.14.3
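
A pin mismatch like this can be caught before any checkpoint is loaded by comparing the installed version with the expected one. The following is a minimal sketch, assuming Python 3.8+ (for `importlib.metadata`); `check_pin` is a hypothetical helper, not part of dalle-pytorch.

```python
from importlib import metadata


def check_pin(package, expected):
    """Return (installed_version_or_None, matches_expected).

    Hypothetical helper for illustration only.
    """
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        # The package is not installed in this environment.
        return None, False
    return installed, installed == expected


installed, ok = check_pin("dalle-pytorch", "0.14.3")
print(f"dalle-pytorch installed: {installed}, matches pin: {ok}")
```

If the second value is `False`, reinstalling with the pinned version before running the notebook is probably the first thing to try.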


j-min commented 3 years ago

It also seems that the current auto-downloaded checkpoint is not compatible with the current Colab code. I originally thought this was due to the version mismatch above, but there may be another issue. Could you please check? The error log from running this cell is below.

2021-08-25 22:36:53.220519: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
Working with z of shape (1, 256, 16, 16) = 65536 dimensions.
Downloading: "https://download.pytorch.org/models/vgg16-397923af.pth" to /root/.cache/torch/hub/checkpoints/vgg16-397923af.pth
100% 528M/528M [00:03<00:00, 177MB/s]
Downloading vgg_lpips model from https://heibox.uni-heidelberg.de/f/607503859c864bc1b30b/?dl=1 to taming/modules/autoencoder/lpips/vgg.pth
8.19kB [00:00, 354kB/s]        
loaded pretrained LPIPS loss from taming/modules/autoencoder/lpips/vgg.pth
VQLPIPSWithDiscriminator running with hinge loss.
Loaded VQGAN from /root/.cache/dalle/vqgan.1024.model.ckpt and /root/.cache/dalle/vqgan.1024.config.yml
Traceback (most recent call last):
  File "/content/dalle-pytorch-pretrained/dalle-pytorch-pretrained/DALLE-pytorch/generate.py", line 96, in <module>
    dalle.load_state_dict(weights)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1407, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for DALLE:
    Missing key(s) in state_dict: "transformer.pos_emb", "transformer.layers.blocks.0.f.net.fn.fn.fn.to_qkv.weight", "transformer.layers.blocks.0.f.net.fn.fn.fn.to_out.0.weight", "transformer.layers.blocks.0.f.net.fn.fn.fn.to_out.0.bias", "transformer.layers.blocks.0.f.net.fn.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.0.g.net.fn.fn.fn.net.0.weight", "transformer.layers.blocks.0.g.net.fn.fn.fn.net.0.bias", "transformer.layers.blocks.0.g.net.fn.fn.fn.net.3.weight", "transformer.layers.blocks.0.g.net.fn.fn.fn.net.3.bias", "transformer.layers.blocks.1.f.net.fn.fn.fn.to_qkv.weight", "transformer.layers.blocks.1.f.net.fn.fn.fn.to_out.0.weight", "transformer.layers.blocks.1.f.net.fn.fn.fn.to_out.0.bias", "transformer.layers.blocks.1.f.net.fn.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.1.g.net.fn.fn.fn.net.0.weight", "transformer.layers.blocks.1.g.net.fn.fn.fn.net.0.bias", "transformer.layers.blocks.1.g.net.fn.fn.fn.net.3.weight", "transformer.layers.blocks.1.g.net.fn.fn.fn.net.3.bias", "transformer.layers.blocks.2.f.net.fn.fn.fn.to_qkv.weight", "transformer.layers.blocks.2.f.net.fn.fn.fn.to_out.0.weight", "transformer.layers.blocks.2.f.net.fn.fn.fn.to_out.0.bias", "transformer.layers.blocks.2.f.net.fn.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.2.g.net.fn.fn.fn.net.0.weight", "transformer.layers.blocks.2.g.net.fn.fn.fn.net.0.bias", "transformer.layers.blocks.2.g.net.fn.fn.fn.net.3.weight", "transformer.layers.blocks.2.g.net.fn.fn.fn.net.3.bias", "transformer.layers.blocks.3.f.net.fn.fn.fn.to_qkv.weight", "transformer.layers.blocks.3.f.net.fn.fn.fn.to_out.0.weight", "transformer.layers.blocks.3.f.net.fn.fn.fn.to_out.0.bias", "transformer.layers.blocks.3.f.net.fn.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.3.g.net.fn.fn.fn.net.0.weight", "transformer.layers.blocks.3.g.net.fn.fn.fn.net.0.bias", "transformer.layers.blocks.3.g.net.fn.fn.fn.net.3.weight", "transformer.layers.blocks.3.g.net.fn.fn.fn.net.3.bias", 
"transformer.layers.blocks.4.f.net.fn.fn.fn.to_qkv.weight", "transformer.layers.blocks.4.f.net.fn.fn.fn.to_out.0.weight", "transformer.layers.blocks.4.f.net.fn.fn.fn.to_out.0.bias", "transformer.layers.blocks.4.f.net.fn.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.4.g.net.fn.fn.fn.net.0.weight", "transformer.layers.blocks.4.g.net.fn.fn.fn.net.0.bias", "transformer.layers.blocks.4.g.net.fn.fn.fn.net.3.weight", "transformer.layers.blocks.4.g.net.fn.fn.fn.net.3.bias", "transformer.layers.blocks.5.f.net.fn.fn.fn.to_qkv.weight", "transformer.layers.blocks.5.f.net.fn.fn.fn.to_out.0.weight", "transformer.layers.blocks.5.f.net.fn.fn.fn.to_out.0.bias", "transformer.layers.blocks.5.f.net.fn.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.5.g.net.fn.fn.fn.net.0.weight", "transformer.layers.blocks.5.g.net.fn.fn.fn.net.0.bias", "transformer.layers.blocks.5.g.net.fn.fn.fn.net.3.weight", "transformer.layers.blocks.5.g.net.fn.fn.fn.net.3.bias", "transformer.layers.blocks.6.f.net.fn.fn.fn.to_qkv.weight", "transformer.layers.blocks.6.f.net.fn.fn.fn.to_out.0.weight", "transformer.layers.blocks.6.f.net.fn.fn.fn.to_out.0.bias", "transformer.layers.blocks.6.f.net.fn.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.6.g.net.fn.fn.fn.net.0.weight", "transformer.layers.blocks.6.g.net.fn.fn.fn.net.0.bias", "transformer.layers.blocks.6.g.net.fn.fn.fn.net.3.weight", "transformer.layers.blocks.6.g.net.fn.fn.fn.net.3.bias", "transformer.layers.blocks.7.f.net.fn.fn.fn.to_qkv.weight", "transformer.layers.blocks.7.f.net.fn.fn.fn.to_out.0.weight", "transformer.layers.blocks.7.f.net.fn.fn.fn.to_out.0.bias", "transformer.layers.blocks.7.f.net.fn.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.7.g.net.fn.fn.fn.net.0.weight", "transformer.layers.blocks.7.g.net.fn.fn.fn.net.0.bias", "transformer.layers.blocks.7.g.net.fn.fn.fn.net.3.weight", "transformer.layers.blocks.7.g.net.fn.fn.fn.net.3.bias", "transformer.layers.blocks.8.f.net.fn.fn.fn.to_qkv.weight", 
"transformer.layers.blocks.8.f.net.fn.fn.fn.to_out.0.weight", "transformer.layers.blocks.8.f.net.fn.fn.fn.to_out.0.bias", "transformer.layers.blocks.8.f.net.fn.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.8.g.net.fn.fn.fn.net.0.weight", "transformer.layers.blocks.8.g.net.fn.fn.fn.net.0.bias", "transformer.layers.blocks.8.g.net.fn.fn.fn.net.3.weight", "transformer.layers.blocks.8.g.net.fn.fn.fn.net.3.bias", "transformer.layers.blocks.9.f.net.fn.fn.fn.to_qkv.weight", "transformer.layers.blocks.9.f.net.fn.fn.fn.to_out.0.weight", "transformer.layers.blocks.9.f.net.fn.fn.fn.to_out.0.bias", "transformer.layers.blocks.9.f.net.fn.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.9.g.net.fn.fn.fn.net.0.weight", "transformer.layers.blocks.9.g.net.fn.fn.fn.net.0.bias", "transformer.layers.blocks.9.g.net.fn.fn.fn.net.3.weight", "transformer.layers.blocks.9.g.net.fn.fn.fn.net.3.bias", "transformer.layers.blocks.10.f.net.fn.fn.fn.to_qkv.weight", "transformer.layers.blocks.10.f.net.fn.fn.fn.to_out.0.weight", "transformer.layers.blocks.10.f.net.fn.fn.fn.to_out.0.bias", "transformer.layers.blocks.10.f.net.fn.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.10.g.net.fn.fn.fn.net.0.weight", "transformer.layers.blocks.10.g.net.fn.fn.fn.net.0.bias", "transformer.layers.blocks.10.g.net.fn.fn.fn.net.3.weight", "transformer.layers.blocks.10.g.net.fn.fn.fn.net.3.bias", "transformer.layers.blocks.11.f.net.fn.fn.fn.to_qkv.weight", "transformer.layers.blocks.11.f.net.fn.fn.fn.to_out.0.weight", "transformer.layers.blocks.11.f.net.fn.fn.fn.to_out.0.bias", "transformer.layers.blocks.11.f.net.fn.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.11.g.net.fn.fn.fn.net.0.weight", "transformer.layers.blocks.11.g.net.fn.fn.fn.net.0.bias", "transformer.layers.blocks.11.g.net.fn.fn.fn.net.3.weight", "transformer.layers.blocks.11.g.net.fn.fn.fn.net.3.bias", "transformer.layers.blocks.12.f.net.fn.fn.fn.to_qkv.weight", 
"transformer.layers.blocks.12.f.net.fn.fn.fn.to_out.0.weight", "transformer.layers.blocks.12.f.net.fn.fn.fn.to_out.0.bias", "transformer.layers.blocks.12.f.net.fn.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.12.g.net.fn.fn.fn.net.0.weight", "transformer.layers.blocks.12.g.net.fn.fn.fn.net.0.bias", "transformer.layers.blocks.12.g.net.fn.fn.fn.net.3.weight", "transformer.layers.blocks.12.g.net.fn.fn.fn.net.3.bias", "transformer.layers.blocks.13.f.net.fn.fn.fn.to_qkv.weight", "transformer.layers.blocks.13.f.net.fn.fn.fn.to_out.0.weight", "transformer.layers.blocks.13.f.net.fn.fn.fn.to_out.0.bias", "transformer.layers.blocks.13.f.net.fn.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.13.g.net.fn.fn.fn.net.0.weight", "transformer.layers.blocks.13.g.net.fn.fn.fn.net.0.bias", "transformer.layers.blocks.13.g.net.fn.fn.fn.net.3.weight", "transformer.layers.blocks.13.g.net.fn.fn.fn.net.3.bias", "transformer.layers.blocks.14.f.net.fn.fn.fn.to_qkv.weight", "transformer.layers.blocks.14.f.net.fn.fn.fn.to_out.0.weight", "transformer.layers.blocks.14.f.net.fn.fn.fn.to_out.0.bias", "transformer.layers.blocks.14.f.net.fn.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.14.g.net.fn.fn.fn.net.0.weight", "transformer.layers.blocks.14.g.net.fn.fn.fn.net.0.bias", "transformer.layers.blocks.14.g.net.fn.fn.fn.net.3.weight", "transformer.layers.blocks.14.g.net.fn.fn.fn.net.3.bias", "transformer.layers.blocks.15.f.net.fn.fn.fn.to_qkv.weight", "transformer.layers.blocks.15.f.net.fn.fn.fn.to_out.0.weight", "transformer.layers.blocks.15.f.net.fn.fn.fn.to_out.0.bias", "transformer.layers.blocks.15.f.net.fn.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.15.g.net.fn.fn.fn.net.0.weight", "transformer.layers.blocks.15.g.net.fn.fn.fn.net.0.bias", "transformer.layers.blocks.15.g.net.fn.fn.fn.net.3.weight", "transformer.layers.blocks.15.g.net.fn.fn.fn.net.3.bias". 
    Unexpected key(s) in state_dict: "text_pos_emb.weight", "image_pos_emb.weights_0", "image_pos_emb.weights_1", "transformer.layers.blocks.0.f.net.fn.fn.to_qkv.weight", "transformer.layers.blocks.0.f.net.fn.fn.to_out.0.weight", "transformer.layers.blocks.0.f.net.fn.fn.to_out.0.bias", "transformer.layers.blocks.0.f.net.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.0.g.net.fn.fn.net.0.weight", "transformer.layers.blocks.0.g.net.fn.fn.net.0.bias", "transformer.layers.blocks.0.g.net.fn.fn.net.3.weight", "transformer.layers.blocks.0.g.net.fn.fn.net.3.bias", "transformer.layers.blocks.1.f.net.fn.fn.to_qkv.weight", "transformer.layers.blocks.1.f.net.fn.fn.to_out.0.weight", "transformer.layers.blocks.1.f.net.fn.fn.to_out.0.bias", "transformer.layers.blocks.1.f.net.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.1.g.net.fn.fn.net.0.weight", "transformer.layers.blocks.1.g.net.fn.fn.net.0.bias", "transformer.layers.blocks.1.g.net.fn.fn.net.3.weight", "transformer.layers.blocks.1.g.net.fn.fn.net.3.bias", "transformer.layers.blocks.2.f.net.fn.fn.to_qkv.weight", "transformer.layers.blocks.2.f.net.fn.fn.to_out.0.weight", "transformer.layers.blocks.2.f.net.fn.fn.to_out.0.bias", "transformer.layers.blocks.2.f.net.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.2.g.net.fn.fn.net.0.weight", "transformer.layers.blocks.2.g.net.fn.fn.net.0.bias", "transformer.layers.blocks.2.g.net.fn.fn.net.3.weight", "transformer.layers.blocks.2.g.net.fn.fn.net.3.bias", "transformer.layers.blocks.3.f.net.fn.fn.to_qkv.weight", "transformer.layers.blocks.3.f.net.fn.fn.to_out.0.weight", "transformer.layers.blocks.3.f.net.fn.fn.to_out.0.bias", "transformer.layers.blocks.3.f.net.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.3.g.net.fn.fn.net.0.weight", "transformer.layers.blocks.3.g.net.fn.fn.net.0.bias", "transformer.layers.blocks.3.g.net.fn.fn.net.3.weight", "transformer.layers.blocks.3.g.net.fn.fn.net.3.bias", 
"transformer.layers.blocks.4.f.net.fn.fn.to_qkv.weight", "transformer.layers.blocks.4.f.net.fn.fn.to_out.0.weight", "transformer.layers.blocks.4.f.net.fn.fn.to_out.0.bias", "transformer.layers.blocks.4.f.net.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.4.g.net.fn.fn.net.0.weight", "transformer.layers.blocks.4.g.net.fn.fn.net.0.bias", "transformer.layers.blocks.4.g.net.fn.fn.net.3.weight", "transformer.layers.blocks.4.g.net.fn.fn.net.3.bias", "transformer.layers.blocks.5.f.net.fn.fn.to_qkv.weight", "transformer.layers.blocks.5.f.net.fn.fn.to_out.0.weight", "transformer.layers.blocks.5.f.net.fn.fn.to_out.0.bias", "transformer.layers.blocks.5.f.net.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.5.g.net.fn.fn.net.0.weight", "transformer.layers.blocks.5.g.net.fn.fn.net.0.bias", "transformer.layers.blocks.5.g.net.fn.fn.net.3.weight", "transformer.layers.blocks.5.g.net.fn.fn.net.3.bias", "transformer.layers.blocks.6.f.net.fn.fn.to_qkv.weight", "transformer.layers.blocks.6.f.net.fn.fn.to_out.0.weight", "transformer.layers.blocks.6.f.net.fn.fn.to_out.0.bias", "transformer.layers.blocks.6.f.net.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.6.g.net.fn.fn.net.0.weight", "transformer.layers.blocks.6.g.net.fn.fn.net.0.bias", "transformer.layers.blocks.6.g.net.fn.fn.net.3.weight", "transformer.layers.blocks.6.g.net.fn.fn.net.3.bias", "transformer.layers.blocks.7.f.net.fn.fn.to_qkv.weight", "transformer.layers.blocks.7.f.net.fn.fn.to_out.0.weight", "transformer.layers.blocks.7.f.net.fn.fn.to_out.0.bias", "transformer.layers.blocks.7.f.net.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.7.g.net.fn.fn.net.0.weight", "transformer.layers.blocks.7.g.net.fn.fn.net.0.bias", "transformer.layers.blocks.7.g.net.fn.fn.net.3.weight", "transformer.layers.blocks.7.g.net.fn.fn.net.3.bias", "transformer.layers.blocks.8.f.net.fn.fn.to_qkv.weight", "transformer.layers.blocks.8.f.net.fn.fn.to_out.0.weight", 
"transformer.layers.blocks.8.f.net.fn.fn.to_out.0.bias", "transformer.layers.blocks.8.f.net.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.8.g.net.fn.fn.net.0.weight", "transformer.layers.blocks.8.g.net.fn.fn.net.0.bias", "transformer.layers.blocks.8.g.net.fn.fn.net.3.weight", "transformer.layers.blocks.8.g.net.fn.fn.net.3.bias", "transformer.layers.blocks.9.f.net.fn.fn.to_qkv.weight", "transformer.layers.blocks.9.f.net.fn.fn.to_out.0.weight", "transformer.layers.blocks.9.f.net.fn.fn.to_out.0.bias", "transformer.layers.blocks.9.f.net.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.9.g.net.fn.fn.net.0.weight", "transformer.layers.blocks.9.g.net.fn.fn.net.0.bias", "transformer.layers.blocks.9.g.net.fn.fn.net.3.weight", "transformer.layers.blocks.9.g.net.fn.fn.net.3.bias", "transformer.layers.blocks.10.f.net.fn.fn.to_qkv.weight", "transformer.layers.blocks.10.f.net.fn.fn.to_out.0.weight", "transformer.layers.blocks.10.f.net.fn.fn.to_out.0.bias", "transformer.layers.blocks.10.f.net.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.10.g.net.fn.fn.net.0.weight", "transformer.layers.blocks.10.g.net.fn.fn.net.0.bias", "transformer.layers.blocks.10.g.net.fn.fn.net.3.weight", "transformer.layers.blocks.10.g.net.fn.fn.net.3.bias", "transformer.layers.blocks.11.f.net.fn.fn.to_qkv.weight", "transformer.layers.blocks.11.f.net.fn.fn.to_out.0.weight", "transformer.layers.blocks.11.f.net.fn.fn.to_out.0.bias", "transformer.layers.blocks.11.f.net.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.11.g.net.fn.fn.net.0.weight", "transformer.layers.blocks.11.g.net.fn.fn.net.0.bias", "transformer.layers.blocks.11.g.net.fn.fn.net.3.weight", "transformer.layers.blocks.11.g.net.fn.fn.net.3.bias", "transformer.layers.blocks.12.f.net.fn.fn.to_qkv.weight", "transformer.layers.blocks.12.f.net.fn.fn.to_out.0.weight", "transformer.layers.blocks.12.f.net.fn.fn.to_out.0.bias", "transformer.layers.blocks.12.f.net.fn.fn.attn_fn.master_layout", 
"transformer.layers.blocks.12.g.net.fn.fn.net.0.weight", "transformer.layers.blocks.12.g.net.fn.fn.net.0.bias", "transformer.layers.blocks.12.g.net.fn.fn.net.3.weight", "transformer.layers.blocks.12.g.net.fn.fn.net.3.bias", "transformer.layers.blocks.13.f.net.fn.fn.to_qkv.weight", "transformer.layers.blocks.13.f.net.fn.fn.to_out.0.weight", "transformer.layers.blocks.13.f.net.fn.fn.to_out.0.bias", "transformer.layers.blocks.13.f.net.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.13.g.net.fn.fn.net.0.weight", "transformer.layers.blocks.13.g.net.fn.fn.net.0.bias", "transformer.layers.blocks.13.g.net.fn.fn.net.3.weight", "transformer.layers.blocks.13.g.net.fn.fn.net.3.bias", "transformer.layers.blocks.14.f.net.fn.fn.to_qkv.weight", "transformer.layers.blocks.14.f.net.fn.fn.to_out.0.weight", "transformer.layers.blocks.14.f.net.fn.fn.to_out.0.bias", "transformer.layers.blocks.14.f.net.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.14.g.net.fn.fn.net.0.weight", "transformer.layers.blocks.14.g.net.fn.fn.net.0.bias", "transformer.layers.blocks.14.g.net.fn.fn.net.3.weight", "transformer.layers.blocks.14.g.net.fn.fn.net.3.bias", "transformer.layers.blocks.15.f.net.fn.fn.to_qkv.weight", "transformer.layers.blocks.15.f.net.fn.fn.to_out.0.weight", "transformer.layers.blocks.15.f.net.fn.fn.to_out.0.bias", "transformer.layers.blocks.15.f.net.fn.fn.attn_fn.master_layout", "transformer.layers.blocks.15.g.net.fn.fn.net.0.weight", "transformer.layers.blocks.15.g.net.fn.fn.net.0.bias", "transformer.layers.blocks.15.g.net.fn.fn.net.3.weight", "transformer.layers.blocks.15.g.net.fn.fn.net.3.bias". 
johnpaulbin commented 3 years ago

Thank you! The Colab inference has actually been fixed now; the versions needed to be changed.
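
The key names in the error log are consistent with a version mismatch: the model built by the installed code expects `transformer.pos_emb` and one extra `fn` wrapper per layer (`fn.fn.fn`), while the checkpoint contains `text_pos_emb.weight`, `image_pos_emb.*` and `fn.fn`. A minimal sketch for inspecting such a mismatch before calling `load_state_dict` might look like this; `diff_keys` is a hypothetical helper, and the sample keys are copied from the log above.

```python
def diff_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, mirroring PyTorch's error message.

    missing:    keys the model expects but the checkpoint lacks
    unexpected: keys the checkpoint has but the model does not know
    """
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    return missing, unexpected


# Two keys from each side of the error log, for illustration.
model_keys = [
    "transformer.pos_emb",
    "transformer.layers.blocks.0.f.net.fn.fn.fn.to_qkv.weight",
]
checkpoint_keys = [
    "text_pos_emb.weight",
    "transformer.layers.blocks.0.f.net.fn.fn.to_qkv.weight",
]

missing, unexpected = diff_keys(model_keys, checkpoint_keys)
print("missing:", missing)
print("unexpected:", unexpected)

# With a real model and checkpoint (names as in generate.py's traceback):
# missing, unexpected = diff_keys(dalle.state_dict().keys(), weights.keys())
```

When every layer shows the same systematic difference, as it does here, the more likely cause is that the checkpoint was saved with a different library version, not that the file is corrupted.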

j-min commented 3 years ago

It seems that 0.14.3.zip is downloaded to the wrong path (/content/dalle-pytorch-pretrained/ instead of /content/), so the DALLE-pytorch directory is not created by:

!unzip /content/0.14.3.zip -d /content/dalle-pytorch-pretrained

You have to tell wget to save the file into /content/. Note that wget's -O option expects a file name (for example /content/0.14.3.zip), whereas -P sets the target directory, so the command becomes:

!wget "https://github.com/lucidrains/DALLE-pytorch/archive/refs/tags/0.14.3.zip" -P /content/
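
The same download-then-unzip step can also be done from Python with both paths given explicitly, which removes any doubt about where the archive lands. This is a sketch using only the standard library; `fetch_and_unpack` is a hypothetical helper, and the directories in the commented usage are the ones named in this thread.

```python
import urllib.request
import zipfile
from pathlib import Path


def fetch_and_unpack(url, download_dir, extract_dir):
    """Download `url` into `download_dir` and extract it into `extract_dir`.

    Returns the path of the downloaded archive. Hypothetical helper.
    """
    download_dir, extract_dir = Path(download_dir), Path(extract_dir)
    download_dir.mkdir(parents=True, exist_ok=True)
    extract_dir.mkdir(parents=True, exist_ok=True)

    # Name the archive after the last URL segment, e.g. "0.14.3.zip".
    archive = download_dir / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, archive)

    with zipfile.ZipFile(archive) as zf:
        zf.extractall(extract_dir)
    return archive


# Usage in the Colab notebook (requires network access):
# fetch_and_unpack(
#     "https://github.com/lucidrains/DALLE-pytorch/archive/refs/tags/0.14.3.zip",
#     "/content",
#     "/content/dalle-pytorch-pretrained",
# )
```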


johnpaulbin commented 3 years ago

Thank you! It is actually fixed now.
