openai / guided-diffusion

MIT License
Error when loading pretrained weights #7

Closed (mehdizemni closed this issue 3 years ago)

mehdizemni commented 3 years ago

Thank you for releasing the pretrained weights. I tried to load some of them as described in the README, but the checkpoint weights do not match the model that the script creates.
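A long `load_state_dict` error such as the log that follows is easier to read once it is reduced to three groups: keys the model expects but the checkpoint lacks, keys only the checkpoint has, and keys present in both with different shapes. The snippet below is a minimal sketch of that comparison, not part of the repository. `compare_shapes` is a hypothetical helper, and the two toy dictionaries reuse a few names and shapes from the log (shapes that the log does not report are marked as assumed). With PyTorch, the shape dictionaries could be built as `{k: tuple(v.shape) for k, v in state_dict.items()}`.

```python
def compare_shapes(model_shapes, ckpt_shapes):
    """Summarize how a checkpoint differs from a model (hypothetical helper).

    Both arguments map parameter names to shape tuples.
    Returns (missing, unexpected, mismatched).
    """
    model_keys = set(model_shapes)
    ckpt_keys = set(ckpt_shapes)
    # Keys the model defines but the checkpoint does not provide.
    missing = sorted(model_keys - ckpt_keys)
    # Keys the checkpoint provides but the model does not define.
    unexpected = sorted(ckpt_keys - model_keys)
    # Shared keys whose shapes disagree: name -> (checkpoint shape, model shape).
    mismatched = {
        k: (tuple(ckpt_shapes[k]), tuple(model_shapes[k]))
        for k in sorted(model_keys & ckpt_keys)
        if tuple(ckpt_shapes[k]) != tuple(model_shapes[k])
    }
    return missing, unexpected, mismatched


# Toy data: names taken from the log; shapes marked "assumed" are illustrative.
model_shapes = {
    "time_embed.0.weight": (512, 128),
    "input_blocks.0.0.weight": (128, 3, 3, 3),
    "input_blocks.3.0.op.weight": (128, 128, 3, 3),  # assumed shape
}
ckpt_shapes = {
    "time_embed.0.weight": (1024, 256),
    "input_blocks.0.0.weight": (256, 3, 3, 3),
    "input_blocks.3.0.in_layers.0.weight": (256,),  # assumed shape
}

missing, unexpected, mismatched = compare_shapes(model_shapes, ckpt_shapes)
print("missing:", missing)
print("unexpected:", unexpected)
for name, (ckpt_shape, model_shape) in mismatched.items():
    print(f"{name}: checkpoint {ckpt_shape} vs model {model_shape}")
```

In this log the first convolution, `input_blocks.0.0.weight`, has 256 output channels in the checkpoint and 128 in the model, which suggests the model was built with different hyperparameters than the ones the checkpoint was trained with.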

Logging to /tmp/openai-2021-07-22-14-28-52-986510
creating model and diffusion...
Traceback (most recent call last):
  File "scripts/image_sample.py", line 108, in <module>
    main()
  File "scripts/image_sample.py", line 33, in main
    model.load_state_dict(
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1223, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for UNetModel:
Missing key(s) in state_dict: "input_blocks.3.0.op.weight", "input_blocks.3.0.op.bias", "input_blocks.4.0.skip_connection.weight", "input_blocks.4.0.skip_connection.bias", "input_blocks.6.0.op.weight", "input_blocks.6.0.op.bias", "input_blocks.7.1.norm.weight", "input_blocks.7.1.norm.bias", "input_blocks.7.1.qkv.weight", "input_blocks.7.1.qkv.bias", "input_blocks.7.1.proj_out.weight", "input_blocks.7.1.proj_out.bias", "input_blocks.8.1.norm.weight", "input_blocks.8.1.norm.bias", "input_blocks.8.1.qkv.weight", "input_blocks.8.1.qkv.bias", "input_blocks.8.1.proj_out.weight", "input_blocks.8.1.proj_out.bias", "input_blocks.9.0.op.weight", "input_blocks.9.0.op.bias", "input_blocks.10.0.skip_connection.weight", "input_blocks.10.0.skip_connection.bias", "output_blocks.2.2.conv.weight", "output_blocks.2.2.conv.bias", "output_blocks.5.2.conv.weight", "output_blocks.5.2.conv.bias", "output_blocks.8.1.conv.weight", "output_blocks.8.1.conv.bias".
Unexpected key(s) in state_dict: "input_blocks.12.0.in_layers.0.weight", "input_blocks.12.0.in_layers.0.bias", "input_blocks.12.0.in_layers.2.weight", "input_blocks.12.0.in_layers.2.bias", "input_blocks.12.0.emb_layers.1.weight", "input_blocks.12.0.emb_layers.1.bias", "input_blocks.12.0.out_layers.0.weight", "input_blocks.12.0.out_layers.0.bias", "input_blocks.12.0.out_layers.3.weight", "input_blocks.12.0.out_layers.3.bias", "input_blocks.13.0.in_layers.0.weight", "input_blocks.13.0.in_layers.0.bias", "input_blocks.13.0.in_layers.2.weight", "input_blocks.13.0.in_layers.2.bias", "input_blocks.13.0.emb_layers.1.weight", "input_blocks.13.0.emb_layers.1.bias", "input_blocks.13.0.out_layers.0.weight", "input_blocks.13.0.out_layers.0.bias", "input_blocks.13.0.out_layers.3.weight", "input_blocks.13.0.out_layers.3.bias", "input_blocks.13.0.skip_connection.weight", "input_blocks.13.0.skip_connection.bias", "input_blocks.13.1.norm.weight", "input_blocks.13.1.norm.bias", "input_blocks.13.1.qkv.weight", "input_blocks.13.1.qkv.bias", "input_blocks.13.1.proj_out.weight", "input_blocks.13.1.proj_out.bias", "input_blocks.14.0.in_layers.0.weight", "input_blocks.14.0.in_layers.0.bias", "input_blocks.14.0.in_layers.2.weight", "input_blocks.14.0.in_layers.2.bias", "input_blocks.14.0.emb_layers.1.weight", "input_blocks.14.0.emb_layers.1.bias", "input_blocks.14.0.out_layers.0.weight", "input_blocks.14.0.out_layers.0.bias", "input_blocks.14.0.out_layers.3.weight", "input_blocks.14.0.out_layers.3.bias", "input_blocks.14.1.norm.weight", "input_blocks.14.1.norm.bias", "input_blocks.14.1.qkv.weight", "input_blocks.14.1.qkv.bias", "input_blocks.14.1.proj_out.weight", "input_blocks.14.1.proj_out.bias", "input_blocks.15.0.in_layers.0.weight", "input_blocks.15.0.in_layers.0.bias", "input_blocks.15.0.in_layers.2.weight", "input_blocks.15.0.in_layers.2.bias", "input_blocks.15.0.emb_layers.1.weight", "input_blocks.15.0.emb_layers.1.bias", "input_blocks.15.0.out_layers.0.weight", 
"input_blocks.15.0.out_layers.0.bias", "input_blocks.15.0.out_layers.3.weight", "input_blocks.15.0.out_layers.3.bias", "input_blocks.16.0.in_layers.0.weight", "input_blocks.16.0.in_layers.0.bias", "input_blocks.16.0.in_layers.2.weight", "input_blocks.16.0.in_layers.2.bias", "input_blocks.16.0.emb_layers.1.weight", "input_blocks.16.0.emb_layers.1.bias", "input_blocks.16.0.out_layers.0.weight", "input_blocks.16.0.out_layers.0.bias", "input_blocks.16.0.out_layers.3.weight", "input_blocks.16.0.out_layers.3.bias", "input_blocks.16.1.norm.weight", "input_blocks.16.1.norm.bias", "input_blocks.16.1.qkv.weight", "input_blocks.16.1.qkv.bias", "input_blocks.16.1.proj_out.weight", "input_blocks.16.1.proj_out.bias", "input_blocks.17.0.in_layers.0.weight", "input_blocks.17.0.in_layers.0.bias", "input_blocks.17.0.in_layers.2.weight", "input_blocks.17.0.in_layers.2.bias", "input_blocks.17.0.emb_layers.1.weight", "input_blocks.17.0.emb_layers.1.bias", "input_blocks.17.0.out_layers.0.weight", "input_blocks.17.0.out_layers.0.bias", "input_blocks.17.0.out_layers.3.weight", "input_blocks.17.0.out_layers.3.bias", "input_blocks.17.1.norm.weight", "input_blocks.17.1.norm.bias", "input_blocks.17.1.qkv.weight", "input_blocks.17.1.qkv.bias", "input_blocks.17.1.proj_out.weight", "input_blocks.17.1.proj_out.bias", "input_blocks.3.0.in_layers.0.weight", "input_blocks.3.0.in_layers.0.bias", "input_blocks.3.0.in_layers.2.weight", "input_blocks.3.0.in_layers.2.bias", "input_blocks.3.0.emb_layers.1.weight", "input_blocks.3.0.emb_layers.1.bias", "input_blocks.3.0.out_layers.0.weight", "input_blocks.3.0.out_layers.0.bias", "input_blocks.3.0.out_layers.3.weight", "input_blocks.3.0.out_layers.3.bias", "input_blocks.6.0.in_layers.0.weight", "input_blocks.6.0.in_layers.0.bias", "input_blocks.6.0.in_layers.2.weight", "input_blocks.6.0.in_layers.2.bias", "input_blocks.6.0.emb_layers.1.weight", "input_blocks.6.0.emb_layers.1.bias", "input_blocks.6.0.out_layers.0.weight", 
"input_blocks.6.0.out_layers.0.bias", "input_blocks.6.0.out_layers.3.weight", "input_blocks.6.0.out_layers.3.bias", "input_blocks.9.0.in_layers.0.weight", "input_blocks.9.0.in_layers.0.bias", "input_blocks.9.0.in_layers.2.weight", "input_blocks.9.0.in_layers.2.bias", "input_blocks.9.0.emb_layers.1.weight", "input_blocks.9.0.emb_layers.1.bias", "input_blocks.9.0.out_layers.0.weight", "input_blocks.9.0.out_layers.0.bias", "input_blocks.9.0.out_layers.3.weight", "input_blocks.9.0.out_layers.3.bias", "output_blocks.12.0.in_layers.0.weight", "output_blocks.12.0.in_layers.0.bias", "output_blocks.12.0.in_layers.2.weight", "output_blocks.12.0.in_layers.2.bias", "output_blocks.12.0.emb_layers.1.weight", "output_blocks.12.0.emb_layers.1.bias", "output_blocks.12.0.out_layers.0.weight", "output_blocks.12.0.out_layers.0.bias", "output_blocks.12.0.out_layers.3.weight", "output_blocks.12.0.out_layers.3.bias", "output_blocks.12.0.skip_connection.weight", "output_blocks.12.0.skip_connection.bias", "output_blocks.13.0.in_layers.0.weight", "output_blocks.13.0.in_layers.0.bias", "output_blocks.13.0.in_layers.2.weight", "output_blocks.13.0.in_layers.2.bias", "output_blocks.13.0.emb_layers.1.weight", "output_blocks.13.0.emb_layers.1.bias", "output_blocks.13.0.out_layers.0.weight", "output_blocks.13.0.out_layers.0.bias", "output_blocks.13.0.out_layers.3.weight", "output_blocks.13.0.out_layers.3.bias", "output_blocks.13.0.skip_connection.weight", "output_blocks.13.0.skip_connection.bias", "output_blocks.14.0.in_layers.0.weight", "output_blocks.14.0.in_layers.0.bias", "output_blocks.14.0.in_layers.2.weight", "output_blocks.14.0.in_layers.2.bias", "output_blocks.14.0.emb_layers.1.weight", "output_blocks.14.0.emb_layers.1.bias", "output_blocks.14.0.out_layers.0.weight", "output_blocks.14.0.out_layers.0.bias", "output_blocks.14.0.out_layers.3.weight", "output_blocks.14.0.out_layers.3.bias", "output_blocks.14.0.skip_connection.weight", "output_blocks.14.0.skip_connection.bias", 
"output_blocks.14.1.in_layers.0.weight", "output_blocks.14.1.in_layers.0.bias", "output_blocks.14.1.in_layers.2.weight", "output_blocks.14.1.in_layers.2.bias", "output_blocks.14.1.emb_layers.1.weight", "output_blocks.14.1.emb_layers.1.bias", "output_blocks.14.1.out_layers.0.weight", "output_blocks.14.1.out_layers.0.bias", "output_blocks.14.1.out_layers.3.weight", "output_blocks.14.1.out_layers.3.bias", "output_blocks.15.0.in_layers.0.weight", "output_blocks.15.0.in_layers.0.bias", "output_blocks.15.0.in_layers.2.weight", "output_blocks.15.0.in_layers.2.bias", "output_blocks.15.0.emb_layers.1.weight", "output_blocks.15.0.emb_layers.1.bias", "output_blocks.15.0.out_layers.0.weight", "output_blocks.15.0.out_layers.0.bias", "output_blocks.15.0.out_layers.3.weight", "output_blocks.15.0.out_layers.3.bias", "output_blocks.15.0.skip_connection.weight", "output_blocks.15.0.skip_connection.bias", "output_blocks.16.0.in_layers.0.weight", "output_blocks.16.0.in_layers.0.bias", "output_blocks.16.0.in_layers.2.weight", "output_blocks.16.0.in_layers.2.bias", "output_blocks.16.0.emb_layers.1.weight", "output_blocks.16.0.emb_layers.1.bias", "output_blocks.16.0.out_layers.0.weight", "output_blocks.16.0.out_layers.0.bias", "output_blocks.16.0.out_layers.3.weight", "output_blocks.16.0.out_layers.3.bias", "output_blocks.16.0.skip_connection.weight", "output_blocks.16.0.skip_connection.bias", "output_blocks.17.0.in_layers.0.weight", "output_blocks.17.0.in_layers.0.bias", "output_blocks.17.0.in_layers.2.weight", "output_blocks.17.0.in_layers.2.bias", "output_blocks.17.0.emb_layers.1.weight", "output_blocks.17.0.emb_layers.1.bias", "output_blocks.17.0.out_layers.0.weight", "output_blocks.17.0.out_layers.0.bias", "output_blocks.17.0.out_layers.3.weight", "output_blocks.17.0.out_layers.3.bias", "output_blocks.17.0.skip_connection.weight", "output_blocks.17.0.skip_connection.bias", "output_blocks.2.2.in_layers.0.weight", "output_blocks.2.2.in_layers.0.bias", 
"output_blocks.2.2.in_layers.2.weight", "output_blocks.2.2.in_layers.2.bias", "output_blocks.2.2.emb_layers.1.weight", "output_blocks.2.2.emb_layers.1.bias", "output_blocks.2.2.out_layers.0.weight", "output_blocks.2.2.out_layers.0.bias", "output_blocks.2.2.out_layers.3.weight", "output_blocks.2.2.out_layers.3.bias", "output_blocks.5.2.in_layers.0.weight", "output_blocks.5.2.in_layers.0.bias", "output_blocks.5.2.in_layers.2.weight", "output_blocks.5.2.in_layers.2.bias", "output_blocks.5.2.emb_layers.1.weight", "output_blocks.5.2.emb_layers.1.bias", "output_blocks.5.2.out_layers.0.weight", "output_blocks.5.2.out_layers.0.bias", "output_blocks.5.2.out_layers.3.weight", "output_blocks.5.2.out_layers.3.bias", "output_blocks.6.1.norm.weight", "output_blocks.6.1.norm.bias", "output_blocks.6.1.qkv.weight", "output_blocks.6.1.qkv.bias", "output_blocks.6.1.proj_out.weight", "output_blocks.6.1.proj_out.bias", "output_blocks.7.1.norm.weight", "output_blocks.7.1.norm.bias", "output_blocks.7.1.qkv.weight", "output_blocks.7.1.qkv.bias", "output_blocks.7.1.proj_out.weight", "output_blocks.7.1.proj_out.bias", "output_blocks.8.2.in_layers.0.weight", "output_blocks.8.2.in_layers.0.bias", "output_blocks.8.2.in_layers.2.weight", "output_blocks.8.2.in_layers.2.bias", "output_blocks.8.2.emb_layers.1.weight", "output_blocks.8.2.emb_layers.1.bias", "output_blocks.8.2.out_layers.0.weight", "output_blocks.8.2.out_layers.0.bias", "output_blocks.8.2.out_layers.3.weight", "output_blocks.8.2.out_layers.3.bias", "output_blocks.8.1.norm.weight", "output_blocks.8.1.norm.bias", "output_blocks.8.1.qkv.weight", "output_blocks.8.1.qkv.bias", "output_blocks.8.1.proj_out.weight", "output_blocks.8.1.proj_out.bias", "output_blocks.11.1.in_layers.0.weight", "output_blocks.11.1.in_layers.0.bias", "output_blocks.11.1.in_layers.2.weight", "output_blocks.11.1.in_layers.2.bias", "output_blocks.11.1.emb_layers.1.weight", "output_blocks.11.1.emb_layers.1.bias", "output_blocks.11.1.out_layers.0.weight", 
"output_blocks.11.1.out_layers.0.bias", "output_blocks.11.1.out_layers.3.weight", "output_blocks.11.1.out_layers.3.bias". size mismatch for time_embed.0.weight: copying a param with shape torch.Size([1024, 256]) from checkpoint, the shape in current model is torch.Size([512, 128]). size mismatch for time_embed.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for time_embed.2.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([512, 512]). size mismatch for time_embed.2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.0.0.weight: copying a param with shape torch.Size([256, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 3, 3, 3]). size mismatch for input_blocks.0.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for input_blocks.1.0.in_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for input_blocks.1.0.in_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for input_blocks.1.0.in_layers.2.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3]). size mismatch for input_blocks.1.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for input_blocks.1.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 1024]) from checkpoint, the shape in current model is torch.Size([256, 512]). 
size mismatch for input_blocks.1.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.1.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for input_blocks.1.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for input_blocks.1.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3]). size mismatch for input_blocks.1.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for input_blocks.2.0.in_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for input_blocks.2.0.in_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for input_blocks.2.0.in_layers.2.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3]). size mismatch for input_blocks.2.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for input_blocks.2.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 1024]) from checkpoint, the shape in current model is torch.Size([256, 512]). size mismatch for input_blocks.2.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]). 
size mismatch for input_blocks.2.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for input_blocks.2.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for input_blocks.2.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3]). size mismatch for input_blocks.2.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for input_blocks.4.0.in_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for input_blocks.4.0.in_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for input_blocks.4.0.in_layers.2.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 128, 3, 3]). size mismatch for input_blocks.4.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 1024]) from checkpoint, the shape in current model is torch.Size([512, 512]). size mismatch for input_blocks.5.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 1024]) from checkpoint, the shape in current model is torch.Size([512, 512]). size mismatch for input_blocks.7.0.in_layers.2.weight: copying a param with shape torch.Size([512, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([384, 256, 3, 3]). size mismatch for input_blocks.7.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). 
size mismatch for input_blocks.7.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([768, 512]). size mismatch for input_blocks.7.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for input_blocks.7.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for input_blocks.7.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for input_blocks.7.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([384, 384, 3, 3]). size mismatch for input_blocks.7.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for input_blocks.7.0.skip_connection.weight: copying a param with shape torch.Size([512, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([384, 256, 1, 1]). size mismatch for input_blocks.7.0.skip_connection.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for input_blocks.8.0.in_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for input_blocks.8.0.in_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for input_blocks.8.0.in_layers.2.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([384, 384, 3, 3]). 
size mismatch for input_blocks.8.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for input_blocks.8.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([768, 512]). size mismatch for input_blocks.8.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for input_blocks.8.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for input_blocks.8.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for input_blocks.8.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([384, 384, 3, 3]). size mismatch for input_blocks.8.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for input_blocks.10.0.in_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for input_blocks.10.0.in_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for input_blocks.10.0.in_layers.2.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 384, 3, 3]). size mismatch for input_blocks.10.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 512]). 
size mismatch for input_blocks.11.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 512]). size mismatch for middle_block.0.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for middle_block.0.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for middle_block.0.in_layers.2.weight: copying a param with shape torch.Size([1024, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]). size mismatch for middle_block.0.in_layers.2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for middle_block.0.emb_layers.1.weight: copying a param with shape torch.Size([2048, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 512]). size mismatch for middle_block.0.emb_layers.1.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for middle_block.0.out_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for middle_block.0.out_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for middle_block.0.out_layers.3.weight: copying a param with shape torch.Size([1024, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]). size mismatch for middle_block.0.out_layers.3.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). 
size mismatch for middle_block.1.norm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for middle_block.1.norm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for middle_block.1.qkv.weight: copying a param with shape torch.Size([3072, 1024, 1]) from checkpoint, the shape in current model is torch.Size([1536, 512, 1]). size mismatch for middle_block.1.qkv.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([1536]). size mismatch for middle_block.1.proj_out.weight: copying a param with shape torch.Size([1024, 1024, 1]) from checkpoint, the shape in current model is torch.Size([512, 512, 1]). size mismatch for middle_block.1.proj_out.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for middle_block.2.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for middle_block.2.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for middle_block.2.in_layers.2.weight: copying a param with shape torch.Size([1024, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]). size mismatch for middle_block.2.in_layers.2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for middle_block.2.emb_layers.1.weight: copying a param with shape torch.Size([2048, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 512]). 
size mismatch for middle_block.2.emb_layers.1.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for middle_block.2.out_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for middle_block.2.out_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for middle_block.2.out_layers.3.weight: copying a param with shape torch.Size([1024, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]). size mismatch for middle_block.2.out_layers.3.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.0.0.in_layers.0.weight: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.0.0.in_layers.0.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.0.0.in_layers.2.weight: copying a param with shape torch.Size([1024, 2048, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 1024, 3, 3]). size mismatch for output_blocks.0.0.in_layers.2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.0.0.emb_layers.1.weight: copying a param with shape torch.Size([2048, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 512]). size mismatch for output_blocks.0.0.emb_layers.1.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1024]). 
size mismatch for output_blocks.0.0.out_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.0.0.out_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.0.0.out_layers.3.weight: copying a param with shape torch.Size([1024, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]). size mismatch for output_blocks.0.0.out_layers.3.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.0.0.skip_connection.weight: copying a param with shape torch.Size([1024, 2048, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 1024, 1, 1]). size mismatch for output_blocks.0.0.skip_connection.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.0.1.norm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.0.1.norm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.0.1.qkv.weight: copying a param with shape torch.Size([3072, 1024, 1]) from checkpoint, the shape in current model is torch.Size([1536, 512, 1]). size mismatch for output_blocks.0.1.qkv.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([1536]). size mismatch for output_blocks.0.1.proj_out.weight: copying a param with shape torch.Size([1024, 1024, 1]) from checkpoint, the shape in current model is torch.Size([512, 512, 1]). 
size mismatch for output_blocks.0.1.proj_out.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.1.0.in_layers.0.weight: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.1.0.in_layers.0.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.1.0.in_layers.2.weight: copying a param with shape torch.Size([1024, 2048, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 1024, 3, 3]). size mismatch for output_blocks.1.0.in_layers.2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.1.0.emb_layers.1.weight: copying a param with shape torch.Size([2048, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 512]). size mismatch for output_blocks.1.0.emb_layers.1.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.1.0.out_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.1.0.out_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.1.0.out_layers.3.weight: copying a param with shape torch.Size([1024, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]). size mismatch for output_blocks.1.0.out_layers.3.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). 
size mismatch for output_blocks.1.0.skip_connection.weight: copying a param with shape torch.Size([1024, 2048, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 1024, 1, 1]). size mismatch for output_blocks.1.0.skip_connection.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.1.1.norm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.1.1.norm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.1.1.qkv.weight: copying a param with shape torch.Size([3072, 1024, 1]) from checkpoint, the shape in current model is torch.Size([1536, 512, 1]). size mismatch for output_blocks.1.1.qkv.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([1536]). size mismatch for output_blocks.1.1.proj_out.weight: copying a param with shape torch.Size([1024, 1024, 1]) from checkpoint, the shape in current model is torch.Size([512, 512, 1]). size mismatch for output_blocks.1.1.proj_out.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.2.0.in_layers.0.weight: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([896]). size mismatch for output_blocks.2.0.in_layers.0.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([896]). size mismatch for output_blocks.2.0.in_layers.2.weight: copying a param with shape torch.Size([1024, 2048, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 896, 3, 3]). 
size mismatch for output_blocks.2.0.in_layers.2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.2.0.emb_layers.1.weight: copying a param with shape torch.Size([2048, 1024]) from checkpoint, the shape in current model is torch.Size([1024, 512]). size mismatch for output_blocks.2.0.emb_layers.1.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.2.0.out_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.2.0.out_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.2.0.out_layers.3.weight: copying a param with shape torch.Size([1024, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]). size mismatch for output_blocks.2.0.out_layers.3.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.2.0.skip_connection.weight: copying a param with shape torch.Size([1024, 2048, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 896, 1, 1]). size mismatch for output_blocks.2.0.skip_connection.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.2.1.norm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.2.1.norm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). 
size mismatch for output_blocks.2.1.qkv.weight: copying a param with shape torch.Size([3072, 1024, 1]) from checkpoint, the shape in current model is torch.Size([1536, 512, 1]). size mismatch for output_blocks.2.1.qkv.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([1536]). size mismatch for output_blocks.2.1.proj_out.weight: copying a param with shape torch.Size([1024, 1024, 1]) from checkpoint, the shape in current model is torch.Size([512, 512, 1]). size mismatch for output_blocks.2.1.proj_out.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.3.0.in_layers.0.weight: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([896]). size mismatch for output_blocks.3.0.in_layers.0.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([896]). size mismatch for output_blocks.3.0.in_layers.2.weight: copying a param with shape torch.Size([1024, 2048, 3, 3]) from checkpoint, the shape in current model is torch.Size([384, 896, 3, 3]). size mismatch for output_blocks.3.0.in_layers.2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for output_blocks.3.0.emb_layers.1.weight: copying a param with shape torch.Size([2048, 1024]) from checkpoint, the shape in current model is torch.Size([768, 512]). size mismatch for output_blocks.3.0.emb_layers.1.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for output_blocks.3.0.out_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]). 
size mismatch for output_blocks.3.0.out_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for output_blocks.3.0.out_layers.3.weight: copying a param with shape torch.Size([1024, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([384, 384, 3, 3]). size mismatch for output_blocks.3.0.out_layers.3.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for output_blocks.3.0.skip_connection.weight: copying a param with shape torch.Size([1024, 2048, 1, 1]) from checkpoint, the shape in current model is torch.Size([384, 896, 1, 1]). size mismatch for output_blocks.3.0.skip_connection.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for output_blocks.3.1.norm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for output_blocks.3.1.norm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for output_blocks.3.1.qkv.weight: copying a param with shape torch.Size([3072, 1024, 1]) from checkpoint, the shape in current model is torch.Size([1152, 384, 1]). size mismatch for output_blocks.3.1.qkv.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([1152]). size mismatch for output_blocks.3.1.proj_out.weight: copying a param with shape torch.Size([1024, 1024, 1]) from checkpoint, the shape in current model is torch.Size([384, 384, 1]). size mismatch for output_blocks.3.1.proj_out.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]). 
size mismatch for output_blocks.4.0.in_layers.0.weight: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for output_blocks.4.0.in_layers.0.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for output_blocks.4.0.in_layers.2.weight: copying a param with shape torch.Size([1024, 2048, 3, 3]) from checkpoint, the shape in current model is torch.Size([384, 768, 3, 3]). size mismatch for output_blocks.4.0.in_layers.2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for output_blocks.4.0.emb_layers.1.weight: copying a param with shape torch.Size([2048, 1024]) from checkpoint, the shape in current model is torch.Size([768, 512]). size mismatch for output_blocks.4.0.emb_layers.1.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for output_blocks.4.0.out_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for output_blocks.4.0.out_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for output_blocks.4.0.out_layers.3.weight: copying a param with shape torch.Size([1024, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([384, 384, 3, 3]). size mismatch for output_blocks.4.0.out_layers.3.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for output_blocks.4.0.skip_connection.weight: copying a param with shape torch.Size([1024, 2048, 1, 1]) from checkpoint, the shape in current model is torch.Size([384, 768, 1, 1]). 
size mismatch for output_blocks.4.0.skip_connection.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for output_blocks.4.1.norm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for output_blocks.4.1.norm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for output_blocks.4.1.qkv.weight: copying a param with shape torch.Size([3072, 1024, 1]) from checkpoint, the shape in current model is torch.Size([1152, 384, 1]). size mismatch for output_blocks.4.1.qkv.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([1152]). size mismatch for output_blocks.4.1.proj_out.weight: copying a param with shape torch.Size([1024, 1024, 1]) from checkpoint, the shape in current model is torch.Size([384, 384, 1]). size mismatch for output_blocks.4.1.proj_out.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for output_blocks.5.0.in_layers.0.weight: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for output_blocks.5.0.in_layers.0.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for output_blocks.5.0.in_layers.2.weight: copying a param with shape torch.Size([1024, 1536, 3, 3]) from checkpoint, the shape in current model is torch.Size([384, 640, 3, 3]). size mismatch for output_blocks.5.0.in_layers.2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]). 
size mismatch for output_blocks.5.0.emb_layers.1.weight: copying a param with shape torch.Size([2048, 1024]) from checkpoint, the shape in current model is torch.Size([768, 512]). size mismatch for output_blocks.5.0.emb_layers.1.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for output_blocks.5.0.out_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for output_blocks.5.0.out_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for output_blocks.5.0.out_layers.3.weight: copying a param with shape torch.Size([1024, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([384, 384, 3, 3]). size mismatch for output_blocks.5.0.out_layers.3.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for output_blocks.5.0.skip_connection.weight: copying a param with shape torch.Size([1024, 1536, 1, 1]) from checkpoint, the shape in current model is torch.Size([384, 640, 1, 1]). size mismatch for output_blocks.5.0.skip_connection.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for output_blocks.5.1.norm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for output_blocks.5.1.norm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for output_blocks.5.1.qkv.weight: copying a param with shape torch.Size([3072, 1024, 1]) from checkpoint, the shape in current model is torch.Size([1152, 384, 1]). 
size mismatch for output_blocks.5.1.qkv.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([1152]). size mismatch for output_blocks.5.1.proj_out.weight: copying a param with shape torch.Size([1024, 1024, 1]) from checkpoint, the shape in current model is torch.Size([384, 384, 1]). size mismatch for output_blocks.5.1.proj_out.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for output_blocks.6.0.in_layers.0.weight: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for output_blocks.6.0.in_layers.0.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for output_blocks.6.0.in_layers.2.weight: copying a param with shape torch.Size([512, 1536, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 640, 3, 3]). size mismatch for output_blocks.6.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for output_blocks.6.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([512, 512]). size mismatch for output_blocks.6.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.6.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for output_blocks.6.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]). 
size mismatch for output_blocks.6.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]). size mismatch for output_blocks.6.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for output_blocks.6.0.skip_connection.weight: copying a param with shape torch.Size([512, 1536, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 640, 1, 1]). size mismatch for output_blocks.6.0.skip_connection.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for output_blocks.7.0.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.7.0.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.7.0.in_layers.2.weight: copying a param with shape torch.Size([512, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 512, 3, 3]). size mismatch for output_blocks.7.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for output_blocks.7.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([512, 512]). size mismatch for output_blocks.7.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.7.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]). 
size mismatch for output_blocks.7.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for output_blocks.7.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]). size mismatch for output_blocks.7.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for output_blocks.7.0.skip_connection.weight: copying a param with shape torch.Size([512, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 512, 1, 1]). size mismatch for output_blocks.7.0.skip_connection.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for output_blocks.8.0.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for output_blocks.8.0.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for output_blocks.8.0.in_layers.2.weight: copying a param with shape torch.Size([512, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 384, 3, 3]). size mismatch for output_blocks.8.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for output_blocks.8.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([512, 512]). size mismatch for output_blocks.8.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]). 
size mismatch for output_blocks.8.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for output_blocks.8.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for output_blocks.8.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]). size mismatch for output_blocks.8.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for output_blocks.8.0.skip_connection.weight: copying a param with shape torch.Size([512, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 384, 1, 1]). size mismatch for output_blocks.8.0.skip_connection.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for output_blocks.9.0.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for output_blocks.9.0.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([384]). size mismatch for output_blocks.9.0.in_layers.2.weight: copying a param with shape torch.Size([512, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 384, 3, 3]). size mismatch for output_blocks.9.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for output_blocks.9.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([256, 512]). 
size mismatch for output_blocks.9.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for output_blocks.9.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for output_blocks.9.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for output_blocks.9.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3]). size mismatch for output_blocks.9.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for output_blocks.9.0.skip_connection.weight: copying a param with shape torch.Size([512, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 384, 1, 1]). size mismatch for output_blocks.9.0.skip_connection.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for output_blocks.10.0.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for output_blocks.10.0.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for output_blocks.10.0.in_layers.2.weight: copying a param with shape torch.Size([512, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 256, 3, 3]). size mismatch for output_blocks.10.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]). 
size mismatch for output_blocks.10.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([256, 512]). size mismatch for output_blocks.10.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for output_blocks.10.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for output_blocks.10.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for output_blocks.10.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3]). size mismatch for output_blocks.10.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for output_blocks.10.0.skip_connection.weight: copying a param with shape torch.Size([512, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 256, 1, 1]). size mismatch for output_blocks.10.0.skip_connection.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for output_blocks.11.0.in_layers.0.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for output_blocks.11.0.in_layers.0.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for output_blocks.11.0.in_layers.2.weight: copying a param with shape torch.Size([512, 768, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 256, 3, 3]). 
size mismatch for output_blocks.11.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for output_blocks.11.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([256, 512]). size mismatch for output_blocks.11.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for output_blocks.11.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for output_blocks.11.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for output_blocks.11.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3]). size mismatch for output_blocks.11.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for output_blocks.11.0.skip_connection.weight: copying a param with shape torch.Size([512, 768, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 256, 1, 1]). size mismatch for output_blocks.11.0.skip_connection.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for out.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). size mismatch for out.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([128]). 
size mismatch for out.2.weight: copying a param with shape torch.Size([6, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([3, 128, 3, 3]). size mismatch for out.2.bias: copying a param with shape torch.Size([6]) from checkpoint, the shape in current model is torch.Size([3]).

VSehwag commented 3 years ago

I also ran into a similar problem when using the 64x64 classifier. You have to build the model with the architecture that matches the checkpoint before loading it. In particular, increase the classifier depth to 4 (as listed in Table 12 of their paper).
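To find out which architecture settings differ, it can help to compare parameter names and shapes directly rather than reading the full traceback. Below is a minimal sketch, not part of the repository: `compare_shapes` is a hypothetical helper, and the three example entries are copied from the size mismatches in the traceback above.

```python
def compare_shapes(checkpoint_shapes, model_shapes):
    """Compare two {parameter_name: shape} mappings.

    Returns (missing, unexpected, mismatched), using the same meaning
    as the PyTorch error message:
      missing    - names the model expects but the checkpoint lacks
      unexpected - names in the checkpoint that the model does not have
      mismatched - names present in both, but with different shapes
    """
    missing = sorted(k for k in model_shapes if k not in checkpoint_shapes)
    unexpected = sorted(k for k in checkpoint_shapes if k not in model_shapes)
    mismatched = {
        k: (tuple(checkpoint_shapes[k]), tuple(model_shapes[k]))
        for k in checkpoint_shapes
        if k in model_shapes
        and tuple(checkpoint_shapes[k]) != tuple(model_shapes[k])
    }
    return missing, unexpected, mismatched


# Shapes taken from the traceback above (checkpoint vs. current model).
checkpoint_shapes = {
    "out.0.weight": (256,),
    "out.2.weight": (6, 256, 3, 3),
    "out.2.bias": (6,),
}
model_shapes = {
    "out.0.weight": (128,),
    "out.2.weight": (3, 128, 3, 3),
    "out.2.bias": (3,),
}

missing, unexpected, mismatched = compare_shapes(checkpoint_shapes, model_shapes)
for name, (ckpt_shape, model_shape) in sorted(mismatched.items()):
    print(f"{name}: checkpoint {ckpt_shape} vs model {model_shape}")

# With PyTorch installed, the same mappings can be built from real objects,
# assuming the checkpoint file is a plain state dict:
#   checkpoint_shapes = {k: tuple(v.shape)
#                        for k, v in torch.load(path, map_location="cpu").items()}
#   model_shapes = {k: tuple(v.shape) for k, v in model.state_dict().items()}
```

Read this way, the traceback suggests two things: the checkpoint's final layers are 256 channels wide where the constructed model has 128, and the checkpoint's output convolution has 6 channels where the model has 3. The second is what one would expect if the checkpoint was trained with a learned variance and the model was built without it. Both are set by the model flags passed to the script, so the flags have to match those used for the checkpoint.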

mehdizemni commented 3 years ago

Thank you @VSehwag, I'm closing this issue.

shahdghorsi commented 2 years ago

> I also ran into a similar problem when using the 64x64 classifier. You have to use the correct classifier architecture to load it. In particular, increase the depth to 4 (as mentioned in their table 12).

Hi @VSehwag, I am getting the same error and did not understand exactly how to increase the depth to 4. I could not find it in the paper. Could you please help?

forever208 commented 1 year ago

@VSehwag Hi,

When using the pre-trained ImageNet64 ADM-G model, I got some strange results: the FID is worse than without the classifier, although ADM-G achieves a higher IS.

Did you experience similar results on ADM-G before? More details are here: https://github.com/openai/guided-diffusion/issues/119