Apparent Architecture Mismatch when training on custom data

Mitchnoff commented 1 year ago

I am attempting to train a guided diffusion model on my own custom training data. I have successfully trained a model using the improved-diffusion github on my custom data and was able to sample images that fit the training data. I am now attempting to train a classifer on the same data and use it for the guided diffusion process. When I run my shell script, however, I get the following error:

Error Message

``` creating model and diffusion... Traceback (most recent call last): File "scripts/classifier_sample.py", line 134, in main() File "scripts/classifier_sample.py", line 38, in main model.load_state_dict( File "/home/allenm/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1671, in load_state_dict raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format( RuntimeError: Error(s) in loading state_dict for UNetModel: Missing key(s) in state_dict: "input_blocks.4.0.in_layers.0.weight", "input_blocks.4.0.in_layers.0.bias", "input_blocks.4.0.in_layers.2.weight", "input_blocks.4.0.in_layers.2.bias", "input_blocks.4.0.emb_layers.1.weight", "input_blocks.4.0.emb_layers.1.bias", "input_blocks.4.0.out_layers.0.weight", "input_blocks.4.0.out_layers.0.bias", "input_blocks.4.0.out_layers.3.weight", "input_blocks.4.0.out_layers.3.bias", "input_blocks.7.0.skip_connection.weight", "input_blocks.7.0.skip_connection.bias", "input_blocks.8.0.in_layers.0.weight", "input_blocks.8.0.in_layers.0.bias", "input_blocks.8.0.in_layers.2.weight", "input_blocks.8.0.in_layers.2.bias", "input_blocks.8.0.emb_layers.1.weight", "input_blocks.8.0.emb_layers.1.bias", "input_blocks.8.0.out_layers.0.weight", "input_blocks.8.0.out_layers.0.bias", "input_blocks.8.0.out_layers.3.weight", "input_blocks.8.0.out_layers.3.bias", "input_blocks.10.1.norm.weight", "input_blocks.10.1.norm.bias", "input_blocks.10.1.qkv.weight", "input_blocks.10.1.qkv.bias", "input_blocks.10.1.proj_out.weight", "input_blocks.10.1.proj_out.bias", "input_blocks.11.1.norm.weight", "input_blocks.11.1.norm.bias", "input_blocks.11.1.qkv.weight", "input_blocks.11.1.qkv.bias", "input_blocks.11.1.proj_out.weight", "input_blocks.11.1.proj_out.bias", "input_blocks.12.0.in_layers.0.weight", "input_blocks.12.0.in_layers.0.bias", "input_blocks.12.0.in_layers.2.weight", "input_blocks.12.0.in_layers.2.bias", "input_blocks.12.0.emb_layers.1.weight", "input_blocks.12.0.emb_layers.1.bias", "input_blocks.12.0.out_layers.0.weight", "input_blocks.12.0.out_layers.0.bias", "input_blocks.12.0.out_layers.3.weight", "input_blocks.12.0.out_layers.3.bias", "input_blocks.13.0.skip_connection.weight", "input_blocks.13.0.skip_connection.bias", "input_blocks.13.1.norm.weight", "input_blocks.13.1.norm.bias", "input_blocks.13.1.qkv.weight", "input_blocks.13.1.qkv.bias", "input_blocks.13.1.proj_out.weight", "input_blocks.13.1.proj_out.bias", "input_blocks.14.1.norm.weight", "input_blocks.14.1.norm.bias", "input_blocks.14.1.qkv.weight", "input_blocks.14.1.qkv.bias", "input_blocks.14.1.proj_out.weight", "input_blocks.14.1.proj_out.bias", "input_blocks.16.0.in_layers.0.weight", "input_blocks.16.0.in_layers.0.bias", "input_blocks.16.0.in_layers.2.weight", "input_blocks.16.0.in_layers.2.bias", "input_blocks.16.0.emb_layers.1.weight", "input_blocks.16.0.emb_layers.1.bias", "input_blocks.16.0.out_layers.0.weight", "input_blocks.16.0.out_layers.0.bias", "input_blocks.16.0.out_layers.3.weight", "input_blocks.16.0.out_layers.3.bias", "input_blocks.16.1.norm.weight", "input_blocks.16.1.norm.bias", "input_blocks.16.1.qkv.weight", "input_blocks.16.1.qkv.bias", "input_blocks.16.1.proj_out.weight", "input_blocks.16.1.proj_out.bias", ... "output_blocks.2.2.in_layers.0.weight", "output_blocks.2.2.in_layers.0.bias", "output_blocks.2.2.in_layers.2.weight", "output_blocks.2.2.in_layers.2.bias", "output_blocks.2.2.emb_layers.1.weight", "output_blocks.2.2.emb_layers.1.bias", "output_blocks.2.2.out_layers.0.weight", "output_blocks.2.2.out_layers.0.bias", "output_blocks.2.2.out_layers.3.weight", "output_blocks.2.2.out_layers.3.bias", "output_blocks.5.2.in_layers.0.weight", "output_blocks.5.2.in_layers.0.bias", "output_blocks.5.2.in_layers.2.weight", "output_blocks.5.2.in_layers.2.bias", "output_blocks.5.2.emb_layers.1.weight", "output_blocks.5.2.emb_layers.1.bias", "output_blocks.5.2.out_layers.0.weight", "output_blocks.5.2.out_layers.0.bias", "output_blocks.5.2.out_layers.3.weight", "output_blocks.5.2.out_layers.3.bias", "output_blocks.8.1.norm.weight", "output_blocks.8.1.norm.bias", "output_blocks.8.1.qkv.weight", "output_blocks.8.1.qkv.bias", "output_blocks.8.1.proj_out.weight", "output_blocks.8.1.proj_out.bias", "output_blocks.8.2.in_layers.0.weight", "output_blocks.8.2.in_layers.0.bias", "output_blocks.8.2.in_layers.2.weight", "output_blocks.8.2.in_layers.2.bias", "output_blocks.8.2.emb_layers.1.weight", "output_blocks.8.2.emb_layers.1.bias", "output_blocks.8.2.out_layers.0.weight", "output_blocks.8.2.out_layers.0.bias", ... "output_blocks.23.0.out_layers.0.bias", "output_blocks.23.0.out_layers.3.weight", "output_blocks.23.0.out_layers.3.bias", "output_blocks.23.0.skip_connection.weight", "output_blocks.23.0.skip_connection.bias", "output_blocks.3.2.conv.weight", "output_blocks.3.2.conv.bias", "output_blocks.7.2.conv.weight", "output_blocks.7.2.conv.bias", "output_blocks.11.1.conv.weight", "output_blocks.11.1.conv.bias", "output_blocks.15.1.conv.weight", "output_blocks.15.1.conv.bias". size mismatch for time_embed.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([1024, 256]). size mismatch for time_embed.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for time_embed.2.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for time_embed.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for label_emb.weight: copying a param with shape torch.Size([1000, 512]) from checkpoint, the shape in current model is torch.Size([1000, 1024]). size mismatch for input_blocks.0.0.weight: copying a param with shape torch.Size([128, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 3, 3, 3]). size mismatch for input_blocks.0.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.1.0.in_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.1.0.in_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.1.0.in_layers.2.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]). size mismatch for input_blocks.1.0.in_layers.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.1.0.emb_layers.1.weight: copying a param with shape torch.Size([256, 512]) from checkpoint, the shape in current model is torch.Size([512, 1024]). size mismatch for input_blocks.1.0.emb_layers.1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.1.0.out_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.1.0.out_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.1.0.out_layers.3.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]). size mismatch for input_blocks.1.0.out_layers.3.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.2.0.in_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.2.0.in_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). ... size mismatch for input_blocks.5.0.out_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.5.0.out_layers.3.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]). size mismatch for input_blocks.5.0.out_layers.3.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.6.0.in_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.6.0.in_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.6.0.in_layers.2.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]). size mismatch for input_blocks.6.0.in_layers.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.6.0.emb_layers.1.weight: copying a param with shape torch.Size([256, 512]) from checkpoint, the shape in current model is torch.Size([512, 1024]). size mismatch for input_blocks.6.0.emb_layers.1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.6.0.out_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.6.0.out_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.6.0.out_layers.3.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]). size mismatch for input_blocks.6.0.out_layers.3.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.7.0.in_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.7.0.in_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.7.0.in_layers.2.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 256, 3, 3]). size mismatch for input_blocks.7.0.in_layers.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.7.0.emb_layers.1.weight: copying a param with shape torch.Size([256, 512]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for input_blocks.7.0.emb_layers.1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.7.0.out_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.7.0.out_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.7.0.out_layers.3.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]). size mismatch for input_blocks.7.0.out_layers.3.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.9.0.in_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.9.0.in_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.9.0.in_layers.2.weight: copying a param with shape torch.Size([256, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]). size mismatch for input_blocks.9.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.9.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for input_blocks.9.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.9.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.9.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.9.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]). size mismatch for input_blocks.9.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.10.0.in_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.10.0.in_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.10.0.in_layers.2.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]). size mismatch for input_blocks.10.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.10.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for input_blocks.10.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.10.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.10.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.10.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]). size mismatch for input_blocks.10.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.11.0.in_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.11.0.in_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.11.0.in_layers.2.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]). size mismatch for input_blocks.11.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.11.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for input_blocks.11.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.11.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.11.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.11.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]). size mismatch for input_blocks.11.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.13.0.in_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.13.0.in_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.13.0.in_layers.2.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 512, 3, 3]). size mismatch for input_blocks.13.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.13.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]). size mismatch for input_blocks.13.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([2048]). size mismatch for input_blocks.13.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.13.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.13.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]). size mismatch for input_blocks.13.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.14.0.in_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.14.0.in_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.14.0.in_layers.2.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]). size mismatch for input_blocks.14.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.14.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]). size mismatch for input_blocks.14.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([2048]). size mismatch for input_blocks.14.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.14.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.14.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]). size mismatch for input_blocks.14.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.15.0.in_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.15.0.in_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.15.0.in_layers.2.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]). size mismatch for input_blocks.15.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.15.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]). size mismatch for input_blocks.15.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([2048]). size mismatch for input_blocks.15.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.15.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.15.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]). size mismatch for input_blocks.15.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.17.0.in_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.17.0.in_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.17.0.in_layers.2.weight: copying a param with shape torch.Size([512, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]). size mismatch for input_blocks.17.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.17.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]). size mismatch for input_blocks.17.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]). size mismatch for input_blocks.17.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.17.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.17.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]). size mismatch for input_blocks.17.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.17.1.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.17.1.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.17.1.qkv.weight: copying a param with shape torch.Size([1536, 512, 1]) from checkpoint, the shape in current model is torch.Size([3072, 1024, 1]). size mismatch for input_blocks.17.1.qkv.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for input_blocks.17.1.proj_out.weight: copying a param with shape torch.Size([512, 512, 1]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 1]). size mismatch for input_blocks.17.1.proj_out.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for middle_block.0.in_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for middle_block.0.in_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for middle_block.0.in_layers.2.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]). size mismatch for middle_block.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for middle_block.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]). size mismatch for middle_block.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]). size mismatch for middle_block.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for middle_block.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for middle_block.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]). size mismatch for middle_block.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for middle_block.1.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for middle_block.1.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for middle_block.1.qkv.weight: copying a param with shape torch.Size([1536, 512, 1]) from checkpoint, the shape in current model is torch.Size([3072, 1024, 1]). size mismatch for middle_block.1.qkv.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for middle_block.1.proj_out.weight: copying a param with shape torch.Size([512, 512, 1]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 1]). size mismatch for middle_block.1.proj_out.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for middle_block.2.in_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for middle_block.2.in_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for middle_block.2.in_layers.2.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]). size mismatch for middle_block.2.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for middle_block.2.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]). size mismatch for middle_block.2.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]). size mismatch for middle_block.2.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for middle_block.2.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for middle_block.2.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]). size mismatch for middle_block.2.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.0.0.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]). size mismatch for output_blocks.0.0.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]). size mismatch for output_blocks.0.0.in_layers.2.weight: copying a param with shape torch.Size([512, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 2048, 3, 3]). size mismatch for output_blocks.0.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.0.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]). size mismatch for output_blocks.0.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]). size mismatch for output_blocks.0.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.0.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.0.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]). size mismatch for output_blocks.0.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.0.0.skip_connection.weight: copying a param with shape torch.Size([512, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([1024, 2048, 1, 1]). size mismatch for output_blocks.0.0.skip_connection.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.0.1.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.0.1.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.0.1.qkv.weight: copying a param with shape torch.Size([1536, 512, 1]) from checkpoint, the shape in current model is torch.Size([3072, 1024, 1]). size mismatch for output_blocks.0.1.qkv.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for output_blocks.0.1.proj_out.weight: copying a param with shape torch.Size([512, 512, 1]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 1]). size mismatch for output_blocks.0.1.proj_out.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.1.0.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]). size mismatch for output_blocks.1.0.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]). size mismatch for output_blocks.1.0.in_layers.2.weight: copying a param with shape torch.Size([512, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 2048, 3, 3]). size mismatch for output_blocks.1.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.1.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]). size mismatch for output_blocks.1.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]). size mismatch for output_blocks.1.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.1.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.1.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]). size mismatch for output_blocks.1.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.1.0.skip_connection.weight: copying a param with shape torch.Size([512, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([1024, 2048, 1, 1]). size mismatch for output_blocks.1.0.skip_connection.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.1.1.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.1.1.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.1.1.qkv.weight: copying a param with shape torch.Size([1536, 512, 1]) from checkpoint, the shape in current model is torch.Size([3072, 1024, 1]). size mismatch for output_blocks.1.1.qkv.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for output_blocks.1.1.proj_out.weight: copying a param with shape torch.Size([512, 512, 1]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 1]). size mismatch for output_blocks.1.1.proj_out.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.2.0.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]). size mismatch for output_blocks.2.0.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]). size mismatch for output_blocks.2.0.in_layers.2.weight: copying a param with shape torch.Size([512, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 2048, 3, 3]). size mismatch for output_blocks.2.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.2.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]). size mismatch for output_blocks.2.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]). size mismatch for output_blocks.2.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.2.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.2.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]). size mismatch for output_blocks.2.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.2.0.skip_connection.weight: copying a param with shape torch.Size([512, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([1024, 2048, 1, 1]). size mismatch for output_blocks.2.0.skip_connection.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.2.1.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.2.1.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.2.1.qkv.weight: copying a param with shape torch.Size([1536, 512, 1]) from checkpoint, the shape in current model is torch.Size([3072, 1024, 1]). size mismatch for output_blocks.2.1.qkv.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for output_blocks.2.1.proj_out.weight: copying a param with shape torch.Size([512, 512, 1]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 1]). size mismatch for output_blocks.2.1.proj_out.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.3.0.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]). size mismatch for output_blocks.3.0.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]). size mismatch for output_blocks.3.0.in_layers.2.weight: copying a param with shape torch.Size([512, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 2048, 3, 3]). size mismatch for output_blocks.3.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in ... checkpoint, the shape in current model is torch.Size([256, 512, 1, 1]). size mismatch for output_blocks.17.0.skip_connection.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for out.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for out.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for out.2.weight: copying a param with shape torch.Size([3, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([6, 256, 3, 3]). size mismatch for out.2.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([6]). ```

The scripts used for sampling, training the diffusion model, and training the classifier

This is the script that fails and gets the message above.

Image_sample.sh

``` #!/bin/bash MODEL_FLAGS="--attention_resolutions 32,16,8 --class_cond True --image_size 256 --learn_sigma True --num_channels 256 --num_heads 4 --num_res_blocks 2 --resblock_updown True --use_fp16 True --use_scale_shift_norm True" CLASSIFIER_FLAGS="--image_size 256 --classifier_attention_resolutions 32,16,8 --classifier_depth 4 --classifier_width 32 --classifier_pool attention --classifier_resblock_updown True --classifier_use_scale_shift_norm True --classifier_scale 1.0 --classifier_use_fp16 True" # CLASSIFIER_FLAGS="--image_size 256 --classifier_attention_resolutions 32,16,8 --classifier_depth 2 --classifier_width 128 --classifier_pool attention --classifier_resblock_updown True --classifier_use_scale_shift_norm True --classifier_scale 1.0 --classifier_use_fp16 True" SAMPLE_FLAGS="--batch_size 4 --num_samples 50000 --timestep_respacing ddim25 --use_ddim True" export OPENAI_LOGDIR="~/diffusion/guided-diffusion/outputs/first_output" # fix the model path and classifier path python scripts/classifier_sample.py \ --model_path ~/path/to/model010000.pt \ --classifier_path ~/path/to/model020000.pt \ $MODEL_FLAGS $CLASSIFIER_FLAGS $SAMPLE_FLAGS

This is the script used to generate the diffusion model:

train.sh

#!/bin/bash MODEL_FLAGS="--image_size 256 --num_channels 128 --num_res_blocks 3 --class_cond True" DIFFUSION_FLAGS="--diffusion_steps 1000 --noise_schedule linear" TRAIN_FLAGS="--lr 1e-4 --batch_size 4" export OPENAI_LOGDIR="~/diffusion/improved-diffusion/training_logs/256_classcond/" python scripts/image_train.py \ --data_dir /data/path/to/improved_diffusion_data \ $MODEL_FLAGS $DIFFUSION_FLAGS $TRAIN_FLAGS ```

This is the script used to generate the classifier:

classifier_train.sh

``` #!/bin/bash TRAIN_FLAGS="--iterations 300000 --anneal_lr True --batch_size 4 --lr 3e-4 --save_interval 10000 --weight_decay 0.05" CLASSIFIER_FLAGS="--image_size 256 --classifier_attention_resolutions 32,16,8 --classifier_depth 2 --classifier_width 32 --classifier_pool attention --classifier_resblock_updown True --classifier_use_scale_shift_norm True" export OPENAI_LOGDIR="~/diffusion/guided-diffusion/trained_classifiers/256_class_test" python scripts/classifier_train.py \ --data_dir /data/ur/berisha/mitch/improved_diffusion_data \ $TRAIN_FLAGS $CLASSIFIER_FLAGS ```

Based on the error message I believe it has to do with mismatched architecture. I am attempting to retrain with different hyperparemetrs but am uncertain which ones could be causing this problem. Admittedly I am not sure that this is even the cause, so it very well may be something else causing this problem.

Any pointers in the right direction would be greatly appreciated. I am reading over the paper again and watching some videos to see if that sheds some light as to how this problem is occuring. IF any more information is needed to help let me know!

Mitchnoff commented 1 year ago

Due to the character limit I had to remove a lot of the error message. It is posted in its entirety here in case it is needed:

Full Error Message

``` creating model and diffusion... Traceback (most recent call last): File "scripts/classifier_sample.py", line 134, in main() File "scripts/classifier_sample.py", line 38, in main model.load_state_dict( File "/home/allenm/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1671, in load_state_dict raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format( RuntimeError: Error(s) in loading state_dict for UNetModel: Missing key(s) in state_dict: "input_blocks.4.0.in_layers.0.weight", "input_blocks.4.0.in_layers.0.bias", "input_blocks.4.0.in_layers.2.weight", "input_blocks.4.0.in_layers.2.bias", "input_blocks.4.0.emb_layers.1.weight", "input_blocks.4.0.emb_layers.1.bias", "input_blocks.4.0.out_layers.0.weight", "input_blocks.4.0.out_layers.0.bias", "input_blocks.4.0.out_layers.3.weight", "input_blocks.4.0.out_layers.3.bias", "input_blocks.7.0.skip_connection.weight", "input_blocks.7.0.skip_connection.bias", "input_blocks.8.0.in_layers.0.weight", "input_blocks.8.0.in_layers.0.bias", "input_blocks.8.0.in_layers.2.weight", "input_blocks.8.0.in_layers.2.bias", "input_blocks.8.0.emb_layers.1.weight", "input_blocks.8.0.emb_layers.1.bias", "input_blocks.8.0.out_layers.0.weight", "input_blocks.8.0.out_layers.0.bias", "input_blocks.8.0.out_layers.3.weight", "input_blocks.8.0.out_layers.3.bias", "input_blocks.10.1.norm.weight", "input_blocks.10.1.norm.bias", "input_blocks.10.1.qkv.weight", "input_blocks.10.1.qkv.bias", "input_blocks.10.1.proj_out.weight", "input_blocks.10.1.proj_out.bias", "input_blocks.11.1.norm.weight", "input_blocks.11.1.norm.bias", "input_blocks.11.1.qkv.weight", "input_blocks.11.1.qkv.bias", "input_blocks.11.1.proj_out.weight", "input_blocks.11.1.proj_out.bias", "input_blocks.12.0.in_layers.0.weight", "input_blocks.12.0.in_layers.0.bias", "input_blocks.12.0.in_layers.2.weight", "input_blocks.12.0.in_layers.2.bias", "input_blocks.12.0.emb_layers.1.weight", "input_blocks.12.0.emb_layers.1.bias", "input_blocks.12.0.out_layers.0.weight", "input_blocks.12.0.out_layers.0.bias", "input_blocks.12.0.out_layers.3.weight", "input_blocks.12.0.out_layers.3.bias", "input_blocks.13.0.skip_connection.weight", "input_blocks.13.0.skip_connection.bias", "input_blocks.13.1.norm.weight", "input_blocks.13.1.norm.bias", "input_blocks.13.1.qkv.weight", "input_blocks.13.1.qkv.bias", "input_blocks.13.1.proj_out.weight", "input_blocks.13.1.proj_out.bias", "input_blocks.14.1.norm.weight", "input_blocks.14.1.norm.bias", "input_blocks.14.1.qkv.weight", "input_blocks.14.1.qkv.bias", "input_blocks.14.1.proj_out.weight", "input_blocks.14.1.proj_out.bias", "input_blocks.16.0.in_layers.0.weight", "input_blocks.16.0.in_layers.0.bias", "input_blocks.16.0.in_layers.2.weight", "input_blocks.16.0.in_layers.2.bias", "input_blocks.16.0.emb_layers.1.weight", "input_blocks.16.0.emb_layers.1.bias", "input_blocks.16.0.out_layers.0.weight", "input_blocks.16.0.out_layers.0.bias", "input_blocks.16.0.out_layers.3.weight", "input_blocks.16.0.out_layers.3.bias", "input_blocks.16.1.norm.weight", "input_blocks.16.1.norm.bias", "input_blocks.16.1.qkv.weight", "input_blocks.16.1.qkv.bias", "input_blocks.16.1.proj_out.weight", "input_blocks.16.1.proj_out.bias", "output_blocks.2.2.in_layers.0.weight", "output_blocks.2.2.in_layers.0.bias", "output_blocks.2.2.in_layers.2.weight", "output_blocks.2.2.in_layers.2.bias", "output_blocks.2.2.emb_layers.1.weight", "output_blocks.2.2.emb_layers.1.bias", "output_blocks.2.2.out_layers.0.weight", "output_blocks.2.2.out_layers.0.bias", "output_blocks.2.2.out_layers.3.weight", "output_blocks.2.2.out_layers.3.bias", "output_blocks.5.2.in_layers.0.weight", "output_blocks.5.2.in_layers.0.bias", "output_blocks.5.2.in_layers.2.weight", "output_blocks.5.2.in_layers.2.bias", "output_blocks.5.2.emb_layers.1.weight", "output_blocks.5.2.emb_layers.1.bias", "output_blocks.5.2.out_layers.0.weight", "output_blocks.5.2.out_layers.0.bias", "output_blocks.5.2.out_layers.3.weight", "output_blocks.5.2.out_layers.3.bias", "output_blocks.8.1.norm.weight", "output_blocks.8.1.norm.bias", "output_blocks.8.1.qkv.weight", "output_blocks.8.1.qkv.bias", "output_blocks.8.1.proj_out.weight", "output_blocks.8.1.proj_out.bias", "output_blocks.8.2.in_layers.0.weight", "output_blocks.8.2.in_layers.0.bias", "output_blocks.8.2.in_layers.2.weight", "output_blocks.8.2.in_layers.2.bias", "output_blocks.8.2.emb_layers.1.weight", "output_blocks.8.2.emb_layers.1.bias", "output_blocks.8.2.out_layers.0.weight", "output_blocks.8.2.out_layers.0.bias", "output_blocks.8.2.out_layers.3.weight", "output_blocks.8.2.out_layers.3.bias", "output_blocks.11.1.in_layers.0.weight", "output_blocks.11.1.in_layers.0.bias", "output_blocks.11.1.in_layers.2.weight", "output_blocks.11.1.in_layers.2.bias", "output_blocks.11.1.emb_layers.1.weight", "output_blocks.11.1.emb_layers.1.bias", "output_blocks.11.1.out_layers.0.weight", "output_blocks.11.1.out_layers.0.bias", "output_blocks.11.1.out_layers.3.weight", "output_blocks.11.1.out_layers.3.bias", "output_blocks.14.1.in_layers.0.weight", "output_blocks.14.1.in_layers.0.bias", "output_blocks.14.1.in_layers.2.weight", "output_blocks.14.1.in_layers.2.bias", "output_blocks.14.1.emb_layers.1.weight", "output_blocks.14.1.emb_layers.1.bias", "output_blocks.14.1.out_layers.0.weight", "output_blocks.14.1.out_layers.0.bias", "output_blocks.14.1.out_layers.3.weight", "output_blocks.14.1.out_layers.3.bias". Unexpected key(s) in state_dict: "input_blocks.18.0.in_layers.0.weight", "input_blocks.18.0.in_layers.0.bias", "input_blocks.18.0.in_layers.2.weight", "input_blocks.18.0.in_layers.2.bias", "input_blocks.18.0.emb_layers.1.weight", "input_blocks.18.0.emb_layers.1.bias", "input_blocks.18.0.out_layers.0.weight", "input_blocks.18.0.out_layers.0.bias", "input_blocks.18.0.out_layers.3.weight", "input_blocks.18.0.out_layers.3.bias", "input_blocks.18.1.norm.weight", "input_blocks.18.1.norm.bias", "input_blocks.18.1.qkv.weight", "input_blocks.18.1.qkv.bias", "input_blocks.18.1.proj_out.weight", "input_blocks.18.1.proj_out.bias", "input_blocks.19.0.in_layers.0.weight", "input_blocks.19.0.in_layers.0.bias", "input_blocks.19.0.in_layers.2.weight", "input_blocks.19.0.in_layers.2.bias", "input_blocks.19.0.emb_layers.1.weight", "input_blocks.19.0.emb_layers.1.bias", "input_blocks.19.0.out_layers.0.weight", "input_blocks.19.0.out_layers.0.bias", "input_blocks.19.0.out_layers.3.weight", "input_blocks.19.0.out_layers.3.bias", "input_blocks.19.1.norm.weight", "input_blocks.19.1.norm.bias", "input_blocks.19.1.qkv.weight", "input_blocks.19.1.qkv.bias", "input_blocks.19.1.proj_out.weight", "input_blocks.19.1.proj_out.bias", "input_blocks.20.0.op.weight", "input_blocks.20.0.op.bias", "input_blocks.21.0.in_layers.0.weight", "input_blocks.21.0.in_layers.0.bias", "input_blocks.21.0.in_layers.2.weight", "input_blocks.21.0.in_layers.2.bias", "input_blocks.21.0.emb_layers.1.weight", "input_blocks.21.0.emb_layers.1.bias", "input_blocks.21.0.out_layers.0.weight", "input_blocks.21.0.out_layers.0.bias", "input_blocks.21.0.out_layers.3.weight", "input_blocks.21.0.out_layers.3.bias", "input_blocks.21.1.norm.weight", "input_blocks.21.1.norm.bias", "input_blocks.21.1.qkv.weight", "input_blocks.21.1.qkv.bias", "input_blocks.21.1.proj_out.weight", "input_blocks.21.1.proj_out.bias", "input_blocks.22.0.in_layers.0.weight", "input_blocks.22.0.in_layers.0.bias", "input_blocks.22.0.in_layers.2.weight", "input_blocks.22.0.in_layers.2.bias", "input_blocks.22.0.emb_layers.1.weight", "input_blocks.22.0.emb_layers.1.bias", "input_blocks.22.0.out_layers.0.weight", "input_blocks.22.0.out_layers.0.bias", "input_blocks.22.0.out_layers.3.weight", "input_blocks.22.0.out_layers.3.bias", "input_blocks.22.1.norm.weight", "input_blocks.22.1.norm.bias", "input_blocks.22.1.qkv.weight", "input_blocks.22.1.qkv.bias", "input_blocks.22.1.proj_out.weight", "input_blocks.22.1.proj_out.bias", "input_blocks.23.0.in_layers.0.weight", "input_blocks.23.0.in_layers.0.bias", "input_blocks.23.0.in_layers.2.weight", "input_blocks.23.0.in_layers.2.bias", "input_blocks.23.0.emb_layers.1.weight", "input_blocks.23.0.emb_layers.1.bias", "input_blocks.23.0.out_layers.0.weight", "input_blocks.23.0.out_layers.0.bias", "input_blocks.23.0.out_layers.3.weight", "input_blocks.23.0.out_layers.3.bias", "input_blocks.23.1.norm.weight", "input_blocks.23.1.norm.bias", "input_blocks.23.1.qkv.weight", "input_blocks.23.1.qkv.bias", "input_blocks.23.1.proj_out.weight", "input_blocks.23.1.proj_out.bias", "input_blocks.4.0.op.weight", "input_blocks.4.0.op.bias", "input_blocks.8.0.op.weight", "input_blocks.8.0.op.bias", "input_blocks.9.0.skip_connection.weight", "input_blocks.9.0.skip_connection.bias", "input_blocks.12.0.op.weight", "input_blocks.12.0.op.bias", "input_blocks.16.0.op.weight", "input_blocks.16.0.op.bias", "input_blocks.17.0.skip_connection.weight", "input_blocks.17.0.skip_connection.bias", "output_blocks.18.0.in_layers.0.weight", "output_blocks.18.0.in_layers.0.bias", "output_blocks.18.0.in_layers.2.weight", "output_blocks.18.0.in_layers.2.bias", "output_blocks.18.0.emb_layers.1.weight", "output_blocks.18.0.emb_layers.1.bias", "output_blocks.18.0.out_layers.0.weight", "output_blocks.18.0.out_layers.0.bias", "output_blocks.18.0.out_layers.3.weight", "output_blocks.18.0.out_layers.3.bias", "output_blocks.18.0.skip_connection.weight", "output_blocks.18.0.skip_connection.bias", "output_blocks.19.0.in_layers.0.weight", "output_blocks.19.0.in_layers.0.bias", "output_blocks.19.0.in_layers.2.weight", "output_blocks.19.0.in_layers.2.bias", "output_blocks.19.0.emb_layers.1.weight", "output_blocks.19.0.emb_layers.1.bias", "output_blocks.19.0.out_layers.0.weight", "output_blocks.19.0.out_layers.0.bias", "output_blocks.19.0.out_layers.3.weight", "output_blocks.19.0.out_layers.3.bias", "output_blocks.19.0.skip_connection.weight", "output_blocks.19.0.skip_connection.bias", "output_blocks.19.1.conv.weight", "output_blocks.19.1.conv.bias", "output_blocks.20.0.in_layers.0.weight", "output_blocks.20.0.in_layers.0.bias", "output_blocks.20.0.in_layers.2.weight", "output_blocks.20.0.in_layers.2.bias", "output_blocks.20.0.emb_layers.1.weight", "output_blocks.20.0.emb_layers.1.bias", "output_blocks.20.0.out_layers.0.weight", "output_blocks.20.0.out_layers.0.bias", "output_blocks.20.0.out_layers.3.weight", "output_blocks.20.0.out_layers.3.bias", "output_blocks.20.0.skip_connection.weight", "output_blocks.20.0.skip_connection.bias", "output_blocks.21.0.in_layers.0.weight", "output_blocks.21.0.in_layers.0.bias", "output_blocks.21.0.in_layers.2.weight", "output_blocks.21.0.in_layers.2.bias", "output_blocks.21.0.emb_layers.1.weight", "output_blocks.21.0.emb_layers.1.bias", "output_blocks.21.0.out_layers.0.weight", "output_blocks.21.0.out_layers.0.bias", "output_blocks.21.0.out_layers.3.weight", "output_blocks.21.0.out_layers.3.bias", "output_blocks.21.0.skip_connection.weight", "output_blocks.21.0.skip_connection.bias", "output_blocks.22.0.in_layers.0.weight", "output_blocks.22.0.in_layers.0.bias", "output_blocks.22.0.in_layers.2.weight", "output_blocks.22.0.in_layers.2.bias", "output_blocks.22.0.emb_layers.1.weight", "output_blocks.22.0.emb_layers.1.bias", "output_blocks.22.0.out_layers.0.weight", "output_blocks.22.0.out_layers.0.bias", "output_blocks.22.0.out_layers.3.weight", "output_blocks.22.0.out_layers.3.bias", "output_blocks.22.0.skip_connection.weight", "output_blocks.22.0.skip_connection.bias", "output_blocks.23.0.in_layers.0.weight", "output_blocks.23.0.in_layers.0.bias", "output_blocks.23.0.in_layers.2.weight", "output_blocks.23.0.in_layers.2.bias", "output_blocks.23.0.emb_layers.1.weight", "output_blocks.23.0.emb_layers.1.bias", "output_blocks.23.0.out_layers.0.weight", "output_blocks.23.0.out_layers.0.bias", "output_blocks.23.0.out_layers.3.weight", "output_blocks.23.0.out_layers.3.bias", "output_blocks.23.0.skip_connection.weight", "output_blocks.23.0.skip_connection.bias", "output_blocks.3.2.conv.weight", "output_blocks.3.2.conv.bias", "output_blocks.7.2.conv.weight", "output_blocks.7.2.conv.bias", "output_blocks.11.1.conv.weight", "output_blocks.11.1.conv.bias", "output_blocks.15.1.conv.weight", "output_blocks.15.1.conv.bias". size mismatch for time_embed.0.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([1024, 256]). size mismatch for time_embed.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for time_embed.2.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for time_embed.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for label_emb.weight: copying a param with shape torch.Size([1000, 512]) from checkpoint, the shape in current model is torch.Size([1000, 1024]). size mismatch for input_blocks.0.0.weight: copying a param with shape torch.Size([128, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 3, 3, 3]). size mismatch for input_blocks.0.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.1.0.in_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.1.0.in_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.1.0.in_layers.2.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]). size mismatch for input_blocks.1.0.in_layers.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.1.0.emb_layers.1.weight: copying a param with shape torch.Size([256, 512]) from checkpoint, the shape in current model is torch.Size([512, 1024]). size mismatch for input_blocks.1.0.emb_layers.1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.1.0.out_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.1.0.out_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.1.0.out_layers.3.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]). size mismatch for input_blocks.1.0.out_layers.3.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.2.0.in_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.2.0.in_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.2.0.in_layers.2.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]). size mismatch for input_blocks.2.0.in_layers.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.2.0.emb_layers.1.weight: copying a param with shape torch.Size([256, 512]) from checkpoint, the shape in current model is torch.Size([512, 1024]). size mismatch for input_blocks.2.0.emb_layers.1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.2.0.out_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.2.0.out_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.2.0.out_layers.3.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]). size mismatch for input_blocks.2.0.out_layers.3.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.3.0.in_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.3.0.in_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.3.0.in_layers.2.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]). size mismatch for input_blocks.3.0.in_layers.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.3.0.emb_layers.1.weight: copying a param with shape torch.Size([256, 512]) from checkpoint, the shape in current model is torch.Size([512, 1024]). size mismatch for input_blocks.3.0.emb_layers.1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.3.0.out_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.3.0.out_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.3.0.out_layers.3.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]). size mismatch for input_blocks.3.0.out_layers.3.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.5.0.in_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.5.0.in_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.5.0.in_layers.2.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]). size mismatch for input_blocks.5.0.in_layers.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.5.0.emb_layers.1.weight: copying a param with shape torch.Size([256, 512]) from checkpoint, the shape in current model is torch.Size([512, 1024]). size mismatch for input_blocks.5.0.emb_layers.1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.5.0.out_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.5.0.out_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.5.0.out_layers.3.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]). size mismatch for input_blocks.5.0.out_layers.3.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.6.0.in_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.6.0.in_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.6.0.in_layers.2.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]). size mismatch for input_blocks.6.0.in_layers.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.6.0.emb_layers.1.weight: copying a param with shape torch.Size([256, 512]) from checkpoint, the shape in current model is torch.Size([512, 1024]). size mismatch for input_blocks.6.0.emb_layers.1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.6.0.out_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.6.0.out_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.6.0.out_layers.3.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]). size mismatch for input_blocks.6.0.out_layers.3.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.7.0.in_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.7.0.in_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for input_blocks.7.0.in_layers.2.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 256, 3, 3]). size mismatch for input_blocks.7.0.in_layers.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.7.0.emb_layers.1.weight: copying a param with shape torch.Size([256, 512]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for input_blocks.7.0.emb_layers.1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.7.0.out_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.7.0.out_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.7.0.out_layers.3.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]). size mismatch for input_blocks.7.0.out_layers.3.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.9.0.in_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.9.0.in_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.9.0.in_layers.2.weight: copying a param with shape torch.Size([256, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]). size mismatch for input_blocks.9.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.9.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for input_blocks.9.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.9.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.9.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.9.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]). size mismatch for input_blocks.9.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.10.0.in_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.10.0.in_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.10.0.in_layers.2.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]). size mismatch for input_blocks.10.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.10.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for input_blocks.10.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.10.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.10.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.10.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]). size mismatch for input_blocks.10.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.11.0.in_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.11.0.in_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.11.0.in_layers.2.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]). size mismatch for input_blocks.11.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.11.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for input_blocks.11.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.11.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.11.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.11.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]). size mismatch for input_blocks.11.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.13.0.in_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.13.0.in_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for input_blocks.13.0.in_layers.2.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 512, 3, 3]). size mismatch for input_blocks.13.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.13.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]). size mismatch for input_blocks.13.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([2048]). size mismatch for input_blocks.13.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.13.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.13.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]). size mismatch for input_blocks.13.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.14.0.in_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.14.0.in_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.14.0.in_layers.2.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]). size mismatch for input_blocks.14.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.14.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]). size mismatch for input_blocks.14.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([2048]). size mismatch for input_blocks.14.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.14.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.14.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]). size mismatch for input_blocks.14.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.15.0.in_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.15.0.in_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.15.0.in_layers.2.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]). size mismatch for input_blocks.15.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.15.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]). size mismatch for input_blocks.15.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([2048]). size mismatch for input_blocks.15.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.15.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.15.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]). size mismatch for input_blocks.15.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.17.0.in_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.17.0.in_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.17.0.in_layers.2.weight: copying a param with shape torch.Size([512, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]). size mismatch for input_blocks.17.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.17.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]). size mismatch for input_blocks.17.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]). size mismatch for input_blocks.17.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.17.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.17.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]). size mismatch for input_blocks.17.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.17.1.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.17.1.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for input_blocks.17.1.qkv.weight: copying a param with shape torch.Size([1536, 512, 1]) from checkpoint, the shape in current model is torch.Size([3072, 1024, 1]). size mismatch for input_blocks.17.1.qkv.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for input_blocks.17.1.proj_out.weight: copying a param with shape torch.Size([512, 512, 1]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 1]). size mismatch for input_blocks.17.1.proj_out.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for middle_block.0.in_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for middle_block.0.in_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for middle_block.0.in_layers.2.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]). size mismatch for middle_block.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for middle_block.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]). size mismatch for middle_block.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]). size mismatch for middle_block.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for middle_block.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for middle_block.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]). size mismatch for middle_block.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for middle_block.1.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for middle_block.1.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for middle_block.1.qkv.weight: copying a param with shape torch.Size([1536, 512, 1]) from checkpoint, the shape in current model is torch.Size([3072, 1024, 1]). size mismatch for middle_block.1.qkv.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for middle_block.1.proj_out.weight: copying a param with shape torch.Size([512, 512, 1]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 1]). size mismatch for middle_block.1.proj_out.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for middle_block.2.in_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for middle_block.2.in_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for middle_block.2.in_layers.2.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]). size mismatch for middle_block.2.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for middle_block.2.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]). size mismatch for middle_block.2.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]). size mismatch for middle_block.2.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for middle_block.2.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for middle_block.2.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]). size mismatch for middle_block.2.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.0.0.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]). size mismatch for output_blocks.0.0.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]). size mismatch for output_blocks.0.0.in_layers.2.weight: copying a param with shape torch.Size([512, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 2048, 3, 3]). size mismatch for output_blocks.0.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.0.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]). size mismatch for output_blocks.0.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]). size mismatch for output_blocks.0.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.0.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.0.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]). size mismatch for output_blocks.0.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.0.0.skip_connection.weight: copying a param with shape torch.Size([512, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([1024, 2048, 1, 1]). size mismatch for output_blocks.0.0.skip_connection.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.0.1.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.0.1.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.0.1.qkv.weight: copying a param with shape torch.Size([1536, 512, 1]) from checkpoint, the shape in current model is torch.Size([3072, 1024, 1]). size mismatch for output_blocks.0.1.qkv.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for output_blocks.0.1.proj_out.weight: copying a param with shape torch.Size([512, 512, 1]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 1]). size mismatch for output_blocks.0.1.proj_out.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.1.0.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]). size mismatch for output_blocks.1.0.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]). size mismatch for output_blocks.1.0.in_layers.2.weight: copying a param with shape torch.Size([512, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 2048, 3, 3]). size mismatch for output_blocks.1.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.1.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]). size mismatch for output_blocks.1.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]). size mismatch for output_blocks.1.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.1.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.1.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]). size mismatch for output_blocks.1.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.1.0.skip_connection.weight: copying a param with shape torch.Size([512, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([1024, 2048, 1, 1]). size mismatch for output_blocks.1.0.skip_connection.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.1.1.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.1.1.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.1.1.qkv.weight: copying a param with shape torch.Size([1536, 512, 1]) from checkpoint, the shape in current model is torch.Size([3072, 1024, 1]). size mismatch for output_blocks.1.1.qkv.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for output_blocks.1.1.proj_out.weight: copying a param with shape torch.Size([512, 512, 1]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 1]). size mismatch for output_blocks.1.1.proj_out.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.2.0.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]). size mismatch for output_blocks.2.0.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]). size mismatch for output_blocks.2.0.in_layers.2.weight: copying a param with shape torch.Size([512, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 2048, 3, 3]). size mismatch for output_blocks.2.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.2.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]). size mismatch for output_blocks.2.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]). size mismatch for output_blocks.2.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.2.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.2.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]). size mismatch for output_blocks.2.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.2.0.skip_connection.weight: copying a param with shape torch.Size([512, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([1024, 2048, 1, 1]). size mismatch for output_blocks.2.0.skip_connection.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.2.1.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.2.1.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.2.1.qkv.weight: copying a param with shape torch.Size([1536, 512, 1]) from checkpoint, the shape in current model is torch.Size([3072, 1024, 1]). size mismatch for output_blocks.2.1.qkv.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for output_blocks.2.1.proj_out.weight: copying a param with shape torch.Size([512, 512, 1]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 1]). size mismatch for output_blocks.2.1.proj_out.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.3.0.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]). size mismatch for output_blocks.3.0.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]). size mismatch for output_blocks.3.0.in_layers.2.weight: copying a param with shape torch.Size([512, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 2048, 3, 3]). size mismatch for output_blocks.3.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.3.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]). size mismatch for output_blocks.3.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]). size mismatch for output_blocks.3.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.3.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.3.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]). size mismatch for output_blocks.3.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.3.0.skip_connection.weight: copying a param with shape torch.Size([512, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([1024, 2048, 1, 1]). size mismatch for output_blocks.3.0.skip_connection.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.3.1.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.3.1.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.3.1.qkv.weight: copying a param with shape torch.Size([1536, 512, 1]) from checkpoint, the shape in current model is torch.Size([3072, 1024, 1]). size mismatch for output_blocks.3.1.qkv.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for output_blocks.3.1.proj_out.weight: copying a param with shape torch.Size([512, 512, 1]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 1]). size mismatch for output_blocks.3.1.proj_out.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.4.0.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]). size mismatch for output_blocks.4.0.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]). size mismatch for output_blocks.4.0.in_layers.2.weight: copying a param with shape torch.Size([512, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 2048, 3, 3]). size mismatch for output_blocks.4.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.4.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]). size mismatch for output_blocks.4.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]). size mismatch for output_blocks.4.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.4.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.4.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]). size mismatch for output_blocks.4.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.4.0.skip_connection.weight: copying a param with shape torch.Size([512, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([1024, 2048, 1, 1]). size mismatch for output_blocks.4.0.skip_connection.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.4.1.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.4.1.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.4.1.qkv.weight: copying a param with shape torch.Size([1536, 512, 1]) from checkpoint, the shape in current model is torch.Size([3072, 1024, 1]). size mismatch for output_blocks.4.1.qkv.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for output_blocks.4.1.proj_out.weight: copying a param with shape torch.Size([512, 512, 1]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 1]). size mismatch for output_blocks.4.1.proj_out.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.5.0.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1536]). size mismatch for output_blocks.5.0.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1536]). size mismatch for output_blocks.5.0.in_layers.2.weight: copying a param with shape torch.Size([512, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1536, 3, 3]). size mismatch for output_blocks.5.0.in_layers.2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.5.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([2048, 1024]). size mismatch for output_blocks.5.0.emb_layers.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([2048]). size mismatch for output_blocks.5.0.out_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.5.0.out_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.5.0.out_layers.3.weight: copying a param with shape torch.Size([512, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 3, 3]). size mismatch for output_blocks.5.0.out_layers.3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.5.0.skip_connection.weight: copying a param with shape torch.Size([512, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([1024, 1536, 1, 1]). size mismatch for output_blocks.5.0.skip_connection.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.5.1.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.5.1.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.5.1.qkv.weight: copying a param with shape torch.Size([1536, 512, 1]) from checkpoint, the shape in current model is torch.Size([3072, 1024, 1]). size mismatch for output_blocks.5.1.qkv.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([3072]). size mismatch for output_blocks.5.1.proj_out.weight: copying a param with shape torch.Size([512, 512, 1]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 1]). size mismatch for output_blocks.5.1.proj_out.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.6.0.in_layers.0.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1536]). size mismatch for output_blocks.6.0.in_layers.0.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1536]). size mismatch for output_blocks.6.0.in_layers.2.weight: copying a param with shape torch.Size([512, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 1536, 3, 3]). size mismatch for output_blocks.6.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for output_blocks.6.0.skip_connection.weight: copying a param with shape torch.Size([512, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 1536, 1, 1]). size mismatch for output_blocks.7.0.in_layers.0.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.7.0.in_layers.0.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.7.0.in_layers.2.weight: copying a param with shape torch.Size([512, 768, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 1024, 3, 3]). size mismatch for output_blocks.7.0.emb_layers.1.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for output_blocks.7.0.skip_connection.weight: copying a param with shape torch.Size([512, 768, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 1024, 1, 1]). size mismatch for output_blocks.8.0.in_layers.0.weight: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.8.0.in_layers.0.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.8.0.in_layers.2.weight: copying a param with shape torch.Size([256, 768, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 1024, 3, 3]). size mismatch for output_blocks.8.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.8.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for output_blocks.8.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.8.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.8.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.8.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]). size mismatch for output_blocks.8.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.8.0.skip_connection.weight: copying a param with shape torch.Size([256, 768, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 1024, 1, 1]). size mismatch for output_blocks.8.0.skip_connection.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.9.0.in_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.9.0.in_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.9.0.in_layers.2.weight: copying a param with shape torch.Size([256, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 1024, 3, 3]). size mismatch for output_blocks.9.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.9.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for output_blocks.9.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.9.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.9.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.9.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]). size mismatch for output_blocks.9.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.9.0.skip_connection.weight: copying a param with shape torch.Size([256, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 1024, 1, 1]). size mismatch for output_blocks.9.0.skip_connection.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.10.0.in_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.10.0.in_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.10.0.in_layers.2.weight: copying a param with shape torch.Size([256, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 1024, 3, 3]). size mismatch for output_blocks.10.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.10.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for output_blocks.10.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.10.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.10.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.10.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]). size mismatch for output_blocks.10.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.10.0.skip_connection.weight: copying a param with shape torch.Size([256, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 1024, 1, 1]). size mismatch for output_blocks.10.0.skip_connection.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.11.0.in_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for output_blocks.11.0.in_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for output_blocks.11.0.in_layers.2.weight: copying a param with shape torch.Size([256, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 768, 3, 3]). size mismatch for output_blocks.11.0.in_layers.2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.11.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([1024, 1024]). size mismatch for output_blocks.11.0.emb_layers.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([1024]). size mismatch for output_blocks.11.0.out_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.11.0.out_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.11.0.out_layers.3.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]). size mismatch for output_blocks.11.0.out_layers.3.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.11.0.skip_connection.weight: copying a param with shape torch.Size([256, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 768, 1, 1]). size mismatch for output_blocks.11.0.skip_connection.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.12.0.in_layers.0.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for output_blocks.12.0.in_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]). size mismatch for output_blocks.12.0.in_layers.2.weight: copying a param with shape torch.Size([256, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 768, 3, 3]). size mismatch for output_blocks.12.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([512, 1024]). size mismatch for output_blocks.12.0.skip_connection.weight: copying a param with shape torch.Size([256, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 768, 1, 1]). size mismatch for output_blocks.13.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([512, 1024]). size mismatch for output_blocks.14.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([512, 1024]). size mismatch for output_blocks.15.0.in_layers.0.weight: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.15.0.in_layers.0.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.15.0.in_layers.2.weight: copying a param with shape torch.Size([256, 384, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 512, 3, 3]). size mismatch for output_blocks.15.0.emb_layers.1.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([512, 1024]). size mismatch for output_blocks.15.0.skip_connection.weight: copying a param with shape torch.Size([256, 384, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 512, 1, 1]). size mismatch for output_blocks.16.0.in_layers.0.weight: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.16.0.in_layers.0.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.16.0.in_layers.2.weight: copying a param with shape torch.Size([128, 384, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 512, 3, 3]). size mismatch for output_blocks.16.0.in_layers.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for output_blocks.16.0.emb_layers.1.weight: copying a param with shape torch.Size([256, 512]) from checkpoint, the shape in current model is torch.Size([512, 1024]). size mismatch for output_blocks.16.0.emb_layers.1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.16.0.out_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for output_blocks.16.0.out_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for output_blocks.16.0.out_layers.3.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]). size mismatch for output_blocks.16.0.out_layers.3.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for output_blocks.16.0.skip_connection.weight: copying a param with shape torch.Size([128, 384, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 512, 1, 1]). size mismatch for output_blocks.16.0.skip_connection.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for output_blocks.17.0.in_layers.0.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.17.0.in_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.17.0.in_layers.2.weight: copying a param with shape torch.Size([128, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 512, 3, 3]). size mismatch for output_blocks.17.0.in_layers.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for output_blocks.17.0.emb_layers.1.weight: copying a param with shape torch.Size([256, 512]) from checkpoint, the shape in current model is torch.Size([512, 1024]). size mismatch for output_blocks.17.0.emb_layers.1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([512]). size mismatch for output_blocks.17.0.out_layers.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for output_blocks.17.0.out_layers.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for output_blocks.17.0.out_layers.3.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]). size mismatch for output_blocks.17.0.out_layers.3.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for output_blocks.17.0.skip_connection.weight: copying a param with shape torch.Size([128, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 512, 1, 1]). size mismatch for output_blocks.17.0.skip_connection.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for out.0.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for out.0.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([256]). size mismatch for out.2.weight: copying a param with shape torch.Size([3, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([6, 256, 3, 3]). size mismatch for out.2.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([6]). ```

Mitchnoff commented 1 year ago

Additionally, here is the script used to train the diffusion model:

train.sh

``` #!/bin/bash MODEL_FLAGS="--image_size 256 --num_channels 128 --num_res_blocks 3 --class_cond True" DIFFUSION_FLAGS="--diffusion_steps 1000 --noise_schedule linear" TRAIN_FLAGS="--lr 1e-4 --batch_size 4" python scripts/image_train.py \ --data_dir /path/to/improved_diffusion_data \ $MODEL_FLAGS $DIFFUSION_FLAGS $TRAIN_FLAGS ```

anicej commented 1 year ago

@Mitchnoff, I get the same error. Could you let me know if you found a solution?

Mitchnoff commented 1 year ago

@Mitchnoff, I get the same error. Could you let me know if you found a solution?

I've tried working on 64x64 images to start but am still facing the same issue. I will definitely update this if a solution is found. What happened on your side to get the same results? Were you trying to train on your own dataset?

HioZx commented 1 year ago

@Mitchnoff, I get the same error. Have you found a solution?

shengshneg123 commented 1 year ago

I get the same error. Have you found a solution?

sushilkhadkaanon commented 7 months ago

@shengshneg123 @HioZx @Mitchnoff @anicej has anyone found the solution yet?

chuhuan88 commented 4 months ago

I get the same error. Have you found a solution?

openai / guided-diffusion

Apparent Architecture Mismatch when training on custom data #96

The scripts used for sampling, training the diffusion model, and training the classifier