acardara opened this issue 3 years ago
I matched the model architecture as suggested in #7, which removed the mismatch warnings, but the missing and unexpected key warnings are still there. I am still getting a segfault.
Hi, I get a runtime error with the same message:
RuntimeError: Error(s) in loading state_dict for SuperResModel: Missing key(s) in state_dict: "input_blocks.3.0.op.weight", "input_blocks.3.0.op.bias", "input_blocks.6.0.op.weight", "input_blocks.6.0.op.bias", "input_blocks.9.0.op.weight", "input_blocks.9.0.op.bias", "input_blocks.12.0.op.weight", "input_blocks.12.0.op.bias", "input_blocks.15.0.op.weight", "input_blocks.15.0.op.bias", "output_blocks.2.2.conv.weight", "output_blocks.2.2.conv.bias", "output_blocks.5.2.conv.weight", "output_blocks.5.2.conv.bias", "output_blocks.8.1.conv.weight", "output_blocks.8.1.conv.bias", "output_blocks.11.1.conv.weight", "output_blocks.11.1.conv.bias", "output_blocks.14.1.conv.weight", "output_blocks.14.1.conv.bias".
Did you manage to solve it?
No, I didn't find a solution.
ok, thanks for answering!
Hi, I encountered the same problem here. I suspect the published model has a slightly different architecture from the one defined in the code. Can you check whether they match?
I worked around it by passing the 'restrict=False' flag when loading the model, but the results I get are really poor. I guess this is because the weights were not loaded properly.
Any news on this? I'm hitting the same issue.
EDIT: I was defining the environment variables the wrong way (new to Jupyter 😅 )
@acardara I'm still having other issues, but I think this might help you. From your message you seem to be running this inside a Jupyter notebook, and you're defining the environment variables with !, which doesn't persist them to later commands. Try using %env instead, like:
%env SAMPLE_FLAGS=...
!python image_sample.py ... ${SAMPLE_FLAGS}
I'm new to Jupyter myself and ran into a similar issue. My understanding is that !SAMPLE_FLAGS=... only takes effect if you run the Python script on the same line, similar to setting a variable inline in bash. I haven't tried it, but !SAMPLE_FLAGS="..." python ... $SAMPLE_FLAGS should work if I'm right.
I worked around it by passing the 'restrict=False' flag when loading the model, but the results I get are really poor. I guess this is because the weights were not loaded properly.
Hi @inbarhub, I tried using restrict=False here:
model.load_state_dict(dist_util.load_state_dict(args.model_path, restrict=False)), but it did not work.
I solved the same problem!
model.load_state_dict(dist_util.load_state_dict(args.model_path, map_location="cpu"), strict=False)
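For anyone curious what strict=False actually does: it is an argument of model.load_state_dict (not of dist_util.load_state_dict, which is why the restrict attempt above failed), and it skips missing and unexpected keys instead of raising, reporting them instead. A minimal sketch with a toy module (not the real SuperResModel):

```python
import torch
import torch.nn as nn

# Toy stand-in for the real model: a single conv layer.
class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3)

model = Toy()
state = model.state_dict()
state["label_emb.weight"] = torch.zeros(4, 4)  # an "unexpected" key
del state["conv.bias"]                         # a "missing" key

# strict=True (the default) would raise; strict=False reports instead.
result = model.load_state_dict(state, strict=False)
print(result.missing_keys)     # ['conv.bias']
print(result.unexpected_keys)  # ['label_emb.weight']
```

Note that any skipped parameters keep their random initialization, which is why sampling quality can collapse even though loading "succeeds".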
@XDUWQ Could you share your contact information so I can ask you about it?
I encountered the same problem, how did you solve it?
I solved the same problem!
model.load_state_dict(dist_util.load_state_dict(args.model_path, map_location="cpu"), strict=False)
Your method can only solve the problem of missing keys; it cannot solve the size mismatches.
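Right: strict=False only relaxes key matching. A key present in both state dicts with a different shape still raises. A sketch of that behavior, plus one possible workaround (hypothetical, not part of guided-diffusion) of dropping mismatched entries before loading:

```python
import torch.nn as nn

# Current model expects 6 output channels; the checkpoint was saved with 3.
model = nn.Sequential(nn.Conv2d(3, 6, 3))
ckpt = nn.Sequential(nn.Conv2d(3, 3, 3)).state_dict()

# Even with strict=False, loading ckpt directly raises a size-mismatch
# error. Dropping the mismatched entries first avoids the exception,
# but those layers then keep their random initialization.
model_state = model.state_dict()
filtered = {k: v for k, v in ckpt.items()
            if k in model_state and v.shape == model_state[k].shape}
result = model.load_state_dict(filtered, strict=False)
print(result.missing_keys)  # the mismatched '0.weight' and '0.bias' stay untrained
```

The real fix is to make the model-construction flags match the checkpoint so the shapes agree in the first place.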
@TimenoLong hello, have you solved it?
This is the closest I got https://github.com/openai/guided-diffusion/issues/8#issuecomment-1139795831
Thanks for your reply @DiogoNeves. Now I have a question.
This is my classifier training command:
TRAIN_FLAGS="--iterations 10000 --anneal_lr True --batch_size 32 --lr 3e-4 --save_interval 1000 --weight_decay 0.05"
CLASSIFIER_FLAGS="--image_size 64 --classifier_attention_resolutions 32,16,8 --classifier_depth 2 --classifier_width 128 --classifier_pool attention --classifier_resblock_updown True --classifier_use_scale_shift_norm True"
mpiexec -n N python scripts/classifier_train.py --data_dir /home/cumt306/dingbo/DCNv4/data/train $TRAIN_FLAGS $CLASSIFIER_FLAGS
This is my diffusion training command:
MODEL_FLAGS="--image_size 64 --num_channels 128 --num_res_blocks 3"
DIFFUSION_FLAGS="--diffusion_steps 4000 --noise_schedule linear"
TRAIN_FLAGS="--lr 1e-4 --batch_size 32"
python scripts/image_train.py --data_dir /home/cumt306/dingbo/DCNv4/data/train $MODEL_FLAGS $DIFFUSION_FLAGS $TRAIN_FLAGS
This is my sampling command:
MODEL_FLAGS="--attention_resolutions 32,16,8 --class_cond True --image_size 64 --learn_sigma True --num_channels 128 --num_heads 4 --num_res_blocks 3 --resblock_updown True --use_fp16 True --use_scale_shift_norm True"
CLASSIFIER_FLAGS="--image_size 64 --classifier_attention_resolutions 32,16,8 --classifier_depth 4 --classifier_width 128 --classifier_pool attention --classifier_resblock_updown True --classifier_use_scale_shift_norm True --classifier_scale 1.0 --classifier_use_fp16 True"
SAMPLE_FLAGS="--batch_size 32 --num_samples 1000 --timestep_respacing ddim25 --use_ddim True"
mpiexec -n N python scripts/classifier_sample.py \
  --model_path /home/cumt306/dingbo/guided-diffusion-main/diffusion_model/openai-2024-07-16-11-34-50-811594/model001000.pt \
  --classifier_path /home/cumt306/dingbo/guided-diffusion-main/classer_model/openai-2024-07-16-11-28-48-194253/model001000.pt \
  $MODEL_FLAGS $CLASSIFIER_FLAGS $SAMPLE_FLAGS
When I run sampling, I get:
RuntimeError: Error(s) in loading state_dict for UNetModel: size mismatch for out.2.weight: copying a param with shape torch.Size([3, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([6, 128, 3, 3]). size mismatch for out.2.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([6]).
@Alexdbsdfs I'm just looking at this from my phone, so I can't test, but it looks like the UNet you build for sampling doesn't match the size of the one you trained. I'm guessing a lot here, but check what the default res block settings are for training; I don't think you're setting them for training, but you do set them to 3 later. A simple thing you could also try is setting the blocks to 6 for sampling. Let me know if that helps.
Thank you very much for your reply. In diffusion training I set --num_res_blocks 3, and in sampling I also set --num_res_blocks 3. I then tried num_res_blocks=6, and a similar issue arose. @DiogoNeves
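A hunch from the flags above (my reading, not confirmed in this thread): in guided-diffusion, --learn_sigma True doubles the model's output channels so it predicts a variance alongside the mean, which would explain out.2 expecting 6 channels while the checkpoint has 3. The diffusion training command omits the flag, but the sampling command sets it. One way to keep the two consistent, sketched with the same flags as above:

```shell
# Sketch: train with the same learn_sigma setting that sampling uses
# (DIFFUSION_FLAGS and TRAIN_FLAGS kept as in the commands above).
MODEL_FLAGS="--image_size 64 --num_channels 128 --num_res_blocks 3 --learn_sigma True"
python scripts/image_train.py --data_dir /home/cumt306/dingbo/DCNv4/data/train $MODEL_FLAGS $DIFFUSION_FLAGS $TRAIN_FLAGS
# ...or, alternatively, drop --learn_sigma True from the sampling
# MODEL_FLAGS so it matches the checkpoint trained without it.
```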
I want to sample images from the pretrained 64x64_diffusion model but am hitting a segfault with the suggested run configuration. I've downloaded the 64x64 checkpoints to a models folder and am running with the following flags:
!SAMPLE_FLAGS="--batch_size 4 --num_samples 100 --timestep_respacing 250"
!MODEL_FLAGS="--attention_resolutions 32,16,8 --class_cond True --diffusion_steps 1000 --dropout 0.1 --image_size 64 --learn_sigma True --noise_schedule cosine --num_channels 192 --num_head_channels 64 --num_res_blocks 3 --resblock_updown True --use_new_attention_order True --use_fp16 True --use_scale_shift_norm True"
!python image_sample.py $MODEL_FLAGS --model_path models/64x64_diffusion.pt $SAMPLE_FLAGS
At runtime, I get a slew of warnings about missing and unexpected keys before the code crashes with a segfault:
Missing key(s) in state_dict: "input_blocks.3.0.op.weight", "input_blocks.3.0.op.bias", "input_blocks.4.0.skip_connection.weight", ..., "output_blocks.8.1.conv.bias".
Unexpected key(s) in state_dict: "label_emb.weight", "input_blocks.12.0.in_layers.0.weight", "input_blocks.12.0.in_layers.0.bias", ..., "output_blocks.11.2.out_layers.3.bias".
size mismatch for time_embed.0.weight: copying a param with shape torch.Size([768, 192]) from checkpoint, the shape in current model is torch.Size([512, 128]). ... size mismatch for out.2.bias: copying a param with shape torch.Size([6]) from checkpoint, the shape in current model is torch.Size([3]).