openai / guided-diffusion


how can i train a diffusion model #60

Open No360201 opened 2 years ago

No360201 commented 2 years ago

When I use openai/improved-diffusion to train on my data, I get three .pt files. Which one is the diffusion model? When I load the model in andreas128/RePaint, I get:

Missing key(s) in state_dict: "input_blocks.3.0.in_layers.0.weight", "input_blocks.3.0.in_layers.0.bias", "input_blocks.3.0.in_layers.2.weight", "input_blocks.3.0.in_layers.2.bias", "input_blocks.3.0.emb_layers.1.weight", "input_blocks.3.0.emb_layers.1.bias", "input_blocks.3.0.out_layers.0.weight", "input_blocks.3.0.out_layers.0.bias", "input_blocks.3.0.out_layers.3.weight", "input_blocks.3.0.out_layers.3.bias", "input_blocks.6.0.in_layers.0.weight", "input_blocks.6.0.in_layers.0.bias", "input_blocks.6.0.in_layers.2.weight", "input_blocks.6.0.in_layers.2.bias", "input_blocks.6.0.emb_layers.1.weight", "input_blocks.6.0.emb_layers.1.bias", "input_blocks.6.0.out_layers.0.weight", "input_blocks.6.0.out_layers.0.bias", "input_blocks.6.0.out_layers.3.weight", "input_blocks.6.0.out_layers.3.bias", "input_blocks.9.0.in_layers.0.weight", "input_blocks.9.0.in_layers.0.bias", "input_blocks.9.0.in_layers.2.weight", "input_blocks.9.0.in_layers.2.bias", "input_blocks.9.0.emb_layers.1.weight", "input_blocks.9.0.emb_layers.1.bias", "input_blocks.9.0.out_layers.0.weight", "input_blocks.9.0.out_layers.0.bias", "input_blocks.9.0.out_layers.3.weight", "input_blocks.9.0.out_layers.3.bias", "input_blocks.12.0.in_layers.0.weight", "input_blocks.12.0.in_layers.0.bias", "input_blocks.12.0.in_layers.2.weight", "input_blocks.12.0.in_layers.2.bias", "input_blocks.12.0.emb_layers.1.weight", "input_blocks.12.0.emb_layers.1.bias", "input_blocks.12.0.out_layers.0.weight", "input_blocks.12.0.out_layers.0.bias", "input_blocks.12.0.out_layers.3.weight", "input_blocks.12.0.out_layers.3.bias", "input_blocks.15.0.in_layers.0.weight", "input_blocks.15.0.in_layers.0.bias", "input_blocks.15.0.in_layers.2.weight", "input_blocks.15.0.in_layers.2.bias", "input_blocks.15.0.emb_layers.1.weight", "input_blocks.15.0.emb_layers.1.bias", "input_blocks.15.0.out_layers.0.weight", "input_blocks.15.0.out_layers.0.bias", "input_blocks.15.0.out_layers.3.weight", "input_blocks.15.0.out_layers.3.bias", "output_blocks.2.2.in_layers.0.weight", "output_blocks.2.2.in_layers.0.bias", "output_blocks.2.2.in_layers.2.weight", "output_blocks.2.2.in_layers.2.bias", "output_blocks.2.2.emb_layers.1.weight", "output_blocks.2.2.emb_layers.1.bias", "output_blocks.2.2.out_layers.0.weight", "output_blocks.2.2.out_layers.0.bias", "output_blocks.2.2.out_layers.3.weight", "output_blocks.2.2.out_layers.3.bias", "output_blocks.5.2.in_layers.0.weight", "output_blocks.5.2.in_layers.0.bias", "output_blocks.5.2.in_layers.2.weight", "output_blocks.5.2.in_layers.2.bias", "output_blocks.5.2.emb_layers.1.weight", "output_blocks.5.2.emb_layers.1.bias", "output_blocks.5.2.out_layers.0.weight", "output_blocks.5.2.out_layers.0.bias", "output_blocks.5.2.out_layers.3.weight", "output_blocks.5.2.out_layers.3.bias", "output_blocks.8.2.in_layers.0.weight", "output_blocks.8.2.in_layers.0.bias", "output_blocks.8.2.in_layers.2.weight", "output_blocks.8.2.in_layers.2.bias", "output_blocks.8.2.emb_layers.1.weight", "output_blocks.8.2.emb_layers.1.bias", "output_blocks.8.2.out_layers.0.weight", "output_blocks.8.2.out_layers.0.bias", "output_blocks.8.2.out_layers.3.weight", "output_blocks.8.2.out_layers.3.bias", "output_blocks.11.1.in_layers.0.weight", "output_blocks.11.1.in_layers.0.bias", "output_blocks.11.1.in_layers.2.weight", "output_blocks.11.1.in_layers.2.bias", "output_blocks.11.1.emb_layers.1.weight", "output_blocks.11.1.emb_layers.1.bias", "output_blocks.11.1.out_layers.0.weight", "output_blocks.11.1.out_layers.0.bias", "output_blocks.11.1.out_layers.3.weight", "output_blocks.11.1.out_layers.3.bias", "output_blocks.14.1.in_layers.0.weight", "output_blocks.14.1.in_layers.0.bias", "output_blocks.14.1.in_layers.2.weight", "output_blocks.14.1.in_layers.2.bias", "output_blocks.14.1.emb_layers.1.weight", "output_blocks.14.1.emb_layers.1.bias", "output_blocks.14.1.out_layers.0.weight", "output_blocks.14.1.out_layers.0.bias", "output_blocks.14.1.out_layers.3.weight", "output_blocks.14.1.out_layers.3.bias".

Unexpected key(s) in state_dict: "input_blocks.3.0.op.weight", "input_blocks.3.0.op.bias", "input_blocks.6.0.op.weight", "input_blocks.6.0.op.bias", "input_blocks.9.0.op.weight", "input_blocks.9.0.op.bias", "input_blocks.12.0.op.weight", "input_blocks.12.0.op.bias", "input_blocks.15.0.op.weight", "input_blocks.15.0.op.bias", "output_blocks.2.2.conv.weight", "output_blocks.2.2.conv.bias", "output_blocks.5.2.conv.weight", "output_blocks.5.2.conv.bias", "output_blocks.8.2.conv.weight", "output_blocks.8.2.conv.bias", "output_blocks.11.1.conv.weight", "output_blocks.11.1.conv.bias", "output_blocks.14.1.conv.weight", "output_blocks.14.1.conv.bias".

No360201 commented 2 years ago

@adam-openai @aluo-openai

pokameng commented 1 year ago

Hi, have you solved this problem? I'm running into it too! @No360201

No360201 commented 1 year ago

> Hi, have you solved this problem? I'm running into it too! @No360201

I can train now. Do you have WeChat?

pokameng commented 1 year ago

My WeChat is NLG-wsm @No360201

pokameng commented 1 year ago

We can chat on WeChat; my WeChat ID is NLG-wsm.

lin-tianyu commented 1 year ago

Hey guys! I am now encountering the same problem. Can you share the solution with me? @pokameng @No360201

FrozenSeas commented 1 year ago

I am encountering the same problem. Did you guys find out how to solve this problem?

lin-tianyu commented 1 year ago

I trained a diffusion model based on guided-diffusion rather than improved-diffusion, and this problem was solved. I think the issue is due to the different model settings between improved-diffusion and guided-diffusion.
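One way to see this difference concretely is to build the UNet twice with guided-diffusion's own script_util and diff the parameter names. A rough sketch, assuming guided-diffusion is installed and using the 256x256 settings discussed in this thread:

```python
# A rough sketch (assumes guided-diffusion is installed): build the UNet with and
# without resblock_updown and diff the state_dict keys. The keys that appear on only
# one side are exactly the missing/unexpected keys from the traceback above.
from guided_diffusion.script_util import create_model

common = dict(
    image_size=256,
    num_channels=256,
    num_res_blocks=2,
    learn_sigma=True,
    attention_resolutions="32,16,8",
    num_heads=4,
    num_head_channels=64,
    use_scale_shift_norm=True,
)

keys_resblock = set(create_model(resblock_updown=True, **common).state_dict())
keys_conv = set(create_model(resblock_updown=False, **common).state_dict())

print("only with resblock_updown=True :", sorted(keys_resblock - keys_conv)[:5], "...")
print("only with resblock_updown=False:", sorted(keys_conv - keys_resblock)[:5], "...")
```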

ONobody commented 1 year ago

@lin-tianyu Hello, may I get your contact information and ask you some training questions?

lin-tianyu commented 1 year ago

@ONobody Of course, you can contact me via my email: linty6@mail2.sysu.edu.cn

zhangbaijin commented 1 year ago

Hi guys! I am encountering the same problem. Can you share the solution with me? @pokameng @No360201 @lin-tianyu

My command for a 256x256 dataset is:

MODEL_FLAGS="--image_size 256 --num_channels 256 --num_res_blocks 2 --num_heads 4 --learn_sigma True --use_scale_shift_norm true"
DIFFUSION_FLAGS="--diffusion_steps 1000 --noise_schedule linear --rescale_learned_sigmas False"
TRAIN_FLAGS="--lr 1e-4 --microbatch 4 --dropout 0.1"

My checkpoint is 1.9 GB, but I notice that RePaint's checkpoint is 2.1 GB. How should I change my settings? Can you add my WeChat: SemiMobile?

zhangbaijin commented 1 year ago

The problem is solved, thanks guys.

octadion commented 1 year ago

@zhangbaijin Excuse me sir, can you tell me how you solved it? I seem to be having the same problem.

xyz-xdx commented 1 year ago

@zhangbaijin @pokameng @lin-tianyu Hi guys! I am encountering the same problem. Can you share the solution with me? I'd also like to ask about the weight mismatch and the NaN problem during model training.

daisybby commented 1 year ago

I have solved this problem! I needed to train guided-diffusion for RePaint using my own dataset, but I had ignored the hyperparameters. All model hyperparameters must be consistent with RePaint's (stored in its YAML config), and you can preliminarily judge whether they are consistent from the size of the checkpoint. If they are inconsistent, load_state_dict will report an error.
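To make that concrete, here is a small sketch that prints the architecture-relevant entries of a RePaint conf so you can mirror them in your guided-diffusion training flags. The conf path is a placeholder, and the flat layout with the model hyperparameters at the top level of the YAML is an assumption based on RePaint's example confs; adjust it to your actual config.

```python
# A small sketch: print the model hyperparameters from a RePaint conf so they can be
# mirrored in the guided-diffusion training flags. The conf path is a placeholder and
# the flat YAML layout is an assumption based on RePaint's example confs.
import yaml

# These settings determine the UNet architecture, and therefore the state_dict keys
# and the checkpoint size.
ARCH_KEYS = [
    "image_size", "num_channels", "num_res_blocks", "num_heads",
    "num_head_channels", "attention_resolutions", "resblock_updown",
    "learn_sigma", "use_scale_shift_norm", "class_cond", "use_fp16",
]

with open("confs/your_repaint_conf.yml") as f:  # placeholder path
    conf = yaml.safe_load(f)

for key in ARCH_KEYS:
    print(f"--{key} {conf.get(key, '<not found at top level>')}")
```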

hzy-del commented 7 months ago

> I have solved this problem! I needed to train guided-diffusion for RePaint using my own dataset, but I had ignored the hyperparameters. All model hyperparameters must be consistent with RePaint's (stored in its YAML config), and you can preliminarily judge whether they are consistent from the size of the checkpoint. If they are inconsistent, load_state_dict will report an error.

Hello, can you provide the training script and related configs?

Joseph-Mulenga commented 6 months ago

@daisybby

> I have solved this problem! I needed to train guided-diffusion for RePaint using my own dataset, but I had ignored the hyperparameters. All model hyperparameters must be consistent with RePaint's (stored in its YAML config), and you can preliminarily judge whether they are consistent from the size of the checkpoint. If they are inconsistent, load_state_dict will report an error.

Hello, can you please help me with how to train guided-diffusion for RePaint? I'm trying to train with my own data for RePaint.

zhangbaijin commented 6 months ago

MODEL_FLAGS="--image_size 256 --attention_resolutions 32,16,8 --num_channels 256 --num_head_channels 64 --num_res_blocks 2 --num_heads 4 --resblock_updown true --learn_sigma True --use_scale_shift_norm true --learn_sigma true --timestep_respacing 250 --use_fp16 false --use_kl false " DIFFUSION_FLAGS="--diffusion_steps 1000 --noise_schedule linear --rescale_learned_sigmas False" TRAIN_FLAGS="--lr 1e-4 --microbatch 4 --dropout 0.0"