mingyuan-zhang / FineMoGen

[NeurIPS 2023] FineMoGen: Fine-Grained Spatio-Temporal Motion Generation and Editing

Error loading model checkpoint weights #3

Open connorzl opened 8 months ago

connorzl commented 8 months ago

Hi, after following the setup instructions, I tested the model by running the following command:

PYTHONPATH=".":$PYTHONPATH python tools/visualize.py configs/finemogen/finemogen_t2m.py logs/finemogen/finemogen_t2m/latest.pth --text "a person is running quickly" --motion_length 120 --out "test.gif"

I received the following warnings:

warnings.warn(
load checkpoint from local path: logs/finemogen/finemogen_t2m/latest.pth
The model and loaded state dict do not match exactly

missing keys in source state_dict: model.clip.positional_embedding, model.clip.text_projection, model.clip.logit_scale, model.clip.visual.class_embedding, model.clip.visual.positional_embedding, model.clip.visual.proj, model.clip.visual.conv1.weight, model.clip.visual.ln_pre.weight, model.clip.visual.ln_pre.bias, model.clip.visual.transformer.resblocks.0.attn.in_proj_weight, model.clip.visual.transformer.resblocks.0.attn.in_proj_bias, model.clip.visual.transformer.resblocks.0.attn.out_proj.weight, model.clip.visual.transformer.resblocks.0.attn.out_proj.bias, model.clip.visual.transformer.resblocks.0.ln_1.weight, model.clip.visual.transformer.resblocks.0.ln_1.bias, model.clip.visual.transformer.resblocks.0.mlp.c_fc.weight, model.clip.visual.transformer.resblocks.0.mlp.c_fc.bias, model.clip.visual.transformer.resblocks.0.mlp.c_proj.weight, model.clip.visual.transformer.resblocks.0.mlp.c_proj.bias, model.clip.visual.transformer.resblocks.0.ln_2.weight, model.clip.visual.transformer.resblocks.0.ln_2.bias, model.clip.visual.transformer.resblocks.1.attn.in_proj_weight, model.clip.visual.transformer.resblocks.1.attn.in_proj_bias, model.clip.visual.transformer.resblocks.1.attn.out_proj.weight, model.clip.visual.transformer.resblocks.1.attn.out_proj.bias, model.clip.visual.transformer.resblocks.1.ln_1.weight, model.clip.visual.transformer.resblocks.1.ln_1.bias, model.clip.visual.transformer.resblocks.1.mlp.c_fc.weight, model.clip.visual.transformer.resblocks.1.mlp.c_fc.bias, model.clip.visual.transformer.resblocks.1.mlp.c_proj.weight, model.clip.visual.transformer.resblocks.1.mlp.c_proj.bias, model.clip.visual.transformer.resblocks.1.ln_2.weight, model.clip.visual.transformer.resblocks.1.ln_2.bias, model.clip.visual.transformer.resblocks.2.attn.in_proj_weight, model.clip.visual.transformer.resblocks.2.attn.in_proj_bias, model.clip.visual.transformer.resblocks.2.attn.out_proj.weight, model.clip.visual.transformer.resblocks.2.attn.out_proj.bias, model.clip.visual.transformer.resblocks.2.ln_1.weight, model.clip.visual.transformer.resblocks.2.ln_1.bias, model.clip.visual.transformer.resblocks.2.mlp.c_fc.weight, model.clip.visual.transformer.resblocks.2.mlp.c_fc.bias, model.clip.visual.transformer.resblocks.2.mlp.c_proj.weight, model.clip.visual.transformer.resblocks.2.mlp.c_proj.bias, model.clip.visual.transformer.resblocks.2.ln_2.weight, model.clip.visual.transformer.resblocks.2.ln_2.bias, model.clip.visual.transformer.resblocks.3.attn.in_proj_weight, model.clip.visual.transformer.resblocks.3.attn.in_proj_bias, model.clip.visual.transformer.resblocks.3.attn.out_proj.weight, model.clip.visual.transformer.resblocks.3.attn.out_proj.bias, model.clip.visual.transformer.resblocks.3.ln_1.weight, model.clip.visual.transformer.resblocks.3.ln_1.bias, model.clip.visual.transformer.resblocks.3.mlp.c_fc.weight, model.clip.visual.transformer.resblocks.3.mlp.c_fc.bias, model.clip.visual.transformer.resblocks.3.mlp.c_proj.weight, model.clip.visual.transformer.resblocks.3.mlp.c_proj.bias, model.clip.visual.transformer.resblocks.3.ln_2.weight, model.clip.visual.transformer.resblocks.3.ln_2.bias, model.clip.visual.transformer.resblocks.4.attn.in_proj_weight, model.clip.visual.transformer.resblocks.4.attn.in_proj_bias, model.clip.visual.transformer.resblocks.4.attn.out_proj.weight, model.clip.visual.transformer.resblocks.4.attn.out_proj.bias, model.clip.visual.transformer.resblocks.4.ln_1.weight, model.clip.visual.transformer.resblocks.4.ln_1.bias, model.clip.visual.transformer.resblocks.4.mlp.c_fc.weight, 
model.clip.visual.transformer.resblocks.4.mlp.c_fc.bias, model.clip.visual.transformer.resblocks.4.mlp.c_proj.weight, model.clip.visual.transformer.resblocks.4.mlp.c_proj.bias, model.clip.visual.transformer.resblocks.4.ln_2.weight, model.clip.visual.transformer.resblocks.4.ln_2.bias, model.clip.visual.transformer.resblocks.5.attn.in_proj_weight, model.clip.visual.transformer.resblocks.5.attn.in_proj_bias, model.clip.visual.transformer.resblocks.5.attn.out_proj.weight, model.clip.visual.transformer.resblocks.5.attn.out_proj.bias, model.clip.visual.transformer.resblocks.5.ln_1.weight, model.clip.visual.transformer.resblocks.5.ln_1.bias, model.clip.visual.transformer.resblocks.5.mlp.c_fc.weight, model.clip.visual.transformer.resblocks.5.mlp.c_fc.bias, model.clip.visual.transformer.resblocks.5.mlp.c_proj.weight, model.clip.visual.transformer.resblocks.5.mlp.c_proj.bias, model.clip.visual.transformer.resblocks.5.ln_2.weight, model.clip.visual.transformer.resblocks.5.ln_2.bias, model.clip.visual.transformer.resblocks.6.attn.in_proj_weight, model.clip.visual.transformer.resblocks.6.attn.in_proj_bias, model.clip.visual.transformer.resblocks.6.attn.out_proj.weight, model.clip.visual.transformer.resblocks.6.attn.out_proj.bias, model.clip.visual.transformer.resblocks.6.ln_1.weight, model.clip.visual.transformer.resblocks.6.ln_1.bias, model.clip.visual.transformer.resblocks.6.mlp.c_fc.weight, model.clip.visual.transformer.resblocks.6.mlp.c_fc.bias, model.clip.visual.transformer.resblocks.6.mlp.c_proj.weight, model.clip.visual.transformer.resblocks.6.mlp.c_proj.bias, model.clip.visual.transformer.resblocks.6.ln_2.weight, model.clip.visual.transformer.resblocks.6.ln_2.bias, model.clip.visual.transformer.resblocks.7.attn.in_proj_weight, model.clip.visual.transformer.resblocks.7.attn.in_proj_bias, model.clip.visual.transformer.resblocks.7.attn.out_proj.weight, model.clip.visual.transformer.resblocks.7.attn.out_proj.bias, model.clip.visual.transformer.resblocks.7.ln_1.weight, model.clip.visual.transformer.resblocks.7.ln_1.bias, model.clip.visual.transformer.resblocks.7.mlp.c_fc.weight, model.clip.visual.transformer.resblocks.7.mlp.c_fc.bias, model.clip.visual.transformer.resblocks.7.mlp.c_proj.weight, model.clip.visual.transformer.resblocks.7.mlp.c_proj.bias, model.clip.visual.transformer.resblocks.7.ln_2.weight, model.clip.visual.transformer.resblocks.7.ln_2.bias, model.clip.visual.transformer.resblocks.8.attn.in_proj_weight, model.clip.visual.transformer.resblocks.8.attn.in_proj_bias, model.clip.visual.transformer.resblocks.8.attn.out_proj.weight, model.clip.visual.transformer.resblocks.8.attn.out_proj.bias, model.clip.visual.transformer.resblocks.8.ln_1.weight, model.clip.visual.transformer.resblocks.8.ln_1.bias, model.clip.visual.transformer.resblocks.8.mlp.c_fc.weight, model.clip.visual.transformer.resblocks.8.mlp.c_fc.bias, model.clip.visual.transformer.resblocks.8.mlp.c_proj.weight, model.clip.visual.transformer.resblocks.8.mlp.c_proj.bias, model.clip.visual.transformer.resblocks.8.ln_2.weight, model.clip.visual.transformer.resblocks.8.ln_2.bias, model.clip.visual.transformer.resblocks.9.attn.in_proj_weight, model.clip.visual.transformer.resblocks.9.attn.in_proj_bias, model.clip.visual.transformer.resblocks.9.attn.out_proj.weight, model.clip.visual.transformer.resblocks.9.attn.out_proj.bias, model.clip.visual.transformer.resblocks.9.ln_1.weight, model.clip.visual.transformer.resblocks.9.ln_1.bias, model.clip.visual.transformer.resblocks.9.mlp.c_fc.weight, 
model.clip.visual.transformer.resblocks.9.mlp.c_fc.bias, model.clip.visual.transformer.resblocks.9.mlp.c_proj.weight, model.clip.visual.transformer.resblocks.9.mlp.c_proj.bias, model.clip.visual.transformer.resblocks.9.ln_2.weight, model.clip.visual.transformer.resblocks.9.ln_2.bias, model.clip.visual.transformer.resblocks.10.attn.in_proj_weight, model.clip.visual.transformer.resblocks.10.attn.in_proj_bias, model.clip.visual.transformer.resblocks.10.attn.out_proj.weight, model.clip.visual.transformer.resblocks.10.attn.out_proj.bias, model.clip.visual.transformer.resblocks.10.ln_1.weight, model.clip.visual.transformer.resblocks.10.ln_1.bias, model.clip.visual.transformer.resblocks.10.mlp.c_fc.weight, model.clip.visual.transformer.resblocks.10.mlp.c_fc.bias, model.clip.visual.transformer.resblocks.10.mlp.c_proj.weight, model.clip.visual.transformer.resblocks.10.mlp.c_proj.bias, model.clip.visual.transformer.resblocks.10.ln_2.weight, model.clip.visual.transformer.resblocks.10.ln_2.bias, model.clip.visual.transformer.resblocks.11.attn.in_proj_weight, model.clip.visual.transformer.resblocks.11.attn.in_proj_bias, model.clip.visual.transformer.resblocks.11.attn.out_proj.weight, model.clip.visual.transformer.resblocks.11.attn.out_proj.bias, model.clip.visual.transformer.resblocks.11.ln_1.weight, model.clip.visual.transformer.resblocks.11.ln_1.bias, model.clip.visual.transformer.resblocks.11.mlp.c_fc.weight, model.clip.visual.transformer.resblocks.11.mlp.c_fc.bias, model.clip.visual.transformer.resblocks.11.mlp.c_proj.weight, model.clip.visual.transformer.resblocks.11.mlp.c_proj.bias, model.clip.visual.transformer.resblocks.11.ln_2.weight, model.clip.visual.transformer.resblocks.11.ln_2.bias, model.clip.visual.ln_post.weight, model.clip.visual.ln_post.bias, model.clip.transformer.resblocks.0.attn.in_proj_weight, model.clip.transformer.resblocks.0.attn.in_proj_bias, model.clip.transformer.resblocks.0.attn.out_proj.weight, model.clip.transformer.resblocks.0.attn.out_proj.bias, model.clip.transformer.resblocks.0.ln_1.weight, model.clip.transformer.resblocks.0.ln_1.bias, model.clip.transformer.resblocks.0.mlp.c_fc.weight, model.clip.transformer.resblocks.0.mlp.c_fc.bias, model.clip.transformer.resblocks.0.mlp.c_proj.weight, model.clip.transformer.resblocks.0.mlp.c_proj.bias, model.clip.transformer.resblocks.0.ln_2.weight, model.clip.transformer.resblocks.0.ln_2.bias, model.clip.transformer.resblocks.1.attn.in_proj_weight, model.clip.transformer.resblocks.1.attn.in_proj_bias, model.clip.transformer.resblocks.1.attn.out_proj.weight, model.clip.transformer.resblocks.1.attn.out_proj.bias, model.clip.transformer.resblocks.1.ln_1.weight, model.clip.transformer.resblocks.1.ln_1.bias, model.clip.transformer.resblocks.1.mlp.c_fc.weight, model.clip.transformer.resblocks.1.mlp.c_fc.bias, model.clip.transformer.resblocks.1.mlp.c_proj.weight, model.clip.transformer.resblocks.1.mlp.c_proj.bias, model.clip.transformer.resblocks.1.ln_2.weight, model.clip.transformer.resblocks.1.ln_2.bias, model.clip.transformer.resblocks.2.attn.in_proj_weight, model.clip.transformer.resblocks.2.attn.in_proj_bias, model.clip.transformer.resblocks.2.attn.out_proj.weight, model.clip.transformer.resblocks.2.attn.out_proj.bias, model.clip.transformer.resblocks.2.ln_1.weight, model.clip.transformer.resblocks.2.ln_1.bias, model.clip.transformer.resblocks.2.mlp.c_fc.weight, model.clip.transformer.resblocks.2.mlp.c_fc.bias, model.clip.transformer.resblocks.2.mlp.c_proj.weight, model.clip.transformer.resblocks.2.mlp.c_proj.bias, 
model.clip.transformer.resblocks.2.ln_2.weight, model.clip.transformer.resblocks.2.ln_2.bias, model.clip.transformer.resblocks.3.attn.in_proj_weight, model.clip.transformer.resblocks.3.attn.in_proj_bias, model.clip.transformer.resblocks.3.attn.out_proj.weight, model.clip.transformer.resblocks.3.attn.out_proj.bias, model.clip.transformer.resblocks.3.ln_1.weight, model.clip.transformer.resblocks.3.ln_1.bias, model.clip.transformer.resblocks.3.mlp.c_fc.weight, model.clip.transformer.resblocks.3.mlp.c_fc.bias, model.clip.transformer.resblocks.3.mlp.c_proj.weight, model.clip.transformer.resblocks.3.mlp.c_proj.bias, model.clip.transformer.resblocks.3.ln_2.weight, model.clip.transformer.resblocks.3.ln_2.bias, model.clip.transformer.resblocks.4.attn.in_proj_weight, model.clip.transformer.resblocks.4.attn.in_proj_bias, model.clip.transformer.resblocks.4.attn.out_proj.weight, model.clip.transformer.resblocks.4.attn.out_proj.bias, model.clip.transformer.resblocks.4.ln_1.weight, model.clip.transformer.resblocks.4.ln_1.bias, model.clip.transformer.resblocks.4.mlp.c_fc.weight, model.clip.transformer.resblocks.4.mlp.c_fc.bias, model.clip.transformer.resblocks.4.mlp.c_proj.weight, model.clip.transformer.resblocks.4.mlp.c_proj.bias, model.clip.transformer.resblocks.4.ln_2.weight, model.clip.transformer.resblocks.4.ln_2.bias, model.clip.transformer.resblocks.5.attn.in_proj_weight, model.clip.transformer.resblocks.5.attn.in_proj_bias, model.clip.transformer.resblocks.5.attn.out_proj.weight, model.clip.transformer.resblocks.5.attn.out_proj.bias, model.clip.transformer.resblocks.5.ln_1.weight, model.clip.transformer.resblocks.5.ln_1.bias, model.clip.transformer.resblocks.5.mlp.c_fc.weight, model.clip.transformer.resblocks.5.mlp.c_fc.bias, model.clip.transformer.resblocks.5.mlp.c_proj.weight, model.clip.transformer.resblocks.5.mlp.c_proj.bias, model.clip.transformer.resblocks.5.ln_2.weight, model.clip.transformer.resblocks.5.ln_2.bias, model.clip.transformer.resblocks.6.attn.in_proj_weight, model.clip.transformer.resblocks.6.attn.in_proj_bias, model.clip.transformer.resblocks.6.attn.out_proj.weight, model.clip.transformer.resblocks.6.attn.out_proj.bias, model.clip.transformer.resblocks.6.ln_1.weight, model.clip.transformer.resblocks.6.ln_1.bias, model.clip.transformer.resblocks.6.mlp.c_fc.weight, model.clip.transformer.resblocks.6.mlp.c_fc.bias, model.clip.transformer.resblocks.6.mlp.c_proj.weight, model.clip.transformer.resblocks.6.mlp.c_proj.bias, model.clip.transformer.resblocks.6.ln_2.weight, model.clip.transformer.resblocks.6.ln_2.bias, model.clip.transformer.resblocks.7.attn.in_proj_weight, model.clip.transformer.resblocks.7.attn.in_proj_bias, model.clip.transformer.resblocks.7.attn.out_proj.weight, model.clip.transformer.resblocks.7.attn.out_proj.bias, model.clip.transformer.resblocks.7.ln_1.weight, model.clip.transformer.resblocks.7.ln_1.bias, model.clip.transformer.resblocks.7.mlp.c_fc.weight, model.clip.transformer.resblocks.7.mlp.c_fc.bias, model.clip.transformer.resblocks.7.mlp.c_proj.weight, model.clip.transformer.resblocks.7.mlp.c_proj.bias, model.clip.transformer.resblocks.7.ln_2.weight, model.clip.transformer.resblocks.7.ln_2.bias, model.clip.transformer.resblocks.8.attn.in_proj_weight, model.clip.transformer.resblocks.8.attn.in_proj_bias, model.clip.transformer.resblocks.8.attn.out_proj.weight, model.clip.transformer.resblocks.8.attn.out_proj.bias, model.clip.transformer.resblocks.8.ln_1.weight, model.clip.transformer.resblocks.8.ln_1.bias, model.clip.transformer.resblocks.8.mlp.c_fc.weight, 
model.clip.transformer.resblocks.8.mlp.c_fc.bias, model.clip.transformer.resblocks.8.mlp.c_proj.weight, model.clip.transformer.resblocks.8.mlp.c_proj.bias, model.clip.transformer.resblocks.8.ln_2.weight, model.clip.transformer.resblocks.8.ln_2.bias, model.clip.transformer.resblocks.9.attn.in_proj_weight, model.clip.transformer.resblocks.9.attn.in_proj_bias, model.clip.transformer.resblocks.9.attn.out_proj.weight, model.clip.transformer.resblocks.9.attn.out_proj.bias, model.clip.transformer.resblocks.9.ln_1.weight, model.clip.transformer.resblocks.9.ln_1.bias, model.clip.transformer.resblocks.9.mlp.c_fc.weight, model.clip.transformer.resblocks.9.mlp.c_fc.bias, model.clip.transformer.resblocks.9.mlp.c_proj.weight, model.clip.transformer.resblocks.9.mlp.c_proj.bias, model.clip.transformer.resblocks.9.ln_2.weight, model.clip.transformer.resblocks.9.ln_2.bias, model.clip.transformer.resblocks.10.attn.in_proj_weight, model.clip.transformer.resblocks.10.attn.in_proj_bias, model.clip.transformer.resblocks.10.attn.out_proj.weight, model.clip.transformer.resblocks.10.attn.out_proj.bias, model.clip.transformer.resblocks.10.ln_1.weight, model.clip.transformer.resblocks.10.ln_1.bias, model.clip.transformer.resblocks.10.mlp.c_fc.weight, model.clip.transformer.resblocks.10.mlp.c_fc.bias, model.clip.transformer.resblocks.10.mlp.c_proj.weight, model.clip.transformer.resblocks.10.mlp.c_proj.bias, model.clip.transformer.resblocks.10.ln_2.weight, model.clip.transformer.resblocks.10.ln_2.bias, model.clip.transformer.resblocks.11.attn.in_proj_weight, model.clip.transformer.resblocks.11.attn.in_proj_bias, model.clip.transformer.resblocks.11.attn.out_proj.weight, model.clip.transformer.resblocks.11.attn.out_proj.bias, model.clip.transformer.resblocks.11.ln_1.weight, model.clip.transformer.resblocks.11.ln_1.bias, model.clip.transformer.resblocks.11.mlp.c_fc.weight, model.clip.transformer.resblocks.11.mlp.c_fc.bias, model.clip.transformer.resblocks.11.mlp.c_proj.weight, model.clip.transformer.resblocks.11.mlp.c_proj.bias, model.clip.transformer.resblocks.11.ln_2.weight, model.clip.transformer.resblocks.11.ln_2.bias, model.clip.token_embedding.weight, model.clip.ln_final.weight, model.clip.ln_final.bias

I also had to comment out the following lines to get the command to run, otherwise I would get plotting errors.

However, the output seems to be a blank GIF with only the text caption.

Sarah816 commented 6 months ago

Hi @connorzl! I faced the same problem when testing the model. Have you solved it? Many thanks.

mingyuan-zhang commented 5 months ago


You can ignore this warning. In the current implementation, we don't store the frozen CLIP weights during training, in order to reduce the size of the checkpoint file. Therefore, when loading the state_dict, PyTorch cannot find the corresponding weights in the checkpoint. Instead, this part of the weights is loaded during model initialization.
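To illustrate the mechanism, here is a minimal, self-contained sketch of the pattern described above. It is not the FineMoGen code itself; the class name, checkpoint filename, and the ViT-B/32 CLIP variant are placeholders, and it assumes the OpenAI `clip` package is installed.

```python
# Minimal sketch (not the actual FineMoGen code) of why the "missing keys"
# warning is harmless: the frozen CLIP weights are dropped from the saved
# checkpoint and restored from the pretrained CLIP release at init time.
import clip  # pip install git+https://github.com/openai/CLIP.git
import torch
import torch.nn as nn


class TextToMotionModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Frozen CLIP encoder, populated from the pretrained weights here,
        # so it never needs to be saved in the training checkpoint.
        self.clip, _ = clip.load("ViT-B/32", device="cpu")
        for p in self.clip.parameters():
            p.requires_grad = False
        self.motion_head = nn.Linear(512, 512)  # stand-in for the trainable part

    def checkpoint_state_dict(self):
        # Save only the trainable weights to keep the checkpoint file small.
        return {k: v for k, v in self.state_dict().items() if not k.startswith("clip.")}


model = TextToMotionModel()
torch.save(model.checkpoint_state_dict(), "latest_demo.pth")

# strict=False reports the CLIP keys as "missing", exactly like the warning
# above, but they were already initialized in __init__, so inference is unaffected.
result = model.load_state_dict(torch.load("latest_demo.pth"), strict=False)
print(f"{len(result.missing_keys)} missing keys, all under 'clip.' -> safe to ignore")
```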

Could you share the detailed log of the plotting error? It seems those two lines cannot simply be removed.

mingyuan-zhang commented 5 months ago

Hi @connorzl! I faced the same problem when testing the model. Have you solved it? Many thanks.

As mentioned above, you can ignore this warning: the frozen CLIP weights are not stored in the checkpoint file and are loaded during model initialization instead.

Run542968 commented 4 months ago

Hi @connorzl! I faced the same problem when testing the model. Have you solved it? Many thanks.

Hi, this issue comes from the matplotlib version; you can run `pip install matplotlib==3.4.3`. I solved it this way.
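For anyone hitting the same plotting error, a quick sanity check like the sketch below confirms whether the installed matplotlib matches the version reported to work in this thread (the exact versions that break are not specified here).

```python
# Check the installed matplotlib against the version reported to work above;
# if it differs, `pip install matplotlib==3.4.3` as suggested in this thread.
import matplotlib

print("matplotlib", matplotlib.__version__)
if matplotlib.__version__ != "3.4.3":
    print("Different version detected; the plotting code in tools/visualize.py "
          "may fail with other matplotlib releases.")
```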