soumik-kanad / diff2lip


The model does not match, and I do not know why #21

Closed: chencong-source closed this issue 7 months ago

chencong-source commented 8 months ago

Parameters: Namespace(generate_from_filelist=False, video_path='/home/stardust/download/sdk/deepleaning/video-retalking/examples/face/1.mp4', audio_path='/home/stardust/download/sdk/deepleaning/video-retalking/examples/audio/1.wav', out_path='zzz.mp4', save_orig=True, test_video_dir='test_videos', filelist='test_filelist.txt', use_fp16=False, face_hide_percentage=0.5, use_ref=False, use_audio=False, audio_as_style=False, audio_as_style_encoder_mlp=False, nframes=1, nrefer=0, image_size=64, syncnet_T=5, syncnet_mel_step_size=16, audio_frames_per_video=16, audio_dim=80, is_voxceleb2=True, video_fps=25, sample_rate=16000, mel_steps_per_sec=80.0, clip_denoised=True, sampling_batch_size=2, use_ddim=False, model_path='checkpoints/e7.15_model210000_notUsedInPaper.pt', sample_path='d2l_gen', sample_partition='', sampling_seed=None, sampling_use_gt_for_ref=False, sampling_ref_type='gt', sampling_input_type='gt', face_det_batch_size=64, pads='0,0,0,0', num_channels=128, num_res_blocks=2, num_heads=4, num_heads_upsample=-1, num_head_channels=-1, attention_resolutions='16,8', dropout=0.0, class_cond=False, use_checkpoint=False, use_scale_shift_norm=True, resblock_updown=False, learn_sigma=False, diffusion_steps=1000, noise_schedule='linear', timestep_respacing='', use_kl=False, predict_xstart=False, rescale_timesteps=False, rescale_learned_sigmas=False, loss_variation=0, audio_encoder_kwargs={})

Error:
  File "/home/stardust/download/sdk/deepleaning/diff2lip/generate.py", line 326, in main
    model.load_state_dict(
  File "/home/stardust/anaconda3/envs/facial/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2153, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for TFGModel:
    Missing key(s) in state_dict: "input_blocks.3.0.op.weight", "input_blocks.3.0.op.bias", "input_blocks.4.0.skip_connection.weight", "input_blocks.4.0.skip_connection.bias", "input_blocks.6.0.op.weight", "input_blocks.6.0.op.bias", "input_blocks.9.0.op.weight", "input_blocks.9.0.op.bias", "output_blocks.2.2.conv.weight", "output_blocks.2.2.conv.bias", "output_blocks.5.2.conv.weight", "output_blocks.5.2.conv.bias", "output_blocks.8.1.conv.weight", "output_blocks.8.1.conv.bias".
    Unexpected key(s) in state_dict: "audio_encoder.time_embed.0.weight", "audio_encoder.time_embed.0.bias", "audio_encoder.time_embed.2.weight", "audio_encoder.time_embed.2.bias", "audio_encoder.input_block.0.weight", "audio_encoder.input_block.0.bias", "audio_encoder.input_block.1.weight", "audio_encoder.input_block.1.bias"........
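One way to read an error like this is to group the missing and unexpected keys by module prefix, so it is easier to see which parts of the architecture disagree. The sketch below uses hypothetical helpers (`group_keys_by_prefix` and `diff_state_dict_keys` are not part of diff2lip) and assumes you have already collected the key names, for example from `model.state_dict().keys()` and from the dict returned by `torch.load(path, map_location="cpu")`, assuming the checkpoint file holds a plain state dict. PyTorch's own `model.load_state_dict(state_dict, strict=False)` also returns the `missing_keys` and `unexpected_keys` lists instead of raising.

```python
from collections import defaultdict


def group_keys_by_prefix(keys, depth=2):
    """Group state_dict keys by their first `depth` dotted components."""
    groups = defaultdict(list)
    for key in keys:
        prefix = ".".join(key.split(".")[:depth])
        groups[prefix].append(key)
    return dict(groups)


def diff_state_dict_keys(model_keys, checkpoint_keys, depth=2):
    """Return (missing, unexpected) key groups, like load_state_dict reports.

    missing:    keys the constructed model expects but the checkpoint lacks
    unexpected: keys the checkpoint contains but the constructed model lacks
    Keys present in both sets appear in neither group.
    """
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = group_keys_by_prefix(sorted(model_keys - checkpoint_keys), depth)
    unexpected = group_keys_by_prefix(sorted(checkpoint_keys - model_keys), depth)
    return missing, unexpected


if __name__ == "__main__":
    # A few key names copied from the error above; the real lists are longer.
    model_keys = [
        "input_blocks.3.0.op.weight",
        "input_blocks.3.0.op.bias",
        "input_blocks.4.0.skip_connection.weight",
        "output_blocks.2.2.conv.weight",
    ]
    checkpoint_keys = [
        "audio_encoder.time_embed.0.weight",
        "audio_encoder.input_block.0.weight",
    ]
    missing, unexpected = diff_state_dict_keys(model_keys, checkpoint_keys)
    for label, groups in (("missing", missing), ("unexpected", unexpected)):
        for prefix, keys in groups.items():
            print(f"{label:10s} {prefix:25s} {len(keys)} key(s)")
```

Read against the traceback, the grouping is suggestive rather than conclusive. If TFGModel builds on guided-diffusion's UNet (an assumption), the missing `op` and `output_blocks.*.conv` keys belong to the Downsample/Upsample layers that UNet creates when `resblock_updown=False`, and the unexpected `audio_encoder.*` keys indicate a checkpoint saved with an audio encoder that this configuration (`use_audio=False`) apparently does not build. Both hint that the architecture flags differ from the ones the checkpoint was trained with.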

soumik-kanad commented 8 months ago

Can you double-check that you used the correct flags?
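A quick way to act on that question is to compare the arguments your run actually parsed against a reference flag string, for example one copied from whichever command or script the checkpoint is documented with. `parse_flag_string` and `diff_flags` below are hypothetical helpers, not part of diff2lip, and the reference values in the demo are placeholders for illustration only, not the repository's actual settings.

```python
import argparse
import shlex


def parse_flag_string(flag_string):
    """Turn a shell-style "--name value ..." string into {name: value}."""
    tokens = shlex.split(flag_string)
    flags = {}
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if not token.startswith("--"):
            raise ValueError(f"expected a --flag, got {token!r}")
        name = token[2:]
        # Assumption of this sketch: every flag takes exactly one value.
        if i + 1 >= len(tokens):
            raise ValueError(f"flag --{name} has no value")
        flags[name] = tokens[i + 1]
        i += 2
    return flags


def diff_flags(run_args, reference_flags):
    """List (name, run_value, reference_value) wherever the two disagree.

    Values are compared as strings, so False matches "False" and 64 matches "64".
    Flags missing from the run are reported as "<not set>".
    """
    run = vars(run_args)
    mismatches = []
    for name, ref_value in reference_flags.items():
        run_value = run.get(name, "<not set>")
        if str(run_value) != ref_value:
            mismatches.append((name, run_value, ref_value))
    return mismatches


if __name__ == "__main__":
    # A few of the values from the Namespace printed in the issue.
    run_args = argparse.Namespace(use_audio=False, image_size=64)
    # PLACEHOLDER reference values for illustration; replace them with the
    # flags actually used to train or run the checkpoint.
    reference = parse_flag_string("--use_audio True --image_size 64")
    for name, run_value, ref_value in diff_flags(run_args, reference):
        print(f"--{name}: run={run_value!r} reference={ref_value!r}")
```

Comparing as strings keeps the sketch independent of how each flag is typed by the real argument parser, at the cost of treating equivalent spellings such as `1` and `1.0` as different.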