soumik-kanad / diff2lip

Error when running inference on my own video #4

Closed zhanghongyong123456 closed 10 months ago

zhanghongyong123456 commented 10 months ago
  1. My command: python generate.py --generate_from_filelist=0 --video_path=path\to*.mp4 --audio_path=path\to*.mp3 --out_path=path\to*.mp4 --attention_resolutions 32,16,8 --learn_sigma True --num_head_channels 64 --resblock_updown True --use_scale_shift_norm False --sampling_input_type=first_frame --sampling_ref_type=first_frame --timestep_respacing ddim25 --use_ddim True --sample_path=output_dir --nframes 5 --nrefer 1 --use_ref=True --use_audio=True --audio_as_style=True --save_orig=False --image_size 128

  2. Input video: [image]

  3. Result video: [image]

  4. What is the reason for this?

soumik-kanad commented 10 months ago

It seems like a face detection problem. I'm guessing you also need to set the flag --is_voxceleb2=False.

Please make sure that the other flags are consistent with the inference script.

zhanghongyong123456 commented 10 months ago

When I add the flag --is_voxceleb2=False, I get this output:

https://github.com/soumik-kanad/diff2lip/assets/48466610/40837195-049d-4290-85e2-4e0a4813010d

soumik-kanad commented 10 months ago

This seems better.

(If the face or body in your input video is moving, you can also try the flags --sampling_input_type=gt --sampling_ref_type=gt to get similar movement/expression. With the flag values you are currently using, only the first frame of the video is manipulated.)
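
For reference, the two suggestions in this thread can be combined into one command. The sketch below reuses the flags from the original command, adds --is_voxceleb2=False, and switches both sampling flags from first_frame to gt. The three file paths are placeholders (assumptions, not values from the thread), and the remaining flags should still be checked against the repository's inference script. The script only prints the command so it can be reviewed first.

```shell
#!/bin/sh
# Sketch only. All flags are copied from the command posted in this thread;
# the three paths below are hypothetical placeholders to replace with your own.
VIDEO_PATH="input.mp4"
AUDIO_PATH="input.mp3"
OUT_PATH="output.mp4"

# Collect the arguments as positional parameters so that paths containing
# spaces stay intact when the command is eventually run.
set -- \
  --generate_from_filelist=0 \
  --video_path="$VIDEO_PATH" \
  --audio_path="$AUDIO_PATH" \
  --out_path="$OUT_PATH" \
  --attention_resolutions 32,16,8 \
  --learn_sigma True \
  --num_head_channels 64 \
  --resblock_updown True \
  --use_scale_shift_norm False \
  --sampling_input_type=gt \
  --sampling_ref_type=gt \
  --timestep_respacing ddim25 \
  --use_ddim True \
  --sample_path=output_dir \
  --nframes 5 \
  --nrefer 1 \
  --use_ref=True \
  --use_audio=True \
  --audio_as_style=True \
  --save_orig=False \
  --image_size 128 \
  --is_voxceleb2=False

# Print the command for review; remove the leading "echo" to actually run it.
echo python generate.py "$@"
```

If the input video is static, keeping --sampling_input_type=first_frame and --sampling_ref_type=first_frame, as in the original command, may be sufficient.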