cantonalex opened this issue 6 months ago
on a 4 second video?
It is normal! Inference currently takes a long time. You can try using fewer sampling steps via num_inference_steps, which will reduce the time linearly. We recommend using 20-30 steps.
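For example, something like this (a sketch; it assumes inference.py exposes the parameter as a --num_inference_steps command-line flag, so check the script's argument list, and the output path is a placeholder):
python inference.py \
--reference_image_path "./test_samples/A.jpg" \
--audio_path "./test_samples/aud.mp3" \
--output_path "./output/short_case/A_25_steps.mp4" \
--retarget_strategy "fix_face" \
--num_inference_steps 25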
Thanks. One other question @tiankuan93: to add new audio/lips to an existing video, are these the correct args?
python scripts/extract_kps_sequence_and_audio.py \
--video_path "./destinationVideo.mp4" \
--kps_sequence_save_path "./kpsOfDestinationVideo.pth" \
--audio_save_path "./audioToReplaceLips.mp3"
like the first video on this comment https://github.com/tencent-ailab/V-Express/issues/6#issue-2320395941
You're right. Then you can use the .mp3 and .pth files as inputs for video generation. Note that you need to use the --retarget_strategy "naive_retarget" parameter when generating. If the result is not satisfactory, consider choosing a target video whose pose is closer to that of the reference image.
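For reference, the generation step might then look something like this (a sketch only; the --kps_path flag name and the reference image path are assumptions to verify against inference.py):
python inference.py \
--reference_image_path "./referenceImage.jpg" \
--audio_path "./audioToReplaceLips.mp3" \
--kps_path "./kpsOfDestinationVideo.pth" \
--output_path "./output/retargeted.mp4" \
--retarget_strategy "naive_retarget"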
Is a reference image always required? Can't you just take a video and say "apply audio to this video"?
That would be great. The README is good, it's just sometimes a bit confusing.
I'm just a bit confused how @zhanghongyong123456 did the first video here: https://github.com/tencent-ailab/V-Express/issues/6#issue-2320395941
I'm assuming he used a reference image from the same target video, but he mentions no retarget strategy.
EDIT: I think I got confused; that was his source video :( I thought he got an amazing result lol
I didn't quite get what you meant. If you want an image to talk but only have an audio file (.mp3), you can use the following script.
python inference.py \
--reference_image_path "./test_samples/A.jpg" \
--audio_path "./test_samples/aud.mp3" \
--output_path "./output/short_case/A_fix_face_with_aud.mp4" \
--retarget_strategy "fix_face" \
--reference_attention_weight 0.95 \
--audio_attention_weight 3.0
No, I purely want to make an existing video talk with new audio, without the crop in the export.
There doesn't seem to be a natural lip-replacement strategy for an existing video.
Maybe you need wav2lip?
wav2lip quality is lower than that of this project.
@cantonalex I couldn't get what you mean in this issue either. You want to provide the video (your source video). Normally, the extract kps sequence script works like this: the source video is only used to extract the kps sequence and to get the audio from it (for practical reasons). You then give a reference image, and it talks as in the video (hopefully).
What you want is different. Alternatively, the extract kps sequence script could be modified to select the best representative reference frame (according to your criteria for quality); a rough sketch of that idea is below.
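As a minimal sketch of that idea (not part of the repo; the timestamp and output path are placeholders, and any frame-extraction tool would do), you could grab a single frame of the source video with ffmpeg and use it as the reference image:
ffmpeg -i "./destinationVideo.mp4" -ss 00:00:01 -frames:v 1 "./referenceFromVideo.jpg"
You could then pass "./referenceFromVideo.jpg" as --reference_image_path alongside the extracted .pth and .mp3.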
@faraday Simply put: I only want to change the lips on a video. So the input and output video are the same; the only difference is the lip movements.