KylinMountain opened 3 weeks ago
The general steps could be:
1) transcribe the English audio of the video into text, using a tool such as Whisper or FunASR;
2) translate the text into Chinese;
3) generate Chinese audio from the translated text from step 2, using a TTS tool such as MaskGCT;
4) resynchronize the new audio with the original video.
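Steps 2 and 3 can be run per transcribed segment so that each Chinese clip inherits the timing of the English sentence it replaces. This is a minimal sketch, assuming the segments come in the Whisper-style shape (dicts with `start`, `end`, `text` in seconds). `translate` and `synthesize` are hypothetical hooks standing in for whatever translation model and TTS engine (e.g. MaskGCT) you use; `synthesize` is assumed to accept a target duration.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class DubbedClip:
    start: float          # seconds, copied from the source segment
    end: float            # seconds, copied from the source segment
    text_zh: str          # translated text (step 2)
    samples: List[float]  # synthesized audio samples (step 3)


def dub_segments(
    segments: Sequence[dict],
    translate: Callable[[str], str],
    synthesize: Callable[[str, float], List[float]],
) -> List[DubbedClip]:
    """Translate and synthesize each segment while keeping its original timing.

    Assumption: `segments` uses Whisper-style keys "start", "end", "text".
    `translate` and `synthesize` are hypothetical hooks, not a real library API;
    `synthesize(text, target_seconds)` should return audio of roughly that length.
    """
    clips: List[DubbedClip] = []
    for seg in segments:
        text = seg["text"].strip()
        if not text:
            continue  # skip silent or empty segments
        target = seg["end"] - seg["start"]  # duration the dubbed clip should fill
        zh = translate(text)
        clips.append(DubbedClip(seg["start"], seg["end"], zh, synthesize(zh, target)))
    return clips
```

With openai-whisper, for example, the segments could come from `whisper.load_model("base").transcribe("input.wav")["segments"]`. Working segment by segment, rather than translating the whole transcript at once, keeps any timing drift local to a single sentence.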
@synthere If we do it this way, we can't copy the accent of the original audio, and we can't keep the TTS speed the same as the original.
The accent can be cloned with a voice-cloning function, and the TTS speed can also be adjusted. In fact, I built a video dubbing tool the other day; you can try it here: syntheredub
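If the TTS engine's own speed control does not land close enough, one common fallback is to time-stretch each generated clip to the length of the original segment. This is a minimal sketch, assuming FFmpeg's `atempo` filter does the stretching. The helper name `atempo_chain` is hypothetical. Older FFmpeg builds accept only tempo values in [0.5, 2.0] per `atempo` instance, so the sketch chains several instances for larger ratios.

```python
def atempo_chain(generated_sec: float, target_sec: float) -> str:
    """Build an FFmpeg filter string that makes a clip of `generated_sec`
    last `target_sec` without changing its pitch (hypothetical helper)."""
    if generated_sec <= 0 or target_sec <= 0:
        raise ValueError("durations must be positive")
    # tempo > 1 speeds the clip up, tempo < 1 slows it down
    tempo = generated_sec / target_sec
    factors = []
    # keep every single atempo factor inside [0.5, 2.0] so older builds accept it
    while tempo > 2.0:
        factors.append(2.0)
        tempo /= 2.0
    while tempo < 0.5:
        factors.append(0.5)
        tempo /= 0.5
    factors.append(tempo)
    return ",".join(f"atempo={f:.6g}" for f in factors)
```

For example, a 10 s Chinese clip that must fit a 2 s slot gives `atempo=2,atempo=2,atempo=1.25`, which could be applied with `ffmpeg -i clip_zh.wav -filter:a "atempo=2,atempo=2,atempo=1.25" clip_fit.wav`. Large factors audibly degrade speech, so it is usually better to ask the TTS engine for the target duration first and use `atempo` only for the residual correction.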
I also tried MaskGCT, which can control the target duration, but the resulting audio is still not exactly aligned with the original, as shown below (top: the original audio; bottom: the generated audio).
So precise alignment and resynchronization are sometimes necessary.
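One simple way to enforce that alignment is to place each dubbed clip at its original start time on a silent track that is as long as the video, then pad or trim the clip to its exact slot. This is a minimal sketch under stated assumptions: mono audio as plain lists of float samples, a single sample rate, and hypothetical helper names `fit_to_slot` and `assemble_track`. Trimming cuts speech, so it is meant only as a last step after speed adjustment.

```python
from typing import List, Sequence, Tuple


def fit_to_slot(samples: Sequence[float], target_len: int) -> List[float]:
    """Pad with silence or trim so the clip is exactly `target_len` samples."""
    if len(samples) >= target_len:
        return list(samples[:target_len])  # trimming: use only after stretching
    return list(samples) + [0.0] * (target_len - len(samples))


def assemble_track(
    clips: Sequence[Tuple[float, float, Sequence[float]]],
    sample_rate: int,
    total_sec: float,
) -> List[float]:
    """Place (start_sec, end_sec, samples) clips on a silent track of `total_sec`."""
    total = round(total_sec * sample_rate)
    track = [0.0] * total
    for start_sec, end_sec, samples in clips:
        start = round(start_sec * sample_rate)
        end = min(round(end_sec * sample_rate), total)  # never run past the video
        if start >= end:
            continue
        # the slice and the fitted clip have the same length, so the track keeps its size
        track[start:end] = fit_to_slot(samples, end - start)
    return track
```

Once the assembled track is written to a file such as `dub.wav`, it could be muxed back onto the original picture with `ffmpeg -i video.mp4 -i dub.wav -map 0:v -map 1:a -c:v copy -shortest out.mp4`, which copies the video stream unchanged and replaces its audio.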
Problem Overview
I have a video in which the speaker talks in English, and I want it to speak Chinese at the same pace, with the video and audio kept in sync.
How can I do that? Are there any instructions? Thank you.