[Help]: MaskGCT's results were very strange

WhiteNightMo commented 5 days ago

Problem Overview

I modified models/tts/maskgct/maskgct_inference.py as follows:

    # inference
    prompt_wav_path = "./models/tts/maskgct/wav/5s.wav"
    save_path = "generated_audio7.wav"
    prompt_text = "想要交友吗？快来SOUL啊"
    target_text = "新用户真的可以享年化利率最低3.6%的优惠"
    # Specify the target duration (in seconds). If target_len = None, we use a simple rule to predict the target duration.
    target_len = None
    maskgct_inference_pipeline = MaskGCT_Inference_Pipeline(
        semantic_model,
        semantic_codec,
        codec_encoder,
        codec_decoder,
        t2s_model,
        s2a_model_1layer,
        s2a_model_full,
        semantic_mean,
        semantic_std,
        device,
    )

    recovered_audio = maskgct_inference_pipeline.maskgct_inference(
        prompt_wav_path, prompt_text, target_text, "zh", "zh", target_len=target_len
    )

    sf.write(save_path, recovered_audio, 24000)

Run command:

python -m models.tts.maskgct.maskgct_inference

The output did not meet my expectations.

My original file: 5s.zip

Output file: generated_audio7.zip

HeCheng0625 commented 5 days ago

It seems like the target length is too long; you can specify an appropriate target length yourself.
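One way to pick a starting value for target_len is to reuse the speaking rate of the prompt. The sketch below is only an illustration of that idea, not the rule the MaskGCT pipeline applies when target_len = None. The helper names count_spoken_units and estimate_target_len are hypothetical, and the 5.0-second prompt duration is an assumption taken from the file name 5s.wav. Counting letters and digits is crude: "SOUL" counts as four units and "3.6%" as two, although both are spoken differently.

```python
def count_spoken_units(text: str) -> int:
    """Count characters likely to be spoken: CJK characters, letters, digits.

    Punctuation, whitespace and symbols such as '%' are skipped.
    """
    return sum(1 for ch in text if ch.isalnum())


def estimate_target_len(prompt_seconds: float, prompt_text: str, target_text: str) -> float:
    """Scale the prompt's seconds-per-unit rate to the length of the target text."""
    prompt_units = count_spoken_units(prompt_text)
    if prompt_units == 0:
        raise ValueError("prompt_text contains no spoken characters")
    seconds_per_unit = prompt_seconds / prompt_units
    return seconds_per_unit * count_spoken_units(target_text)


# Values from the report above; the prompt duration is an assumption.
prompt_text = "想要交友吗？快来SOUL啊"
target_text = "新用户真的可以享年化利率最低3.6%的优惠"
target_len = estimate_target_len(5.0, prompt_text, target_text)
print(f"suggested target_len: {target_len:.2f} s")
```

The result is only a first guess to pass as target_len; listening to the output and adjusting the value by hand is still needed.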

WhiteNightMo commented 5 days ago

> It seems like the target length is too long; you can specify an appropriate target length yourself.

I tried setting target_len to 8, but the output audio was missing the first 3 words and was slow overall. When I changed it to 10, the output audio read everything, but the speed was really slow. Output file: 10s.zip

decajcd commented 4 days ago

> I tried to change target_len to 8, but the output audio was missing the first 3 words and was slow overall. When I changed it to 10, the output audio read everything, but the speed was really slow.

Have you solved this?

WhiteNightMo commented 4 days ago

> Have you solved this?

Not yet, I can't get it to work no matter what I try.

decajcd commented 4 days ago

> Not yet, I can't get it to work no matter what I try.

I can't tune it either. The output is either too fast or complete gibberish.

WhiteNightMo commented 4 days ago

> I can't tune it either. The output is either too fast or complete gibberish.

That's rough. For me it's either too slow or gibberish.

decajcd commented 4 days ago

> That's rough. For me it's either too slow or gibberish.

I also get background noise.

digitalboy commented 2 days ago

Has anyone solved this?

ruby11dog commented 1 day ago

Your prompt audio and prompt text do not match exactly.
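A rough sanity check for this kind of mismatch is to compare the prompt's duration with the length of its transcript. The sketch below assumes the prompt is an uncompressed PCM WAV file that Python's standard wave module can read. The helper names and the 2.0 to 8.0 units-per-second bounds are illustrative assumptions, not values taken from MaskGCT. An implausible rate suggests that prompt_text does not cover what is actually spoken, but a plausible rate does not prove the transcript is correct.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def prompt_rate_report(path: str, prompt_text: str,
                       min_rate: float = 2.0, max_rate: float = 8.0) -> dict:
    """Report spoken units per second for a prompt and flag unlikely rates.

    min_rate and max_rate are hypothetical bounds chosen for illustration.
    """
    duration = wav_duration_seconds(path)
    if duration <= 0:
        raise ValueError("prompt audio is empty")
    # Count CJK characters, letters and digits; skip punctuation and symbols.
    units = sum(1 for ch in prompt_text if ch.isalnum())
    rate = units / duration
    return {
        "duration_s": duration,
        "units": units,
        "units_per_s": rate,
        "plausible": min_rate <= rate <= max_rate,
    }
```

For example, prompt_rate_report("./models/tts/maskgct/wav/5s.wav", prompt_text) would show whether the transcript length is in a believable range for the clip. Listening to the prompt and correcting prompt_text by hand remains the reliable fix.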

open-mmlab / Amphion, issue #359