netease-youdao / EmotiVoice

EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
Apache License 2.0

The trained model is completely unusable #72

Open tangflash opened 6 months ago

tangflash commented 6 months ago

The trained model is completely unusable. It sounds like a mute, just going "ya ya........". What could be the reason for such a large gap?

syq163 commented 6 months ago

Thanks for trying out voice cloning so promptly. It would be helpful if you could provide more details, such as the data you are using, the number of training steps you have completed, and so on.

tangflash commented 6 months ago

1. I trained on the data prepared as described in these steps: https://github.com/netease-youdao/EmotiVoice/tree/main/data/DataBaker. These two items were not executed. [image attachment]
2. In addition, I switched to local (non-distributed) training by changing the code to the following (an alternative that keeps the sampler is sketched after this list):

train_dataset = Dataset_PromptTTS_JETS(config.train_data_path, config, style_encoder)

# data_sampler = DistributedSampler(train_dataset)

# train_loader = torch.utils.data.DataLoader(
#     train_dataset,
#     num_workers=8,
#     shuffle=False,
#     batch_size=config.batch_size,
#     collate_fn=train_dataset.TextMelCollate,
#     sampler=data_sampler,
# )

# DistributedSampler is no longer used
train_loader = torch.utils.data.DataLoader(
    train_dataset,
    # num_workers=8,
    num_workers=1,
    shuffle=True,  # use shuffle instead of a distributed sampler
    batch_size=config.batch_size,
    collate_fn=train_dataset.TextMelCollate,
)

3. Training for 5000 steps and for 10000 steps gives the same result; there is no intelligible speech.
4. Validation mel-spectrograms at 4000 steps: [image attachments: val_4000_melspec_0, val_4000_melspec_1]
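A note on item 2: on Windows, the original DistributedSampler-based loader can usually be kept by initializing a single-process process group with the gloo backend instead of switching to shuffle=True. This is only a minimal sketch of that idea, assuming a single GPU; the MASTER_ADDR/MASTER_PORT values are placeholders and none of it is taken from this repo's train script:

# Hedged sketch: start a 1-process "distributed" group using the gloo backend,
# which PyTorch supports on Windows, so DistributedSampler keeps working.
import os
import torch.distributed as dist
from torch.utils.data.distributed import DistributedSampler

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")  # placeholder address
os.environ.setdefault("MASTER_PORT", "29500")      # placeholder port
dist.init_process_group(backend="gloo", rank=0, world_size=1)

data_sampler = DistributedSampler(train_dataset)   # original sampler, now usable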

syq163 commented 6 months ago


For DataBaker, it should work fine. I have attached the results from 5000 steps for your reference: DataBaker-g_00005000.zip. I will also attempt to replicate the issue based on the modifications you have made.

tangflash commented 6 months ago

Thanks for your prompt reply. I think I will try reinstalling my Python environment. Which versions of PyTorch and CUDA are you using?

syq163 commented 6 months ago

Some setups for your reference:
- CUDA 11.8, torch 2.1.1, Python 3.10
- CUDA 11.7, torch 1.13, Python 3.8
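A quick way to confirm which combination is actually installed (plain PyTorch calls, nothing specific to EmotiVoice):

import torch

print("torch:", torch.__version__)             # expect 2.1.1 or 1.13.x
print("built for CUDA:", torch.version.cuda)   # expect 11.8 or 11.7
print("CUDA available:", torch.cuda.is_available())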

tangflash commented 6 months ago

I installed CUDA 11.8, torch 2.1.1, Python 3.10, but reinstalling the environment did not help. I am on Windows 11 and cannot use distributed training. Even the pretrained model is better than my trained one; at least it produces sound. :(
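For reference, which distributed backends a given PyTorch build exposes can be checked directly; on Windows the NCCL backend is typically unavailable while gloo is. This is a generic check, not specific to this project:

import torch.distributed as dist

print("distributed built:", dist.is_available())
print("gloo backend:", dist.is_gloo_available())   # usually True on Windows
print("nccl backend:", dist.is_nccl_available())   # usually False on Windows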

ttPrivacy commented 6 months ago

May I ask what your hardware setup is? How long did 10000 steps take? Mine has been training for almost 10 hours and still has no result.

tangflash commented 6 months ago

An RTX 3060 with 12 GB VRAM and 32 GB RAM; 10000 steps took about 2 hours. Actually, 5000 steps is already enough to run inference.


set-path commented 5 months ago

The trained model is completely unusable. It sounds like a mute, just going "ya ya........". What could be the reason for such a large gap?

Have you solved this? I ran into the same problem: following the steps provided at https://github.com/netease-youdao/EmotiVoice/tree/main/data/LJspeech, without running MFA, the fine-tuning result has almost no sound.

tangflash commented 5 months ago

Use WSL to set up a Linux environment.
