Open · dohe0342 opened this issue 1 year ago
Is it 100 epochs each for the AR and NAR models? The code has changed since then, so I was wondering :) I have reproduced the training, but it shows somewhat different performance (and mine took about 1.5 days to train 100 epochs each, on 8 × A100 GPUs!)
@sjoon2455 can you share your tensorboard?
Many of us encountered the missing-keys problem when loading the pretrained model. If anyone wants to use the pretrained model provided by @dohe0342, the main trick is to check out the right commit (or any commit with the same valle model definition), and then reinstall valle with
pip uninstall valle; pip install -e .
Otherwise, when initializing a new model, Python will use the valle package installed in the environment instead of the source code.
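If a checkpoint still fails to load, it can help to diff the parameter-name sets before calling `load_state_dict`. This is only a minimal sketch, and all the key names below are hypothetical; the real names come from the valle model definition at the commit the checkpoint was trained on:

```python
# Diff the parameter names in a checkpoint against the current model's names,
# mimicking the "missing keys / unexpected keys" report from load_state_dict.
# All key names here are hypothetical, for illustration only.
ckpt_keys = {"ar_decoder.layers.0.weight", "ar_text_embedding.weight"}
model_keys = {"ar_decoder.layers.0.weight", "nar_stage_embeddings.0.weight"}

missing = sorted(model_keys - ckpt_keys)      # in the model, not in the checkpoint
unexpected = sorted(ckpt_keys - model_keys)   # in the checkpoint, not in the model
print("missing:", missing)
print("unexpected:", unexpected)
```

Non-empty sets on either side usually mean the installed valle and the checkpoint come from different commits.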
@dohe0342 Thanks for sharing the pretrained model trained for 100 epochs. When we say 100 epochs, is that 100 each for AR and NAR, or a combined number where we start with an AR model (probably 50 epochs)? Please clarify. I have trained a model for 100 epochs, but the quality isn't as good as what you shared here at the beginning.
Thanks in advance Sagar
Have you, or has anyone else, done further training? Also, which Libri dataset (and what size) was it? Thanks!
@dohe0342 I'm interested in your pre-trained model. Can you share it with me? Thank you! My email is: xlwj_sd@163.com
How does epoch-100.pt work with the inference code provided in this repo, given that ar.pt and nar.pt are needed?
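If your copy of the inference code really does expect separate AR and NAR checkpoints, one possible workaround is to split the combined checkpoint by parameter-name prefix. This is only a sketch under the assumption that AR and NAR parameters are distinguishable by prefixes such as `ar_` vs. `nar_`; inspect the actual key names in epoch-100.pt first:

```python
# Split a combined state_dict into AR and NAR parts by key prefix.
# The "ar_"/"nar_" prefixes are an assumption; check your checkpoint's keys.
def split_state_dict(state_dict):
    ar = {k: v for k, v in state_dict.items() if k.startswith("ar_")}
    nar = {k: v for k, v in state_dict.items() if k.startswith("nar_")}
    return ar, nar

# Tiny demonstration with placeholder values instead of real tensors.
combined = {"ar_decoder.weight": 1, "nar_decoder.weight": 2, "ar_embed.weight": 3}
ar_part, nar_part = split_state_dict(combined)
print(sorted(ar_part))   # ['ar_decoder.weight', 'ar_embed.weight']
print(sorted(nar_part))  # ['nar_decoder.weight']
```

For real checkpoints you would load with `torch.load`, split, and save each part with `torch.save`.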
@dohe0342 I'm interested in your pre-trained model. Can you share it with me? Thank you! My email is: ajeet9698@gmail.com
And will it work if I want to train it on a specific set of voices, say a set of 10 or 15 speakers?
Hello! I am interested in your pre-trained model. The pre-trained weights you posted seem to be invalid. Can you share your pre-trained model with me? Thank you!
@thangnvkcn @jieen1 @LorenzoBrugioni @UncleSens @Zhang-Xiaoyi @lqj01 @UESTCgan @yiwei0730 @hackerxiaobai
Sorry for the late reply. This is the model that I trained. Google Drive link: link
infer like this command:
python bin/infer.py --output-dir ./ --model-name valle --norm-first true --add-prenet false --decoder-dim 1024 --nhead 16 --num-decoder-layers 12 --text-prompts "KNOT one point one five miles per hour." --audio-prompts ./prompts/8463_294825_000043_000000.wav --text "To get up and running quickly just follow the steps below." --checkpoint exp/epoch-100.pt
@hardik7 I shared my pre-trained model, so you can try synthesizing the cartoon audio with it. But I trained my model using LibriTTS, which consists of 550 hours of human audiobook speech, while the original VALL-E was trained on Libri-Light, which has 60k hours of audio.
So my pre-trained model is not able to synthesize cartoon audio, due to the lack of a cartoon training set and the limited amount of data.
Thanks for sharing; however, this Google Drive link has already expired. Could you upload a new version? Thanks a lot!
@dohe0342, could you please share the pre-trained model for VALL-E? The Google Drive link has expired. If possible, please also share the training script you used.
Thanks
Hi,
I trained Vall-E on the LibriTTS dataset, but my model did not converge well. I am sharing the final checkpoint (checkpoint link: https://drive.google.com/file/d/1DoaFjl6iJy4U2qrxVp0Z0QBPJ6lgVQQ0/view?usp=sharing) and training curves. Feel free to provide suggestions for further improvement.
Quick example: prompt: https://drive.google.com/file/d/12NfYKrnTZpqj_v7ain39KVtepYNLyjRe/view?usp=sharing, synthesized audio: https://drive.google.com/file/d/1NfySUeibqhA6RJDirrGmV1c_zOagg7vK/view?usp=sharing
Dear SA:
Really appreciate that you re-uploaded your pretrained model and data. Thanks a lot!
best wishes Rafael J.
@hdmjdp
I ran last week's version of vall-e, which has no prefix option, and I found that prefix 0 is the same as that version.
Here is my tensorboard image. I actually ran 177 epochs, but the 100-epoch checkpoint was used to generate the audios.
I'll upload the tensorboard image soon. Please wait.
Hi, what kind of loss reduction is drawn on the tensorboard graphs? The default is reduction == "sum", but the loss looks very small for sum reduction.
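For context on why the reduction mode matters when reading loss curves: with reduction="sum" the reported loss scales with the number of tokens in the batch, while "mean" normalizes by the token count, so the two modes produce curves at very different magnitudes. A tiny illustration with made-up per-token losses:

```python
# Hypothetical per-token cross-entropy losses for one batch.
per_token = [2.0, 1.5, 0.5, 1.0]

loss_sum = sum(per_token)                    # grows with batch/sequence size
loss_mean = sum(per_token) / len(per_token)  # comparable across batch sizes

print(loss_sum)   # 5.0
print(loss_mean)  # 1.25
```

A curve that sits near 1 despite sum reduction would therefore be surprising for any nontrivial batch size, which is what the question above is getting at.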
Hi, thank you for sharing the checkpoint. But I think it is corrupted; I can't load the checkpoint you shared.
Hi, it's trained with the default parameters. I don't know why the loss dropped so low.
Hi, sorry for the corrupted checkpoint. I compressed it recently to save space and something went wrong during compression. I have changed the link in the original post. I hope it works now.
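When sharing checkpoints that get compressed and re-uploaded, publishing a checksum alongside the file lets downloaders verify the bytes arrived intact before trying to load them. A minimal sketch (the file path in the comment is hypothetical):

```python
# Compute SHA-256 checksums to verify a checkpoint survived
# compression/transfer intact; compare against a published digest.
import hashlib

def sha256_digest(data: bytes) -> str:
    """Digest of an in-memory byte string."""
    return hashlib.sha256(data).hexdigest()

def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Digest of a file, read in chunks to avoid loading it all at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk_size):
            h.update(block)
    return h.hexdigest()

# Example with in-memory data; for a real checkpoint you would run
# sha256_file("exp/epoch-100.pt") before upload and after download.
print(sha256_digest(b"checkpoint bytes"))
```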
I trained vall-e on LibriTTS for about 100 epochs (it took almost 4 days on 8 A100 GPUs), and I obtained plausible synthesized audio.
Here is a demo. [1] prompt : prompt_link synthesized audio : synt_link
[2] prompt : prompt_link ground truth : gt_link synthesized audio : synt_link
[3] prompt : prompt_link synthesized audio : synt_link
[4] prompt : prompt_link ground truth : gt_link synthesized audio : synt_link
The model I trained has worse quality than the original VALL-E because of the dataset size. However, it has promising quality on clean audio. I'm not sure whether I can share my pre-trained LibriTTS model; if I can, I would like to.