PyTorch implementation of VALL-E(Zero-Shot Text-To-Speech), Reproduced Demo https://lifeiteng.github.io/valle/index.html
Could you please provide me with the specific parameter configurations in the command for training the LJSpeech dataset? #184
Open
mumuyeye opened 6 months ago
Could you please share the exact parameter configuration for the training commands on the LJSpeech dataset? I am looking for something in this form:
python3 bin/trainer.py --max-duration 80 --filter-min-duration 0.5 --filter-max-duration 14 --train-stage 1 \
    --num-buckets 6 --dtype "bfloat16" --save-every-n 10000 --valid-interval 20000 \
    --model-name valle --share-embedding true --norm-first true --add-prenet false \
    --decoder-dim 256 --nhead 8 --num-decoder-layers 6 --prefix-mode 1 \
    --base-lr 0.05 --warmup-steps 200 --average-period 0 \
    --num-epochs 20 --start-epoch 1 --start-batch 0 --accumulate-grad-steps 4 \
    --exp-dir exp/valle
and
python3 bin/trainer.py --max-duration 40 --filter-min-duration 0.5 --filter-max-duration 14 --train-stage 2 \
    --num-buckets 6 --dtype "float32" --save-every-n 10000 --valid-interval 20000 \
    --model-name valle --share-embedding true --norm-first true --add-prenet false \
    --decoder-dim 256 --nhead 8 --num-decoder-layers 6 --prefix-mode 1 \
    --base-lr 0.05 --warmup-steps 200 --average-period 0 \
    --num-epochs 40 --start-epoch 3 --start-batch 0 --accumulate-grad-steps 4 \
    --exp-dir exp/valle
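
To make it easier to see which settings actually change between the two stages, here is a small Python sketch that rebuilds both commands from one shared set of flags plus per-stage overrides. The flag names and values are copied from the two commands above; the helper names (`SHARED`, `STAGE_OVERRIDES`, `build_command`, `stage_differences`) are my own and are not part of the repository, and the values are not confirmed LJSpeech settings.

```python
import shlex

# Flags that are identical in both commands quoted above.
SHARED = {
    "filter-min-duration": "0.5",
    "filter-max-duration": "14",
    "num-buckets": "6",
    "save-every-n": "10000",
    "valid-interval": "20000",
    "model-name": "valle",
    "share-embedding": "true",
    "norm-first": "true",
    "add-prenet": "false",
    "decoder-dim": "256",
    "nhead": "8",
    "num-decoder-layers": "6",
    "prefix-mode": "1",
    "base-lr": "0.05",
    "warmup-steps": "200",
    "average-period": "0",
    "start-batch": "0",
    "accumulate-grad-steps": "4",
    "exp-dir": "exp/valle",
}

# Flags that differ between the --train-stage 1 and --train-stage 2 commands.
STAGE_OVERRIDES = {
    1: {"max-duration": "80", "dtype": "bfloat16", "num-epochs": "20", "start-epoch": "1"},
    2: {"max-duration": "40", "dtype": "float32", "num-epochs": "40", "start-epoch": "3"},
}


def build_command(stage):
    """Return the argv list for the given training stage (1 or 2)."""
    flags = {"train-stage": str(stage), **SHARED, **STAGE_OVERRIDES[stage]}
    argv = ["python3", "bin/trainer.py"]
    for name, value in flags.items():
        argv += [f"--{name}", value]
    return argv


def stage_differences():
    """Map each stage-specific flag to its (stage 1, stage 2) values."""
    return {
        name: (STAGE_OVERRIDES[1][name], STAGE_OVERRIDES[2][name])
        for name in STAGE_OVERRIDES[1]
    }


if __name__ == "__main__":
    for stage in (1, 2):
        print(shlex.join(build_command(stage)))
    for name, (first, second) in stage_differences().items():
        print(f"--{name}: stage 1 = {first}, stage 2 = {second}")
```

Besides `--train-stage` itself, only `--max-duration`, `--dtype`, `--num-epochs`, and `--start-epoch` differ between the two commands, so those are the values I would most like to have confirmed for LJSpeech.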