lifeiteng / vall-e

PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html
Apache License 2.0
1.99k stars, 320 forks

Error Training on Commonvoice Spanish #165

Closed by hulsmeier 12 months ago

hulsmeier commented 1 year ago

python3.10 bin/trainer.py --max-duration 40 --filter-min-duration 0.5 --filter-max-duration 14 --train-stage 1 --num-buckets 6 --dtype bfloat16 --save-every-n 2500 --valid-interval 2500 --model-name valle --share-embedding true --norm-first true --add-prenet false --decoder-dim 1024 --nhead 16 --num-decoder-layers 12 --prefix-mode 1 --base-lr 0.05 --warmup-steps 200 --average-period 0 --num-epochs 70 --start-epoch 1 --start-batch 0 --accumulate-grad-steps 4 --keep-last-k 40 --exp-dir exp/valle --manifest-dir data/tokenized --text-tokens data/tokenized/unique_text_tokens.k2symbols --oom-check false --dataset commonvoice --world-size 1

2023-09-07 20:44:33,335 INFO [trainer.py:1092] Saving batch to exp/valle/batch-bdd640fb-0667-1ad1-1c80-317fa3b1799d.pt
Traceback (most recent call last):
  File "/home/ubuntu/vall-e/egs/commonvoice/bin/trainer.py", line 1161, in <module>
    main()
  File "/home/ubuntu/vall-e/egs/commonvoice/bin/trainer.py", line 1154, in main
    run(rank=0, world_size=1, args=args)
  File "/home/ubuntu/vall-e/egs/commonvoice/bin/trainer.py", line 1043, in run
    train_one_epoch(
  File "/home/ubuntu/vall-e/egs/commonvoice/bin/trainer.py", line 660, in train_one_epoch
    _, loss, loss_info = compute_loss(
  File "/home/ubuntu/vall-e/egs/commonvoice/bin/trainer.py", line 525, in compute_loss
    predicts, loss, metrics = model(
  File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/vall-e/valle/models/valle.py", line 813, in forward
    y, targets = self.pad_y_eos(
  File "/home/ubuntu/vall-e/valle/models/valle.py", line 325, in pad_y_eos
    targets = F.pad(y, (0, 1), value=0) + eos_id * F.pad(
RuntimeError: The size of tensor a (716) must match the size of tensor b (2) at non-singleton dimension 1

I'm only using the Spanish dataset and running on a single A10 GPU.

I'm using this PR from @RuntimeRacer: https://github.com/lifeiteng/vall-e/pull/111
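
For anyone reading the traceback: the RuntimeError is a broadcasting failure. The two operands of the `+` in `pad_y_eos` have sizes 716 and 2 at dimension 1, and neither is 1, so they cannot be added elementwise. Below is a minimal sketch of that rule in plain Python, with no PyTorch required. `broadcast_mismatches` is a hypothetical helper, not part of the vall-e code base, and the batch size of 4 is an assumed value that does not appear in the log. It does not identify the root cause. It only shows which dimensions to inspect.

```python
def broadcast_mismatches(shape_a, shape_b):
    """List the dimensions at which two shapes cannot be broadcast together.

    Hypothetical debugging helper. Returns (dim, size_a, size_b) tuples,
    with dim indexed from the left of the right-aligned shapes. An empty
    list means the shapes are broadcast-compatible.
    """
    ndim = max(len(shape_a), len(shape_b))
    # Broadcasting aligns trailing dimensions, so left-pad the shorter shape with 1s.
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    # Two sizes are compatible when they are equal or when either of them is 1.
    return [
        (dim, size_a, size_b)
        for dim, (size_a, size_b) in enumerate(zip(a, b))
        if size_a != size_b and size_a != 1 and size_b != 1
    ]


# Sizes 716 and 2 come from the error message; the batch size 4 is assumed.
print(broadcast_mismatches((4, 716), (4, 2)))
# A size of 1 would broadcast without error.
print(broadcast_mismatches((4, 716), (4, 1)))
```

Passing `tuple(t.shape)` for each of the two tensors just before line 325 of `valle/models/valle.py` would show which one has the unexpected length, which may help narrow the problem down to the data batch or to the mask built from the lengths.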