lucidrains / e2-tts-pytorch

Implementation of E2-TTS, "Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS", in Pytorch

training script: average loss stays around 2.0 #9

Closed eschmidbauer closed 1 month ago

eschmidbauer commented 1 month ago

I've started training with total epochs set to 10000. The loss does not seem to decrease below 2.0 after around 40k steps. See the attached loss graph.

[attached image: training loss curve]
skirdey commented 1 month ago

How deep is the E2TTS network in your configuration? There might be a difference in loss between depth=4 and depth=16, for example.
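
For reference, depth is just a key in the transformer dict passed to E2TTS. A minimal sketch of a deeper 16-layer configuration (illustrative only; the dim value here is arbitrary):

from e2_tts_pytorch import E2TTS

# illustrative sketch: depth = 16 instead of depth = 4
e2tts = E2TTS(
    transformer = dict(
        dim = 512,
        depth = 16,
    ),
)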

eschmidbauer commented 1 month ago

I'm using the default example shown in train_example.py, which appears to use depth = 4.

Should I stop training and try 16?

Ryu1845 commented 1 month ago

This is the configuration in the paper:

[attached image: model configuration table from the E2 TTS paper]

lucidrains commented 1 month ago

I was just shared some successful results, so I think this issue can be closed.

eschmidbauer commented 1 month ago

Thanks! I started retraining with the latest code this morning.

lucasnewman commented 1 month ago

@eschmidbauer The new projection layer should help you size the transformer more appropriately, but report back with results if you can!

I did a few epochs of 100+ hours of LibriTTS-R on a single H100 with a ~100M param model and the loss converged much lower than you reported — it's dataset / training recipe dependent, but I saw ~0.5 loss before I stopped it and visually the generated mel spectrograms looked reasonably accurate.
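
If it helps with sizing, the parameter count is easy to check with plain PyTorch once the model is constructed (e2tts here is whichever E2TTS instance you built):

# count trainable parameters to sanity-check model size
num_params = sum(p.numel() for p in e2tts.parameters() if p.requires_grad)
print(f'{num_params / 1e6:.1f}M parameters')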

juliakorovsky commented 1 month ago

> @eschmidbauer The new projection layer should help you size the transformer more appropriately, but report back with results if you can!
>
> I did a few epochs of 100+ hours of LibriTTS-R on a single H100 with a ~100M param model and the loss converged much lower than you reported — it's dataset / training recipe dependent, but I saw ~0.5 loss before I stopped it and visually the generated mel spectrograms looked reasonably accurate.

Could you share the number of heads and layers you used?

eschmidbauer commented 1 month ago

This is what I'm using to train:


# imports as in the repo's train_example.py
from torch.optim import Adam

from e2_tts_pytorch import E2TTS, DurationPredictor
from e2_tts_pytorch.trainer import E2Trainer

# duration predictor: smaller 6-layer transformer
duration_predictor = DurationPredictor(
    transformer = dict(
        dim = 512,
        depth = 6,
    )
)

# main model: 12-layer transformer with 'concat' skip connections
e2tts = E2TTS(
    duration_predictor = duration_predictor,
    transformer = dict(
        dim = 512,
        depth = 12,
        skip_connect_type = 'concat'
    ),
)

optimizer = Adam(e2tts.parameters(), lr=7.5e-5)

trainer = E2Trainer(
    e2tts,
    optimizer,
    num_warmup_steps=20000,
    checkpoint_path = 'e2tts.pt',
    log_file = 'e2tts.txt'
)

Not getting anything useful out of the checkpoint yet, but the loss is definitely converging much better.
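
For reference, the training kickoff follows the pattern in train_example.py; a rough sketch (the dataset and exact argument names shown are illustrative and may differ from the current version of that example):

from datasets import load_dataset
from e2_tts_pytorch.trainer import HFDataset

# wrap a HuggingFace speech dataset and start training
# (dataset choice and arguments mirror train_example.py and may have changed)
train_dataset = HFDataset(load_dataset("MushanW/GLOBE")["train"])

trainer.train(
    train_dataset,
    epochs = 10,
    batch_size = 32,
    save_step = 1000,
)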