p0p4k / pflowtts_pytorch

Unofficial implementation of NVIDIA P-Flow TTS paper
https://neurips.cc/virtual/2023/poster/69899
MIT License

Jump in sub_loss/train_dur_loss_step #11

Open vn09 opened 10 months ago

vn09 commented 10 months ago

Hi @p0p4k ,

I hope this message finds you well. I am currently working on training the pflowtts model with my own dataset and have encountered an unexpected behavior that I'm hoping to get some assistance with.

During training, I've observed significant jumps in the sub_loss/train_dur_loss_step metric, as illustrated in the screenshot below:

[Screenshot, 2023-12-04: training curves showing sudden spikes in sub_loss/train_dur_loss_step]

I have followed the recommended setup and training guidelines, but I am unsure what might be causing these fluctuations. Here are some details about my training configuration and dataset:

batch_size: 64
n_spks: 1
...
data_statistics:
  mel_mean: -6.489412784576416
  mel_std: 2.281172275543213
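For reference, the data_statistics above were computed over the training-set mels, roughly along these lines (a sketch of the idea only; `mel_fn` and the `path|text` filelist format stand in for whatever the config actually uses):

```python
# Sketch: running mean / std of all mel values over the training set.
# `mel_fn` (mel extraction) and the path|text filelist format are
# assumptions for illustration, not the repo's exact script.
import torchaudio

def compute_mel_statistics(filelist_path, mel_fn):
    total, total_sq, n = 0.0, 0.0, 0
    with open(filelist_path, encoding="utf-8") as f:
        for line in f:
            wav_path = line.strip().split("|")[0]
            wav, sr = torchaudio.load(wav_path)
            mel = mel_fn(wav)                      # (1, n_mels, frames)
            total += mel.sum().item()
            total_sq += (mel ** 2).sum().item()
            n += mel.numel()
    mean = total / n
    std = (total_sq / n - mean ** 2) ** 0.5
    return mean, std
```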

I would greatly appreciate it if you could provide any insights or suggestions that might help resolve this issue. Perhaps there are known factors that could lead to such behavior or additional steps I could take to stabilize the training loss?

p0p4k commented 10 months ago

Could be some anomaly in the dataset. Don't worry about it for now; let the model train and check inference quality, that is when you start debugging.

rafaelvalle commented 10 months ago

To find problematic samples, one can generate transcriptions with Whisper v3 and compare them with the transcriptions in the dataset, looking for samples with a high edit distance, for example.
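Something along these lines (a sketch; it assumes openai-whisper, an LJSpeech-style `path|text` filelist, and an arbitrary similarity threshold):

```python
# Flag samples whose Whisper transcription disagrees strongly with the
# transcription in the filelist. The path|text filelist format and the
# 0.8 threshold are assumptions, not part of the repo.
import difflib
import whisper  # pip install openai-whisper

model = whisper.load_model("large-v3")

def similarity(a, b):
    # 1.0 = identical, 0.0 = completely different (character level)
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()

suspects = []
with open("filelists/train.txt", encoding="utf-8") as f:
    for line in f:
        wav_path, text = line.strip().split("|")[:2]
        hyp = model.transcribe(wav_path)["text"].strip()
        if similarity(text, hyp) < 0.8:
            suspects.append((wav_path, text, hyp))
```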

p0p4k commented 10 months ago

@rafaelvalle [unrelated question] Given that any prompt slice cut from the target mel is supposed to produce the same target mel, is there a way to add a loss for this in one forward pass while using multiple slices of the same target mel? Thanks.
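Something like this pseudo-sketch is what I mean (the `model(...)` call and its arguments are hypothetical, not the actual pflowtts forward signature):

```python
# Pseudo-sketch of the idea: several prompt slices cut from the same target
# mel should all reconstruct that target, so their predictions can be tied
# together with an extra consistency term. `model` is hypothetical.
import torch
import torch.nn.functional as F

def multi_slice_consistency_loss(model, text, target_mel, n_slices=3, prompt_len=256):
    preds, recon = [], []
    for _ in range(n_slices):
        start = torch.randint(0, target_mel.size(-1) - prompt_len, (1,)).item()
        prompt = target_mel[..., start:start + prompt_len]
        pred = model(text, prompt)                 # predicted mel for this slice
        preds.append(pred)
        recon.append(F.l1_loss(pred, target_mel))
    # consistency: predictions from different prompt slices should agree
    pairs = [(i, j) for i in range(n_slices) for j in range(i + 1, n_slices)]
    consistency = sum(F.l1_loss(preds[i], preds[j]) for i, j in pairs) / len(pairs)
    return sum(recon) / n_slices + consistency
```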

vuong-ts commented 10 months ago

Thanks @rafaelvalle @p0p4k. I used the trained model to run over the training data again and filter out outlier samples (dur_loss > 5), and it helps. The loss of the new training run is smooth now.
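Roughly like this (a sketch only; `load_model` and `compute_losses` are placeholders, not the repo's actual API):

```python
# Sketch of the filtering step: score every training sample with the
# trained checkpoint and drop samples whose duration loss is an outlier.
# `load_model`, `compute_losses`, and the path|text filelist format are
# assumptions for illustration.
import torch

model = load_model("logs/pflow/checkpoints/last.ckpt").eval()

kept, dropped = [], []
with torch.no_grad():
    for line in open("filelists/train.txt", encoding="utf-8"):
        wav_path, text = line.strip().split("|")[:2]
        losses = compute_losses(model, wav_path, text)   # dict of sub-losses
        (kept if losses["dur_loss"] < 5.0 else dropped).append(line)

with open("filelists/train_filtered.txt", "w", encoding="utf-8") as f:
    f.writelines(kept)
print(f"dropped {len(dropped)} outlier samples")
```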

rafaelvalle commented 10 months ago

By looking at the text and audio of the samples with dur_loss larger than 5, can you determine the reason for such high loss? Usual suspects include incorrect transcriptions, long pauses, etc.
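For the long-pause check, a quick heuristic (sketch; the top_db and gap thresholds are arbitrary) is to measure the largest internal silent gap per file with librosa:

```python
# Quick check for long internal pauses: measure the largest silent gap
# between non-silent intervals in each file. Thresholds are arbitrary.
import librosa

def max_pause_seconds(wav_path, top_db=30):
    y, sr = librosa.load(wav_path, sr=None)
    intervals = librosa.effects.split(y, top_db=top_db)  # non-silent regions
    gaps = [
        (intervals[i + 1][0] - intervals[i][1]) / sr
        for i in range(len(intervals) - 1)
    ]
    return max(gaps, default=0.0)

# e.g. flag files with more than ~0.8 s of internal silence
# suspicious = [p for p in wav_paths if max_pause_seconds(p) > 0.8]
```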


vn09 commented 10 months ago

The issues mostly are: