Closed shreyasinghal-17 closed 1 year ago
More details about the individual losses can be seen in the sub_loss card. The flow matching loss is noisy and is not the best representation of model performance (similar to diffusion loss). Further, from what I've seen, it is okay for the duration predictor to overfit for a while, mainly because of the limited validation set (mine was only 100 samples).
Generally, training for longer helps, as these losses are not the best indicators of the model's performance, similar to diffusion-type models.
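To see why the curve is inherently noisy: each mini-batch draws a random time t and a random noise sample x0, so the per-batch loss fluctuates even for a well-trained model. A minimal NumPy sketch of an OT conditional flow matching loss in the spirit of what Matcha-TTS trains on (illustrative only, not the repo's actual training code; `predict_v` and `sigma_min` are stand-ins):

```python
import numpy as np

def cfm_loss(x1, predict_v, sigma_min=1e-4, rng=None):
    """One mini-batch of an OT conditional flow matching loss (sketch).
    x1: batch of data (e.g. mel frames), predict_v: callable (x_t, t) -> velocity."""
    rng = rng or np.random.default_rng()
    x0 = rng.standard_normal(x1.shape)             # random Gaussian noise endpoint
    t = rng.uniform(size=(x1.shape[0], 1))         # random time per example
    x_t = (1 - (1 - sigma_min) * t) * x0 + t * x1  # point on the noise-to-data path
    u = x1 - (1 - sigma_min) * x0                  # conditional target velocity
    return float(np.mean((predict_v(x_t, t) - u) ** 2))
```

Because x0 and t are resampled every step, consecutive loss values jump around; smoothing or averaging over many steps gives a more faithful picture than the raw curve.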
I am getting good quality speech! I haven't trained a vocoder yet, but with Griffin-Lim it seems acceptable. I'm sure neural vocoders would yield great output.
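For reference, Griffin-Lim just iterates between the magnitude constraint and the set of consistent spectrograms, estimating phase from magnitudes alone. A self-contained NumPy sketch (window, FFT size, and hop are illustrative defaults, not the project's settings):

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Magnitude-and-phase STFT, shape (freq, time)."""
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win for i in range(0, len(x) - n_fft + 1, hop)]
    return np.fft.rfft(np.stack(frames), axis=1).T

def istft(S, n_fft=512, hop=128):
    """Weighted overlap-add inverse of stft()."""
    win = np.hanning(n_fft)
    frames = np.fft.irfft(S.T, n=n_fft, axis=1)
    out = np.zeros(hop * (len(frames) - 1) + n_fft)
    norm = np.zeros_like(out)
    for i, f in enumerate(frames):
        out[i * hop:i * hop + n_fft] += f * win
        norm[i * hop:i * hop + n_fft] += win ** 2
    return out / np.maximum(norm, 1e-8)

def griffin_lim(mag, n_iter=32, n_fft=512, hop=128, seed=0):
    """Reconstruct a waveform from a magnitude spectrogram by alternating
    projections: keep the given magnitudes, re-estimate phase each round."""
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))  # random initial phase
    for _ in range(n_iter):
        x = istft(mag * phase, n_fft, hop)
        phase = np.exp(1j * np.angle(stft(x, n_fft, hop)))
    return istft(mag * phase, n_fft, hop)
```

In practice `librosa.griffinlim` does the same thing with more care (momentum, padding), but this shows why the output sounds acceptable yet slightly phasey compared to a neural vocoder.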
Since I am new to this, I have some questions. Should I ask them here, or is there a better channel for discussion?
> I am getting good quality speech! I haven't trained a vocoder yet, but with Griffin-Lim it seems acceptable.
🍵🍵 This is good to hear. 🍵🍵
> I'm sure neural vocoders would yield great output.
If you do not want to train a vocoder, you can use the universal HiFi-GAN.
> Since I am new to this, I have some questions. Should I ask them here, or is there a better channel for discussion?
You can continue posting them here, or reach out to me by email at smehta (at) kth (dot) se. Additionally, I can also answer them on LinkedIn.
I tried using the universal vocoder but that didn't go well; I'll figure it out, though.
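A common cause of poor output from a pretrained universal vocoder is a mel-spectrogram parameter mismatch between the acoustic model and the vocoder. A quick sanity check (the key names follow the style of HiFi-GAN's `config.json`, but they are assumptions here; fill in your actual values):

```python
# Keys that must agree between the acoustic model's mel extraction and the
# vocoder's training config (names assumed from HiFi-GAN-style configs).
MEL_KEYS = ("sampling_rate", "n_fft", "hop_size", "win_size",
            "num_mels", "fmin", "fmax")

def mel_param_mismatches(acoustic_cfg, vocoder_cfg, keys=MEL_KEYS):
    """Return the keys whose values differ between the two configs."""
    return [k for k in keys if acoustic_cfg.get(k) != vocoder_cfg.get(k)]
```

If this reports any mismatch (e.g. `fmax` 8000 vs 11025, or a different `hop_size`), the vocoder will receive mels it was never trained on, which typically sounds muffled or metallic.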
Thanks, I'll close this issue then and reach out on other channels.
@shreyasinghal-17 Hi, can you share your dur_loss graphs/logs? Thanks
@p0p4k I trained this for more iterations, but for some reason TensorBoard is not showing the complete data points.
@shreyasinghal-17 Hi, can you share your diff_loss & prior graphs/logs? Thanks
Attaching my loss card from TensorBoard. Does everything look okay? val_epoch seems to have converged. What were the final losses you were getting? Anything else I need to keep an eye on?
with smoothing = 0.6
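For anyone comparing curves: TensorBoard's smoothing slider applies (to the best of my knowledge) a debiased exponential moving average, so `smoothing = 0.6` means each plotted point is roughly 60% history and 40% the new value. A small sketch of that computation:

```python
def tb_smooth(values, weight=0.6):
    """Debiased exponential moving average, as TensorBoard's scalar
    smoothing slider is believed to apply it (sketch, not TB source)."""
    smoothed, last = [], 0.0
    for i, v in enumerate(values, start=1):
        last = last * weight + (1 - weight) * v
        smoothed.append(last / (1 - weight ** i))  # debias early points
    return smoothed
```

This matters when comparing loss cards: the same raw curve looks quite different at smoothing 0.6 vs 0.9, so state the smoothing value (as done here) when sharing screenshots.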