Hi, thanks for your attention.
This is a gradient overflow problem. I think the reason may be that you set a small `max_token`. The default setting of `max_token` is about 2800k, and we usually use 16 GPUs for training. You may need to increase your `max_token`, set `update_freq` to a larger number, or try to use more GPUs for training.
Thank You so much for your reply.
1) I see your default `max_token` in the config is 1400k, which is also the same as the original HuBERT. Mine is slightly lower, at 1000k. Do you think even this slight decrease is making a difference?
2) I see you haven't set any `update_freq` explicitly in your config. What, according to you, should I set it to?
3) Another change from your default setup is that I am using MFCC labels instead of HuBERT intermediate-layer labels like the original Speech2C. I am not sure if that makes a huge difference. However, I would like to ask which quantizer setup you trained your model on: a) HuBERT trained for only the 1st iteration (on MFCC), or b) HuBERT trained for both iterations (MFCC + intermediate)?
Thank You again!
Hi, the batch size is number of GPUs x `max_token` x `update_freq`. So the default setting of Speech2C is 32 x 1400k x 1, which is also equal to 16 x 2800k x 1.
Back to your problem: you use just 4 GPUs with a `max_token` of 1000k, so your batch size is 4 x 1000k x 1, which is much smaller than the default setting. In your case, you may need to adjust these three values to ensure the batch size is large enough, which can alleviate the gradient overflow problem.
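For concreteness, here is a minimal sketch of this arithmetic in Python (the 32 x 1400k x 1 default and the 4-GPU / 1000k setup are the numbers from this thread; the `optimization.update_freq` key mentioned in the comment is the usual fairseq hydra knob, named here as an assumption about the config in use):

```python
import math

# Effective batch size in this recipe: tokens per update =
#   num_gpus * max_tokens * update_freq
def effective_tokens(num_gpus: int, max_tokens: int, update_freq: int) -> int:
    return num_gpus * max_tokens * update_freq

default = effective_tokens(num_gpus=32, max_tokens=1_400_000, update_freq=1)
current = effective_tokens(num_gpus=4, max_tokens=1_000_000, update_freq=1)

# update_freq needed on 4 GPUs at 1000k max_tokens to match the default
needed = math.ceil(default / (4 * 1_000_000))

print(default)  # 44800000 tokens per update (Speech2C default)
print(current)  # 4000000 tokens per update (the setup in this thread)
print(needed)   # 12, e.g. optimization.update_freq=[12] in the hydra config
```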
For the third question, the quantizer setup of our model is b) HuBERT trained for both iterations (MFCC + intermediate).
Hope this helps.
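For reference, second-iteration labels like these are typically produced by dumping an intermediate HuBERT layer's features and clustering them, as in fairseq's examples/hubert/simple_kmeans scripts. A minimal sketch (the checkpoint path, layer index, and cluster count are illustrative assumptions, not Speech2C's exact settings):

```python
import torch
from fairseq import checkpoint_utils
from sklearn.cluster import MiniBatchKMeans

# Load a first-iteration HuBERT checkpoint (path is a placeholder).
models, _, _ = checkpoint_utils.load_model_ensemble_and_task(["hubert_iter1.pt"])
model = models[0].eval()

# Stand-in for a single 10 s, 16 kHz waveform.
wav = torch.randn(1, 16000 * 10)
with torch.no_grad():
    # output_layer selects which transformer layer's features to dump;
    # the HuBERT paper uses the 6th layer of the base model at this step.
    feats, _ = model.extract_features(source=wav, padding_mask=None,
                                      mask=False, output_layer=6)

# Cluster frame-level features into pseudo-labels. 500 clusters matches the
# HuBERT recipe; the real pipeline fits k-means on a large feature dump,
# not a single utterance as here.
km = MiniBatchKMeans(n_clusters=500, n_init=20, batch_size=10_000)
labels = km.fit_predict(feats.squeeze(0).numpy())
```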
Thank You for your reply. I am also guessing the quantizer might have an effect (beyond the batch size) because your added decoder might be relying on strong quantizer cues. The original HuBERT was trained twice (2 iterations). You have a strong quantizer and train once.
I am also curious how keeping `distributed_world_size` at 32 did not throw any error for me, even though I had just 4 GPUs in my system.
I will investigate these and update you here! I request that you please keep this issue open till then!
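As a quick sanity check on that point (a hedged sketch; how fairseq reconciles a `distributed_world_size` larger than the visible GPU count depends on the launcher and version), the configured value can be compared against the GPUs actually available:

```python
import torch

configured_world_size = 32  # value from the hydra config in this thread
visible_gpus = torch.cuda.device_count()

if visible_gpus < configured_world_size:
    print(f"only {visible_gpus} GPUs visible, but distributed_world_size="
          f"{configured_world_size}; the effective batch size may be far "
          f"smaller than the config suggests")
```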
Hello!
Thank You for the great work again! I tried to train Speech2C and got this error after 49 epochs:
[2022-11-07 00:33:16,340][fairseq.nan_detector][WARNING] - Inf detected in output of , shape: torch.Size([1464, 505]), forward
Some training details:
dataset: libri 360
k-means trained on: libri 100
config: https://drive.google.com/file/d/1Ms5m-cuTrv43xsntHBdM_PEWaXtGGMOR/view?usp=sharing
hydra_log: https://drive.google.com/file/d/1HWvXqUGhNU-LnKNRj52HAbXPR-GqOVBU/view?usp=sharing
Can you please let me know if this has ever happened in your training setup, or if you know where I am going wrong?
Thank You!
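To help localize where the Inf first appears, a minimal forward-hook sketch in the spirit of fairseq's nan_detector (which produced the warning above) can be registered on a model's submodules; the attachment loop is commented out because the `model` variable here is hypothetical:

```python
import torch

def assert_finite(module, inputs, output):
    # Raise as soon as a module emits a non-finite value in the forward pass.
    if isinstance(output, torch.Tensor) and not torch.isfinite(output).all():
        raise RuntimeError(
            f"non-finite output in {module.__class__.__name__}, "
            f"shape {tuple(output.shape)}")

# Hypothetical attachment: register the hook on every submodule.
# for name, m in model.named_modules():
#     m.register_forward_hook(assert_finite)
```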