microsoft / SpeechT5

Unified-Modal Speech-Text Pre-Training for Spoken Language Processing
MIT License

SpeechT5 pretrain #30

Open benyang0506 opened 1 year ago

benyang0506 commented 1 year ago

Thanks for your previous reply! Now I have run into another problem: when I pretrain with fp16, I get the following error: [image: error message] It seems that fp16 in fairseq is not compatible with torch.

mechanicalsea commented 1 year ago

Hi, benyang0506.

The training error may be caused by the software environment. I ran into a similar error when I reimplemented SpeechT5 pre-training in a new environment, and the problem was fixed by installing an earlier version of PyTorch, such as 1.12 or 1.10. Also, make sure that your GPU and CUDA setup support FP16 computation.
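A minimal sketch for checking these points from Python, assuming a standard PyTorch install. The helper name `fp16_environment_report` is hypothetical. The sm_70 threshold reflects that fp16 Tensor Cores start with Volta GPUs such as the V100; older GPUs can still compute in fp16, just without that speedup.

```python
import pprint

import torch


def fp16_environment_report() -> dict:
    """Collect the environment facts that matter for fp16 training."""
    report = {
        "torch_version": torch.__version__,
        "cuda_build": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
        "devices": [],
    }
    if report["cuda_available"]:
        for i in range(torch.cuda.device_count()):
            major, minor = torch.cuda.get_device_capability(i)
            report["devices"].append({
                "name": torch.cuda.get_device_name(i),
                "capability": f"{major}.{minor}",
                # fp16 Tensor Cores arrived with Volta (sm_70, e.g. V100)
                "fp16_tensor_cores": (major, minor) >= (7, 0),
            })
    return report


if __name__ == "__main__":
    pprint.pprint(fp16_environment_report())
```

Comparing `torch_version` and `cuda_build` against a setup known to work (for example the 1.10 and 1.12 releases mentioned above) is a quick first step before changing the training code.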

Best wishes.

benyang0506 commented 1 year ago

Thanks for your reply! I have tried several torch versions, but they only support TensorFloat32 when using baddbmm. I also used a V100, which supports fp16 computation; I used it in previous work. By the way, I wonder whether there is a big difference in pre-training speed between fp16 and fp32. Thanks!
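To narrow down whether the failure comes from `baddbmm` itself rather than from fairseq, one option is a tiny standalone reproduction like the sketch below. The helper `try_baddbmm` and the tensor shapes are illustrative assumptions, not taken from SpeechT5 or fairseq.

```python
import torch


def try_baddbmm(dtype: torch.dtype, device: str) -> tuple[bool, str]:
    """Run one small baddbmm; return (succeeded, error text)."""
    try:
        bias = torch.zeros(2, 4, 4, dtype=dtype, device=device)
        q = torch.randn(2, 4, 8, dtype=dtype, device=device)
        k = torch.randn(2, 8, 4, dtype=dtype, device=device)
        # out = beta * bias + alpha * (q @ k); beta=0 ignores the bias values
        out = torch.baddbmm(bias, q, k, beta=0.0, alpha=0.125)
        # Compare against a float32 reference built from the same inputs
        ref = torch.bmm(q.float(), k.float()) * 0.125
        ok = torch.allclose(out.float(), ref, atol=1e-2, rtol=1e-2)
        return bool(ok), "" if ok else "result mismatch"
    except RuntimeError as err:  # e.g. an op not implemented for 'Half'
        return False, str(err)


if __name__ == "__main__":
    devices = ["cpu"] + (["cuda"] if torch.cuda.is_available() else [])
    for device in devices:
        for dtype in (torch.float32, torch.float16):
            print(device, dtype, try_baddbmm(dtype, device))
```

If `cuda` with `torch.float16` succeeds here but training still fails, the problem more likely lies in the fairseq or environment setup than in the kernel.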

Ajyy commented 1 year ago

Hi, sorry for the late reply. I think fp16 can speed up training and reduce memory usage while achieving similar performance. In addition, the V100 supports fp16 training. I do not have a detailed comparison of fp16 and fp32, but it's better to use fp16 for training.

[image: comparison from PyTorch]

Here is a comparison from PyTorch for your reference. Thanks.
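To get a rough answer to the speed question on a specific GPU, you can time the same operation in both precisions. The sketch below (the helper `time_matmul` and its default sizes are illustrative assumptions) times one square matmul. That only approximates the kernels that dominate Transformer training, so end-to-end pre-training speedups will differ and also depend on fairseq's own fp16 handling.

```python
import time

import torch


def time_matmul(dtype: torch.dtype, device: str, size: int = 1024, iters: int = 20) -> float:
    """Return mean seconds per (size x size) @ (size x size) matmul."""
    a = torch.randn(size, size, device=device).to(dtype)
    b = torch.randn(size, size, device=device).to(dtype)
    torch.matmul(a, b)  # warm-up: lazy init and kernel selection
    if device.startswith("cuda"):
        torch.cuda.synchronize()  # CUDA kernels run asynchronously
    start = time.perf_counter()
    for _ in range(iters):
        torch.matmul(a, b)
    if device.startswith("cuda"):
        torch.cuda.synchronize()  # wait for queued kernels before stopping the clock
    return (time.perf_counter() - start) / iters


if __name__ == "__main__":
    if torch.cuda.is_available():
        t32 = time_matmul(torch.float32, "cuda")
        t16 = time_matmul(torch.float16, "cuda")
        print(f"fp32: {t32 * 1e3:.2f} ms  fp16: {t16 * 1e3:.2f} ms  speedup: {t32 / t16:.1f}x")
    else:
        print("No CUDA device found; fp16 timing is only meaningful on a GPU.")
```

On Ampere and newer GPUs, PyTorch may run fp32 matmuls in TF32 depending on `torch.backends.cuda.matmul.allow_tf32`, which narrows the gap. The V100 has no TF32 mode, so there the comparison is plain fp32 against fp16.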

mechanicalsea commented 1 year ago

Hi @benyang0506

We found a few things that help with the problem you mentioned above. These attempts are summarized as follows.

  1. Install PyTorch with both conda and pip when using conda (e.g., miniconda) to run fairseq-train.
  2. Move or symlink the USER_DIR into the fairseq/examples directory and use that path as USER_DIR. The underlying issue is reported in "Multi-GPU training doesn't work when --user-dir specified" #4875.
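Step 2 can be scripted. This is a minimal sketch, assuming a local fairseq checkout that already has an `examples` directory. The helper `link_user_dir` and any paths you pass to it are hypothetical placeholders, not part of SpeechT5 or fairseq.

```python
from pathlib import Path


def link_user_dir(user_dir: str, fairseq_root: str) -> Path:
    """Symlink user_dir into <fairseq_root>/examples and return the link path.

    The returned path is what you would pass to fairseq-train as --user-dir.
    """
    src = Path(user_dir).resolve()
    if not src.is_dir():
        raise FileNotFoundError(f"user dir not found: {src}")
    examples = Path(fairseq_root).resolve() / "examples"
    if not examples.is_dir():
        raise FileNotFoundError(f"no examples directory in fairseq checkout: {examples}")
    link = examples / src.name
    if link.is_symlink() or link.exists():
        # Reuse an existing link or moved copy only if it is the same directory
        if link.resolve() != src:
            raise FileExistsError(f"{link} already exists and points elsewhere")
        return link
    link.symlink_to(src, target_is_directory=True)
    return link
```

For example, `link_user_dir("path/to/speecht5", "path/to/fairseq")` with your own (hypothetical here) paths returns the location under `fairseq/examples` to use as `--user-dir`. A symlink keeps the original code in place, so edits to the user directory are picked up without copying files again.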

When we reimplemented the SpeechT5 SID task using torch==1.10.1+cuda113, we ran into the same problems as you, e.g., fp16 not working. Hopefully these attempts will be useful.

JuneRen commented 1 year ago

Did you solve the problem? I have also encountered it.

@benyang0506