lucidrains / audiolm-pytorch

Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in Pytorch
MIT License
2.36k stars · 255 forks

Can SoundStream be trained with fp16? #127

Closed Liujingxiu23 closed 1 year ago

Liujingxiu23 commented 1 year ago

run:

```bash
accelerate launch --multi_gpu --mixed_precision=fp16 --gpu_ids=0,1 train.py
```

error: `Input type (CUDAComplexHalfType) and weight type (CUDAComplexFloatType) should be the same`

relevant code:

```python
def forward(self, x):
    weight, bias = map(torch.view_as_complex, (self.weight, self.bias))
    return F.conv2d(x, weight, bias, stride = self.stride, padding = self.padding)
```

lucidrains commented 1 year ago

@Liujingxiu23 this complex valued discriminator is going to be the end of me lol

so i believe the complex network needs to run at full precision - i think i'll just force an autocast override here. will get it done later today
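For readers hitting the same error before upgrading: below is a minimal sketch of this kind of fix, assuming the complex weights are stored via `torch.view_as_real` as in the snippet above. The class name and constructor here are illustrative, not the library's exact code.

```python
# minimal sketch: force the complex-valued conv to run at full precision
# under mixed-precision training. ComplexConv2d and its attributes mirror
# the snippet above but are otherwise assumptions, not the library's code.

import torch
import torch.nn.functional as F
from torch import nn

class ComplexConv2d(nn.Module):
    def __init__(self, dim_in, dim_out, kernel_size, stride = 1, padding = 0):
        super().__init__()
        weight = torch.randn(dim_out, dim_in, kernel_size, kernel_size, dtype = torch.cfloat)
        bias = torch.zeros(dim_out, dtype = torch.cfloat)
        # store complex parameters as real tensors with a trailing dim of 2
        self.weight = nn.Parameter(torch.view_as_real(weight))
        self.bias = nn.Parameter(torch.view_as_real(bias))
        self.stride = stride
        self.padding = padding

    def forward(self, x):
        # complex half-precision conv kernels are unsupported, so disable
        # autocast for this op and upcast the input to complex float
        with torch.autocast(device_type = 'cuda', enabled = False):
            x = x.to(torch.cfloat)
            weight, bias = map(torch.view_as_complex, (self.weight, self.bias))
            return F.conv2d(x, weight, bias, stride = self.stride, padding = self.padding)
```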

lucidrains commented 1 year ago

@Liujingxiu23 could you try 0.23.7 and see if that addresses the issue?

Liujingxiu23 commented 1 year ago

@lucidrains Thank you! It works. Also, my training run without fp16 has been going for several days, and the synthesized audio is only just barely intelligible.

lucidrains commented 1 year ago

hmm let's move this to discussion, as the original issue has been solved

Liujingxiu23 commented 1 year ago

the loss goes to NaN when using fp16
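One generic way to narrow this down (a diagnostic sketch, not part of audiolm-pytorch) is to check each loss term for non-finite values before backprop, so the offending component can be isolated and computed in fp32:

```python
# generic diagnostic (not part of the library): report which loss term
# first goes non-finite under fp16 training
import torch

def check_finite(**losses):
    for name, value in losses.items():
        if not torch.isfinite(value).all():
            print(f'non-finite value in {name}: {value}')

# hypothetical usage inside a training step:
# check_finite(recon_loss = recon_loss, multi_spectral_recon_loss = spec_loss)
```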

stevenhillis commented 1 year ago

Seconding training SoundStream with fp16 mixed precision as an outstanding issue on 0.25.5. I see finite values for all losses except multi_spectral_recon_loss, which is inf at step 0 and quickly moves to NaN.
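A plausible culprit (an assumption, not confirmed in this thread): spectral losses involve STFT magnitudes and their logs, and fp16 overflows above ~65504 while log(0) is -inf, either of which would show up as inf at step 0 and then propagate to NaN through the optimizer. One common workaround is to compute the spectral loss in fp32 regardless of autocast. Below is a sketch using a common L1 + log-magnitude recipe; the function name and loss formulation are illustrative, not necessarily the library's exact loss.

```python
# hedged sketch (not the library's code): compute a multi-scale spectral
# reconstruction loss in fp32, since STFT magnitudes and their logs can
# overflow or underflow fp16's range and produce inf / nan under autocast

import torch
import torch.nn.functional as F

def multi_spectral_recon_loss(orig, recon, n_ffts = (512, 1024, 2048), eps = 1e-5):
    # force fp32 even when the surrounding training step runs under autocast
    with torch.autocast(device_type = 'cuda', enabled = False):
        orig, recon = orig.float(), recon.float()
        loss = orig.new_zeros(())
        for n_fft in n_ffts:
            window = torch.hann_window(n_fft, device = orig.device)
            stft_mag = lambda x: torch.stft(x, n_fft, window = window, return_complex = True).abs()
            s_orig, s_recon = stft_mag(orig), stft_mag(recon)
            # L1 on magnitudes plus MSE on log magnitudes (a common recipe);
            # eps keeps the log away from -inf at silent bins
            loss = loss + F.l1_loss(s_recon, s_orig)
            loss = loss + F.mse_loss(torch.log(s_recon + eps), torch.log(s_orig + eps))
    return loss
```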