p0p4k / vits2_pytorch

unofficial vits2-TTS implementation in pytorch
https://arxiv.org/abs/2307.16430
MIT License

DDP: GPU0 allocates too much memory #36

Closed yijingshihenxiule closed 1 year ago

yijingshihenxiule commented 1 year ago

Hello, thank you for your awesome work. When training with two GPUs over the last two days, I found that too many processes are allocated on GPU0. How can I deal with it? [image attached]

p0p4k commented 1 year ago

I am not sure why this is the case. If many people face this issue, I can try to look into it. Try to see what happens when using just 1 GPU (CUDA_VISIBLE_DEVICES=1 python train.py ...).
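If you prefer to restrict the GPU from inside the script rather than via the shell prefix, a generic sketch (not code from this repo) is to set the variable before CUDA is initialized:

import os

# Limit this process to one physical GPU. This must happen before torch
# initializes CUDA, otherwise the setting is silently ignored.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import torch
print(torch.cuda.device_count())  # expected to report 1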

yijingshihenxiule commented 1 year ago

Thank you for your answer. Maybe I found the reason. In https://pytorch.org/tutorials/intermediate/ddp_tutorial.html#save-and-load-checkpoints, it says:

If map_location is missing, torch.load will first load the module to CPU and then copy each parameter to where it was saved, which would result in all processes on the same machine using the same set of devices.

But I tried loading the checkpoint on CPU like vits1 and loading it on the GPU; neither worked for me. Need help!
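For reference, a minimal sketch of what the tutorial's map_location advice looks like in code; rank and ckpt_path are placeholder names, not identifiers from this repo:

import torch

def load_checkpoint_for_rank(ckpt_path, rank):
    # Remap tensors that were saved on cuda:0 onto this process's own GPU,
    # so every rank does not load its copy onto GPU0.
    map_location = {"cuda:0": f"cuda:{rank}"}
    return torch.load(ckpt_path, map_location=map_location)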

yijingshihenxiule commented 1 year ago

Maybe this has nothing to do with checkpoint loading at all. I am confused about it.

p0p4k commented 1 year ago

I'll keep this issue open. We can fix it later if it is causing any performance issues.

yijingshihenxiule commented 1 year ago

Thank you for the reply. I don't know whether it will cause any performance issues, but it does cause OOM even if I set num_workers to 0. I have not solved it so far.

yijingshihenxiule commented 1 year ago

I solved it. Btw, I noticed this in MonoTransformerFlowLayer() in model.py:

x0, x1 = torch.split(x, [self.half_channels] * 2, 1)
x0_ = x0 * x_mask
x0_ = self.pre_transformer(x0, x_mask)  # vits2

The input to self.pre_transformer is x0, not x0_. Could you please clarify? Thank you.

p0p4k commented 1 year ago

It is just a typo; the mask is being used inside the transformer either way. Fixed in the latest patch (roughly along the lines of the sketch below). Thanks for the catch. What was your solution to the many-processes problem?
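Presumably the patched line just feeds the masked half into the pre-transformer, something like the fragment below (the exact fix in the repo may differ):

x0, x1 = torch.split(x, [self.half_channels] * 2, 1)
x0_masked = x0 * x_mask  # mask out padded frames
x0_ = self.pre_transformer(x0_masked, x_mask)  # vits2: pass the masked half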

yijingshihenxiule commented 1 year ago

It was just my fault. I used the wrong monotonic alignment module. It is fixed now. You can close this issue.