rhasspy / piper

A fast, local neural text to speech system
https://rhasspy.github.io/piper-samples/
MIT License

About GPU utilization #121

Open trunglebka opened 11 months ago

trunglebka commented 11 months ago

This isn't an issue but a question about GPU utilization during training. I'm training my voice with Piper, but it does not utilize the GPU well: about 30%, I think.

Currently I'm running two separate training experiments (not DDP); here is a screenshot from nvtop: [nvtop screenshot]

I wonder if there is a way to improve this, or if it is just how the model behaves. My machine has 210 GB of free memory available for caching, and the dataset cache is only 15 GB (as shown in htop), so I don't think slow disk I/O is the problem.

Lauler commented 10 months ago

What worked for me:

The num_workers param/arg was not configurable through the Lightning config. In my case, training by default ran with only 1 CPU worker for data loading, which caused the CPU to bottleneck the training.

Try manually editing the num_workers arg in the train_dataloader and valid_dataloader functions in piper/src/python/lightning.py and see if that helps.
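
For reference, a minimal sketch of the kind of edit meant above, assuming the dataloader methods wrap a standard torch DataLoader; the class and attribute names below are placeholders rather than Piper's actual ones, so adapt them to what you find in lightning.py:

```python
# Hypothetical sketch only: class and attribute names are stand-ins for
# whatever piper/src/python/lightning.py actually uses. The key change is
# passing num_workers > 0 (and optionally pin_memory=True) to DataLoader.
from torch.utils.data import DataLoader
import pytorch_lightning as pl


class VoiceTrainingModule(pl.LightningModule):  # stand-in for Piper's module
    def train_dataloader(self):
        return DataLoader(
            self._train_dataset,         # assumed attribute
            batch_size=self.batch_size,  # assumed attribute
            collate_fn=self._collate,    # assumed attribute
            shuffle=True,
            num_workers=8,               # raise from 1; roughly match CPU cores
            pin_memory=True,
        )

    def val_dataloader(self):
        return DataLoader(
            self._val_dataset,
            batch_size=self.batch_size,
            collate_fn=self._collate,
            num_workers=4,
            pin_memory=True,
        )
```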

trunglebka commented 10 months ago

I've changed that number (to five workers), but according to htop the additional worker threads barely use any CPU, while the main process keeps taking a full CPU. GPU usage is almost the same as with the original code (no dataloader workers).

synesthesiam commented 10 months ago

Are you training 1 model across those 2 GPUs, or 2 models?

trunglebka commented 10 months ago

That is two separate models, each running on a dedicated GPU



Lauler commented 10 months ago

I see the VRAM used is about 9 GB. What size model are you training, and with what batch size? If your batch size is small, that also tends to decrease GPU utilization.

In my case, when I train a high-quality model with batch size 8, utilization drops to around 50%. But when I train the same model with batch size 32, it averages 80% to 90% (on an RTX 3090).

trunglebka commented 10 months ago

@Lauler I'm training a medium model with batch size 8. I can only run with that batch size because the first few iterations have some memory peaks that cause OOM errors. Based on your info, I suspect I simply don't have enough VRAM to fully utilize the GPU.

trunglebka commented 10 months ago

Well, it looks like my ancient CPU isn't powerful enough to saturate the GPU. @synesthesiam, can you suggest some CPU-intensive parts that could be cached to reduce CPU usage?

synesthesiam commented 10 months ago

I believe the most CPU-intensive part is the batch collating, where it sorts, pads, and copies all the tensors, which can't really be cached. Piper preprocessing already does all the audio conversion.

Turning off the validation and testing phases may also help, by setting the "--num-...." flags to 0.
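
For illustration of the collating work described above, here is a generic sketch of a sort/pad/copy step; this is not Piper's actual UtteranceCollate, just the shape of the per-batch work the CPU has to do:

```python
# Generic sort-pad-copy collate sketch, for illustration only; Piper's
# UtteranceCollate differs in detail but does the same kind of work.
from typing import List, Tuple

import torch


def collate(batch: List[Tuple[torch.Tensor, torch.Tensor]]):
    # Each item: (phoneme_ids of length T_text, audio of length T_audio),
    # with lengths varying per utterance.
    # 1. Sort by input length, longest first.
    batch = sorted(batch, key=lambda item: item[0].size(0), reverse=True)

    text_lengths = torch.tensor([item[0].size(0) for item in batch])
    audio_lengths = torch.tensor([item[1].size(0) for item in batch])

    # 2. Allocate zero-padded batch tensors, then 3. copy every example in.
    texts = torch.zeros(len(batch), int(text_lengths.max()), dtype=torch.long)
    audios = torch.zeros(len(batch), int(audio_lengths.max()))
    for i, (text, audio) in enumerate(batch):
        texts[i, : text.size(0)] = text
        audios[i, : audio.size(0)] = audio

    return texts, text_lengths, audios, audio_lengths
```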

trunglebka commented 10 months ago

@synesthesiam I've monitored UtteranceCollate; it runs pretty fast, about 0.015s for a batch of size 16. Even using a cache to ensure UtteranceCollate is not compute-expensive, the main training process still saturates a CPU core.
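
For anyone who wants to reproduce that kind of measurement, a rough sketch; collate_fn and dataset are placeholders for whatever collate callable and dataset object you are actually using:

```python
# Rough timing check for a collate callable. collate_fn and dataset are
# placeholders; plug in your own objects.
import time


def time_collate(collate_fn, dataset, batch_size=16, repeats=100):
    batch = [dataset[i] for i in range(batch_size)]
    start = time.perf_counter()
    for _ in range(repeats):
        collate_fn(batch)
    elapsed = (time.perf_counter() - start) / repeats
    print(f"avg collate time for a batch of {batch_size}: {elapsed:.4f}s")
```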

synesthesiam commented 10 months ago

Besides that, the various mel calculations in the training step (plus the loss calculations) probably make up the rest of the CPU usage. There's no getting around those calculations, though, unless you were to pre-generate every possible batch.

trunglebka commented 10 months ago

Thank you guys, maybe I should try the simplest solution: jump ship 😂

trunglebka commented 10 months ago

@synesthesiam there is some weird degradation of GPU utilization between the start of the training process and after a "long" time, as you can see in the picture below. I don't know if it is a problem with Piper or PyTorch Lightning.

[screenshot: GPU utilization dropping over the course of training]

synesthesiam commented 9 months ago

Piper doesn't change anything later in the training cycle, so I'm not sure what it could be. The GPU doesn't look like it's getting too hot. Does swapping the PCIe slots make any difference?

trunglebka commented 9 months ago

It's weird: I trained an ASR task (k2 icefall) for two months without problems, and CUDA was almost always fully utilized, so I don't think there is a problem with the hardware.

Back to Piper: if I stop and resume the training task, GPU usage is as high as in the starting phase.