Closed: RuntimeRacer closed this 1 year ago
@RuntimeRacer Very nice work, can you verify it on multi-GPUs?
@lifeiteng Yes, I will do a multi-GPU run once I've finished training the current epoch on a single GPU. I had to fix some issues around https://github.com/lifeiteng/vall-e/pull/113 and https://github.com/lifeiteng/vall-e/issues/110, which caused epoch 1 to never finish until I stripped languages with non-Latin character sets from my training data.
I expect epoch 1 to finish tonight or tomorrow; then I will test the Accelerator code.
@lifeiteng I did a first couple of tests over the last 2 hours, but I am hitting a wall when it comes to splitting the dataloaders across the GPUs. Accelerate makes assumptions like fixed batch sizes and a known number of elements in the dataset during its preparation step, whereas Lhotse uses its own custom implementations to feed in data dynamically. I can't continue looking into this right now, but let me know in case you have any suggestions for what we could do here; one possible workaround is sketched below.
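For reference, a minimal sketch of one possible workaround: pass only the model and optimizer through `accelerator.prepare()` and let Lhotse shard the cuts per rank itself. `build_model` and `build_lhotse_dataloader` are hypothetical placeholders for the repo's own helpers, and the `world_size`/`rank` sharding arguments are an assumption about the Lhotse sampler API, not something verified against this codebase.

```python
# Hedged sketch: sidestep the Lhotse/Accelerate mismatch by NOT preparing the
# dataloader with Accelerate and letting Lhotse handle per-rank sharding.
import torch
from accelerate import Accelerator

accelerator = Accelerator()

model = build_model()  # hypothetical helper from the training script
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
# Only model and optimizer go through prepare(); the dynamic Lhotse
# dataloader is left untouched.
model, optimizer = accelerator.prepare(model, optimizer)

# Hypothetical helper; assumes the Lhotse sampler can be told which shard
# of the cuts this rank should see.
train_dl = build_lhotse_dataloader(
    world_size=accelerator.num_processes,
    rank=accelerator.process_index,
)

for batch in train_dl:
    # Assumes the batch is a dict of tensors and the model returns a scalar loss.
    batch = {k: v.to(accelerator.device) for k, v in batch.items()}
    loss = model(**batch)
    accelerator.backward(loss)
    optimizer.step()
    optimizer.zero_grad()
```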
I was able to fix the existing DDP implementation: https://github.com/lifeiteng/vall-e/pull/116 Not sure if Accelerator is still feasible since it seems like a lot more work.
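For context, this is roughly the plain PyTorch DDP boilerplate that such a fix has to keep working; a generic sketch, not the actual patch in #116. `build_model` and `train_dl` are stand-ins for the repo's own objects, and the environment variables assume a `torchrun` launch.

```python
# Generic sketch of the manual DDP setup that Accelerate would otherwise replace.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl")        # torchrun sets RANK / WORLD_SIZE
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = build_model().cuda(local_rank)         # hypothetical helper
model = DDP(model, device_ids=[local_rank])
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

for batch in train_dl:                         # per-rank dataloader as before
    loss = model(**batch)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

dist.destroy_process_group()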
Replaces the DDP implementation with the Huggingface Accelerator to allow for simpler multi-GPU handling (https://huggingface.co/docs/accelerate/index)
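For comparison, a minimal sketch of what the Accelerate-based training loop looks like per the library docs, launched with `accelerate launch train.py`. Whether the Lhotse dataloader can actually go through `prepare()` is exactly the open question discussed above; `build_model` and `train_dl` are hypothetical placeholders for the repo's own objects.

```python
# Hedged sketch of the Accelerate equivalent: Accelerator() replaces the manual
# process-group and device bookkeeping from the DDP version.
import torch
from accelerate import Accelerator

accelerator = Accelerator()

model = build_model()                          # hypothetical helper
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
model, optimizer, train_dl = accelerator.prepare(model, optimizer, train_dl)

for batch in train_dl:
    loss = model(**batch)
    accelerator.backward(loss)                 # handles gradient sync/scaling
    optimizer.step()
    optimizer.zero_grad()

if accelerator.is_main_process:
    # Save the unwrapped model once, from the main process only.
    accelerator.save(accelerator.unwrap_model(model).state_dict(), "checkpoint.pt")
```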