lifeiteng / vall-e

PyTorch implementation of VALL-E (zero-shot text-to-speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html
Apache License 2.0

DRAFT - Bootstrap Accelerator - Untested #115

Closed RuntimeRacer closed 1 year ago

RuntimeRacer commented 1 year ago

Replaces the DDP implementation with Hugging Face Accelerate to allow for simpler multi-GPU handling (https://huggingface.co/docs/accelerate/index).
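
A minimal sketch of what the swap looks like using Accelerate's documented API. The model, optimizer, and data here are toy stand-ins, not the actual VALL-E training code:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

# Toy stand-ins for the real VALL-E model and data; the point is the Accelerate wiring.
model = torch.nn.Linear(16, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
dataloader = DataLoader(TensorDataset(torch.randn(64, 16), torch.randn(64, 1)), batch_size=8)

accelerator = Accelerator()  # device placement / process group come from `accelerate launch`
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for x, y in dataloader:
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    accelerator.backward(loss)  # replaces loss.backward(); syncs gradients across GPUs
    optimizer.step()
```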

lifeiteng commented 1 year ago

@RuntimeRacer Very nice work, can you verify it on multi-GPUs?

RuntimeRacer commented 1 year ago

> @RuntimeRacer Very nice work, can you verify it on multi-GPUs?

@lifeiteng Yes, I will do a multi-GPU run once I've finished training the current epoch on a single GPU. I had to fix some issues around https://github.com/lifeiteng/vall-e/pull/113 and https://github.com/lifeiteng/vall-e/issues/110, which caused epoch 1 to never finish until I stripped languages with non-Latin charsets from my training data.

I expect epoch 1 to finish tonight or tomorrow; then I will test the Accelerate code.

RuntimeRacer commented 1 year ago

@lifeiteng I did a first couple of tests and tries over the last 2 hours, but I'm hitting a wall when it comes to splitting the dataloaders across the GPUs. In its preparation step, Accelerate assumes things like a fixed batch size and a known number of elements in the dataset; Lhotse, however, uses its own custom implementations to feed in data dynamically. I can't continue looking into this right now, but let me know in case you have any suggestions on what we could do here.
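
One possible workaround, just a sketch and not a tested fix, assuming the dataloaders are built on Lhotse's `DynamicBucketingSampler`: pass only the model and optimizer through `accelerator.prepare()` and let the Lhotse sampler shard the cuts per process via its `world_size`/`rank` arguments, keeping the DataLoader outside of Accelerate entirely. All names below (manifest path, toy dataset, `max_duration` value) are placeholders:

```python
import torch
from torch.utils.data import DataLoader, Dataset
from accelerate import Accelerator
from lhotse import CutSet
from lhotse.dataset import DynamicBucketingSampler

accelerator = Accelerator()

# Hypothetical stand-ins for the real VALL-E model, optimizer, and manifest.
model = torch.nn.Linear(16, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
cuts = CutSet.from_file("data/cuts_train.jsonl.gz")  # placeholder manifest path

# Only model and optimizer go through prepare(); the dataloader stays Lhotse-managed.
model, optimizer = accelerator.prepare(model, optimizer)

class ToyCutDataset(Dataset):
    """Minimal Lhotse-style dataset: __getitem__ receives a whole CutSet batch from the sampler."""
    def __getitem__(self, cuts_batch: CutSet):
        return {"cut_ids": [c.id for c in cuts_batch]}

# The sampler batches by total audio duration (variable batch size) and shards per
# process itself, so Accelerate never needs to know the dataset length.
sampler = DynamicBucketingSampler(
    cuts,
    max_duration=200.0,  # seconds of audio per batch, arbitrary example value
    shuffle=True,
    world_size=accelerator.num_processes,
    rank=accelerator.process_index,
)
dataloader = DataLoader(ToyCutDataset(), sampler=sampler, batch_size=None, num_workers=4)

# Since the dataloader was not prepared, batch tensors would have to be moved to
# accelerator.device manually inside the training loop.
```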

RuntimeRacer commented 1 year ago

I was able to fix the existing DDP implementation: https://github.com/lifeiteng/vall-e/pull/116. Not sure if Accelerate is still feasible, since it seems like a lot more work.