jaywalnut310 / vits

VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
https://jaywalnut310.github.io/vits-demo/index.html
MIT License

Computing power requirement #159

Open JoyceMind opened 1 year ago

JoyceMind commented 1 year ago

May I ask what the minimum hardware configuration is for training VITS and for inference? Unfortunately, I only have an RTX 2060 with 6 GB of VRAM in my laptop.
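On a 6 GB card, the usual first step is to shrink the `train.batch_size` (and optionally `segment_size`) fields in the config JSON under `configs/`. A minimal sketch, assuming the key names from the stock VITS configs; the concrete values below are illustrative guesses, not recommendations:

```python
import json

# Illustrative stand-in for a stock config such as configs/ljs_base.json;
# check the actual values in your own config file.
config = {
    "train": {
        "batch_size": 64,      # too large for a 6 GB GPU
        "segment_size": 8192,  # audio segment length used per training step
        "fp16_run": False,
    }
}

# Reduce memory pressure for a 6 GB GPU (values are guesses; tune to taste).
config["train"]["batch_size"] = 8
config["train"]["fp16_run"] = True  # mixed precision reduces activation memory

print(json.dumps(config["train"]))
```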

vidigal commented 1 year ago

I'm training a model in Brazilian Portuguese on an RTX 3060 12 GB. Training has been running for 5 days. I can understand what the generated voice is saying, but the quality isn't good... yet.

ctimict commented 1 year ago

> I'm training a model in Brazilian Portuguese on an RTX 3060 12 GB. Training has been running for 5 days. I can understand what the generated voice is saying, but the quality isn't good... yet.

Hello, I have been training my model for 3 days, but my computer lost power. I'm not sure how to resume training from where it left off. The last log lines were:

2023-09-27 17:04:19,901 vietnamese_base INFO [2.2566256523132324, 2.8984131813049316, 5.876196384429932, 24.848127365112305, 1.8007454872131348, 2.167569398880005, 32800, 0.00019621110994425385]
2023-09-27 17:04:27,901 vietnamese_base INFO ====> Epoch: 154

Can you help me? Thank you!
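For what it's worth, the stock VITS `train.py` tries to resume automatically: it looks for the newest `G_*.pth` / `D_*.pth` in the model directory and loads them, so rerunning the same training command should continue from the last saved step. A minimal sketch of that selection logic (mirroring the `latest_checkpoint_path` helper in the repo's `utils.py`; check your local copy, as details may differ):

```python
import glob
import os
import re
import tempfile

def latest_checkpoint_path(dir_path, pattern="G_*.pth"):
    """Return the checkpoint with the largest step number in its filename,
    e.g. G_32000.pth beats G_16000.pth. Sketch of the VITS utils helper."""
    files = glob.glob(os.path.join(dir_path, pattern))
    if not files:
        return None
    # Sort by the integer embedded in the basename (G_32000.pth -> 32000).
    files.sort(key=lambda f: int(re.sub(r"\D", "", os.path.basename(f)) or 0))
    return files[-1]

# Demo with dummy checkpoint files in a temporary directory.
with tempfile.TemporaryDirectory() as d:
    for step in (8000, 16000, 32000):
        open(os.path.join(d, f"G_{step}.pth"), "w").close()
    print(os.path.basename(latest_checkpoint_path(d)))  # G_32000.pth
```

If resuming crashes instead, the usual culprit is a checkpoint that was being written when the power went out; deleting the newest (possibly truncated) `G_*.pth`/`D_*.pth` pair lets training fall back to the previous one.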

aaronnewsome commented 11 months ago

I'm using a similar laptop with an RTX 3060 6 GB. With 1,150 sentences (wavs), it took around 15 days to reach 10,000 epochs. It was a good enough test to know that I can get decent quality. Some words have strange pronunciations and the speaking cadence seems odd at times. I'm planning another run with at least 2,000 sentences, but I likely won't run it on the 3060; it takes too long. I bought 3 Tesla P40 24 GB cards, and training seems much faster on those.
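Those figures imply roughly two minutes per epoch on the 6 GB 3060. A quick back-of-the-envelope using the numbers above (their figures, my arithmetic):

```python
# 15 days for 10,000 epochs on an RTX 3060 6 GB with 1,150 wavs.
days, epochs = 15, 10_000
sec_per_epoch = days * 24 * 3600 / epochs
print(round(sec_per_epoch, 1))  # 129.6 seconds per epoch
```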