neonbjb / tortoise-tts

A multi-voice TTS system trained with an emphasis on quality
Apache License 2.0

Beginner questions about Tortoise TTS and cloning from audio clips #517

Open jackpotcityco opened 12 months ago

jackpotcityco commented 12 months ago

Hello,

I have some beginner questions to get a basic understanding of voice cloning from audio clips.

I read that an NVIDIA GPU must be used: "If you want to use this on your own computer, you must have an NVIDIA GPU."

I am looking for a rough idea of the time involved in the questions below. At the moment, I have no clue how slow "slow" is.

  1. Will Tortoise TTS work on a CPU anyway, just more slowly?
  2. I have 20 CPU cores at 4.8 GHz. Will Tortoise be able to use all of them?
  3. If a CPU can be used, roughly how much slower is it than a GPU in general: 10 times slower or 100 times slower?
  4. If I have 100 audio clips, each 15 seconds long, approximately how long would it take to train on them at the best quality with a GPU, or in my case with 20 CPU cores at 4.8 GHz?
  5. Once training is complete and a model has been produced, how long does it take to generate audio from, for example, 100 words, in my case on 20 CPU cores at 4.8 GHz?
  6. Should I use the fast version, "Tortoise TTS FAST", instead and, importantly, will that version work on my CPU cores? https://github.com/152334H/tortoise-tts-fast

Many thanks!

madwurmz commented 12 months ago

I have a similar beginner question: what is the minimum VRAM needed to clone a voice with good quality?

Tortoise TTS FAST is not available for Windows, so I can't use it, but it offers improvements worth having. Or has this repo also been updated with the speed improvements? 🗣️

NikitaKononov commented 11 months ago

Not available for Windows? Python, ffmpeg, and pip/poetry are all available for Windows. What's the problem?

NikitaKononov commented 11 months ago

  1. It will work on a CPU, but it will take a very long time.
  2. Try it and you'll find out.
  3. Comparing an RTX 4090 with a Ryzen 5900X, the CPU is 45x slower.
  4. 100 audio clips are nothing for a model like Tortoise; you need tens of thousands of hours of audio. Read the description carefully: training code is not provided. And again, you won't be able to train Tortoise on a CPU.
  5. You can try it on your CPU and tell us the results.
  6. You can try both the original and the fast versions and choose whichever suits you best.
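
For anyone who wants to "try it and find out" for points 1 to 3, here is a minimal sketch of how the comparison could be measured. It is not part of Tortoise: the helper names (`pick_device`, `cpu_thread_info`, `time_call`, `slowdown`) are made up for illustration, and it assumes only that PyTorch may or may not be installed. It reports whether PyTorch sees an NVIDIA GPU, how many CPU threads PyTorch would use, and how to turn two timings into an "N times slower" figure.

```python
import os
import time


def pick_device():
    """Return "cuda" if PyTorch is installed and sees an NVIDIA GPU, else "cpu"."""
    try:
        import torch
    except ImportError:
        # Assumption: without PyTorch we simply report CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def cpu_thread_info():
    """Report logical CPU count and PyTorch's intra-op thread count (if available)."""
    info = {"logical_cpus": os.cpu_count(), "torch_threads": None}
    try:
        import torch
        info["torch_threads"] = torch.get_num_threads()
    except ImportError:
        pass
    return info


def time_call(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def slowdown(cpu_seconds, gpu_seconds):
    """How many times slower the CPU run was than the GPU run."""
    return cpu_seconds / gpu_seconds
```

To use it, wrap whatever generation call you run in `time_call` once on each machine and pass the two timings to `slowdown`. As a purely hypothetical illustration, a clip that takes 2 seconds on a GPU and 90 seconds on a CPU gives `slowdown(90.0, 2.0) == 45.0`, which matches the kind of ratio quoted in point 3.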

manmay-nakhashi commented 11 months ago

This repo has been updated with speed enhancements; please check the README.

madwurmz commented 11 months ago

> > Tortoise TTS FAST is not available for Windows
>
> Not available for Windows? python, ffmpeg, pip/poetry are available for Windows. What's the problem?

That is what I've read, and indeed there is only a .sh script, no .bat. For beginners it may not be that clear, but .sh is for Linux and .bat is for Windows. 🗣️

NikitaKononov commented 11 months ago

Did you read README.md in this repo? The CLI Usage section mentions only the Python script ./script/tortoise-tts.py

The best advice for beginners is to read project descriptions carefully.

test.sh looks like a leftover in that repo. By the way, you can run .sh scripts on Windows with Git Bash.

zeynabyousefi commented 3 weeks ago

I want to separate good and bad hands. Can I use this? @NikitaKononov