voicepaw / so-vits-svc-fork

so-vits-svc fork with realtime support, improved interface and more features.

Availability of the Colab notebook #1168

Open 34j opened 6 months ago

34j commented 6 months ago

Many issues have been opened about the Colab notebook. I think it is still just barely working, but it is in rough shape.

Describe the bug

  1. f-strings in commands no longer work
  2. #1064 might unfortunately be reducing the simplicity
  3. Logs are not displayed
  4. Annoying popup: #1163

To Reproduce

Run notebook

Additional context

No response

Version

2024/05/10

Platform

Google Colab

Code of Conduct

No Duplicate

xD0135 commented 1 month ago

I had the same experience trying to run the provided Colab notebook, so I settled on the simpler approach shown below. Each code block here corresponds to a cell you add in Colab, and I'm covering only training, not the inference part that the original notebook also includes. Also, make sure you select a GPU runtime, otherwise this won't work.

#@title Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')
#@title Install Dependencies
!python -m pip install so-vits-svc-fork
#@title Verify Dependencies
!svc --help
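
Since none of this works without a GPU runtime (as noted above), I also like to add a quick sanity-check cell here; this is a generic check, not something from the original notebook:

#@title Check GPU
# Generic sanity check (not from the original notebook): confirm Colab
# actually assigned a CUDA device before spending time on preprocessing.
import torch

if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
else:
    print("No GPU detected - change the runtime type to GPU first.")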

For this next step, I created a folder MyDrive/TTS/sovits in my Google Drive, which is why it appears in the paths below. Inside that folder I added my training data in the expected layout dataset_raw/{speaker_id}/**/{wav_file}.{any_format}, then ran the following:

#@title Generate Config
!cd /content/drive/MyDrive/TTS/sovits && svc pre-resample && svc pre-config
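
If pre-resample finishes without finding any files, the dataset_raw layout is the usual culprit; a plain directory listing (nothing specific to the fork) makes that easy to spot:

#@title (Optional) Inspect dataset layout
# Generic check: list what sits under dataset_raw so the speaker folder and
# audio files are visible. The path matches the Drive layout used above.
!find /content/drive/MyDrive/TTS/sovits/dataset_raw -maxdepth 2 | head -n 20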

This will create a few folders. Because I'm running on the free T4 Colab instance, I adjusted the generated config file, located in my Google Drive under MyDrive/TTS/sovits/configs/44k/config.json, as follows (only the modified lines are shown):

{
  "train": {
    "epochs": 201,
    "batch_size": 16
  }
}

If you edit the config file, save/upload the changes before proceeding. It's worth noting that a batch_size of 16 keeps VRAM usage stable at around 11.9 / 15.0 GB; anything higher will cause it to run out of memory (OOM).
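
If you'd rather make those edits from the notebook itself instead of editing the file in Drive (so there's nothing to re-upload), here is a minimal sketch that assumes the same path and values as above:

#@title (Optional) Edit config from the notebook
# Sketch only: rewrites the two values changed above. cfg_path matches the
# Drive layout used in this walkthrough; adjust it if yours differs.
import json

cfg_path = "/content/drive/MyDrive/TTS/sovits/configs/44k/config.json"
with open(cfg_path) as f:
    cfg = json.load(f)

cfg["train"]["epochs"] = 201
cfg["train"]["batch_size"] = 16

with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)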

#@title Generate Hubert
!cd /content/drive/MyDrive/TTS/sovits && svc pre-hubert
#@title Start Training
!cd /content/drive/MyDrive/TTS/sovits && svc train

And that's it! For the training step, if you want to load TensorBoard so that you have a visual representation of the training progress, you can use the following cell instead of the previous one:

#@title Start Training
%load_ext tensorboard
%tensorboard --logdir /content/drive/MyDrive/TTS/sovits/logs/44k
!cd /content/drive/MyDrive/TTS/sovits && svc train
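
Once training finishes, the checkpoints should end up under the same logs directory that TensorBoard reads from in the setup above; a quick listing (generic, nothing fork-specific) shows what was produced:

#@title List training outputs
# Generic listing of the log/checkpoint directory used above; the exact file
# names depend on the run, so treat this as a convenience check only.
!ls -lh /content/drive/MyDrive/TTS/sovits/logs/44k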

For my dataset, with about 13 minutes of training audio, training took just over an hour to complete. Now that I know how long a run takes, I can increase the epochs parameter to match my desired tradeoff between training time and output quality. Hope it helps!