voicepaw / so-vits-svc-fork

so-vits-svc fork with realtime support, improved interface and more features.

SoftVC VITS Singing Voice Conversion Fork

简体中文 (Simplified Chinese)

A fork of so-vits-svc with realtime support and a greatly improved interface. It is based on branch 4.0 (v1) (or 4.1) of the original, and models from that branch are compatible. Note, however, that 4.1 models are not supported, and neither are models from other projects.

No Longer Maintained

Reasons

Alternatives

Be wary of the few influencers who act overly amazed by every new project or technology. Treat every social media post with some skepticism.

The voice changer boom of 2023 has come to an end, and many developers, not just those in this repository, have not been very active for a while.

There are too many alternatives to list here but:

Elsewhere, several start-ups have improved and marketed voice changers (probably for profit).

Updates to this repository have been limited to maintenance since Spring 2023. It is difficult to narrow down the list of alternatives here, but please consider trying other projects if you are looking for a voice changer with even better performance (especially in terms of latency rather than quality).

However, this project may be ideal for those who just want to try out voice conversion for now, because it is easy to install.

Features not available in the original repo

Installation

Option 1. One click easy installation

Download .bat

This BAT file will automatically perform the steps described below.

Option 2. Manual installation (using pipx, experimental)

1. Installing pipx

Windows (development version required due to pypa/pipx#940):

py -3 -m pip install --user git+https://github.com/pypa/pipx.git
py -3 -m pipx ensurepath

Linux/MacOS:

python -m pip install --user pipx
python -m pipx ensurepath

2. Installing so-vits-svc-fork

pipx install so-vits-svc-fork --python=3.11
pipx inject so-vits-svc-fork torch torchaudio --pip-args="--upgrade" --index-url=https://download.pytorch.org/whl/cu121 # https://download.pytorch.org/whl/nightly/cu121

Option 3. Manual installation

Creating a virtual environment

Windows:

```shell
py -3.11 -m venv venv
venv\Scripts\activate
```

Linux/MacOS:

```shell
python3.11 -m venv venv
source venv/bin/activate
```

Anaconda:

```shell
conda create -n so-vits-svc-fork python=3.11 pip
conda activate so-vits-svc-fork
```

Installing without creating a virtual environment may cause a `PermissionError` if Python is installed in Program Files, etc.

Install this via pip (or your favourite package manager that uses pip):

python -m pip install -U pip setuptools wheel
pip install -U torch torchaudio --index-url https://download.pytorch.org/whl/cu121 # https://download.pytorch.org/whl/nightly/cu121
pip install -U so-vits-svc-fork
Notes

- If no GPU is available or using MacOS, simply remove `pip install -U torch torchaudio --index-url https://download.pytorch.org/whl/cu121`. MPS is probably supported.
- If you are using an AMD GPU on Linux, replace `--index-url https://download.pytorch.org/whl/cu121` with `--index-url https://download.pytorch.org/whl/nightly/rocm5.7`. AMD GPUs are not supported on Windows ([#120](https://github.com/voicepaw/so-vits-svc-fork/issues/120)).
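
The rules in these notes can be summarised as a small lookup. The following is only a sketch: the function name `torch_index_url` and the `"nvidia"`/`"amd"` labels are hypothetical and not part of this package, and nothing here detects your hardware automatically.

```python
from __future__ import annotations

# Index URLs quoted in the notes above.
CUDA_INDEX = "https://download.pytorch.org/whl/cu121"
ROCM_INDEX = "https://download.pytorch.org/whl/nightly/rocm5.7"


def torch_index_url(system: str, gpu: str | None) -> str | None:
    """Return the --index-url for the torch install step, or None to skip it.

    system: "Windows", "Linux" or "Darwin" (the values platform.system() returns).
    gpu: "nvidia", "amd" or None (hypothetical labels supplied by the caller).
    """
    if gpu is None or system == "Darwin":
        # No GPU, or macOS: the notes say to drop the torch install line.
        return None
    if gpu == "amd":
        if system == "Linux":
            return ROCM_INDEX
        raise ValueError("AMD GPUs are not supported on Windows (see issue #120)")
    return CUDA_INDEX


if __name__ == "__main__":
    import platform

    # Assumes an NVIDIA GPU purely for illustration.
    print(torch_index_url(platform.system(), "nvidia"))
```

If the function returns `None`, run only `pip install -U so-vits-svc-fork`; otherwise pass the returned URL to the `pip install -U torch torchaudio --index-url ...` step.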

Update

Please update this package regularly to get the latest features and bug fixes.

pip install -U so-vits-svc-fork
# pipx upgrade so-vits-svc-fork

Usage

Inference

GUI

GUI launches with the following command:

svcg

CLI

svc vc                 # realtime inference from microphone
svc infer source.wav   # inference on an audio file

Pretrained models are available on Hugging Face or CIVITAI.

Notes

Training

Before training

Cloud

Open In Colab Open In Paperspace Paperspace Referral[^p]

If you do not have access to a GPU with more than 10 GB of VRAM, the free plan of Google Colab is recommended for light users and the Pro/Growth plan of Paperspace is recommended for heavy users. Conversely, if you have access to a high-end GPU, the use of cloud services is not recommended.
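
If you are unsure how much VRAM your GPU has, a check along the following lines may help. This is a minimal sketch, assuming PyTorch is already installed with CUDA support; the helper names `local_vram_gb` and `cloud_recommended` are hypothetical, and the 10 GB threshold is simply the figure quoted above.

```python
from __future__ import annotations

VRAM_THRESHOLD_GB = 10.0  # threshold quoted in the paragraph above


def local_vram_gb() -> float | None:
    """Return the total memory of CUDA device 0 in GiB, or None if unavailable."""
    try:
        import torch  # optional: only needed for the hardware probe
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory / 1024**3


def cloud_recommended(vram_gb: float | None) -> bool:
    """True when no GPU with more than 10 GB of VRAM is available."""
    return vram_gb is None or vram_gb <= VRAM_THRESHOLD_GB


if __name__ == "__main__":
    vram = local_vram_gb()
    print("VRAM (GiB):", vram)
    print("Cloud training recommended:", cloud_recommended(vram))
```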

[^p]: If you register a referral code and then add a payment method, you may save about $5 on your first month's bill. Note that both referral rewards are Paperspace credits and not cash. Including the referral link was a tough decision, but it was added because debugging and training the initial model require a large amount of computing power and the developer is a student.

Local

Place your dataset like dataset_raw/{speaker_id}/**/{wav_file}.{any_format} (subfolders and non-ASCII filenames are acceptable) and run:

svc pre-resample
svc pre-config
svc pre-hubert
svc train -t
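
Before running the commands above, it can be useful to confirm that the dataset follows the `dataset_raw/{speaker_id}/**/{wav_file}.{any_format}` layout. The sketch below is not part of this package; the function name `count_files_per_speaker` is hypothetical, and it only counts files without checking that they are valid audio.

```python
from __future__ import annotations

from pathlib import Path


def count_files_per_speaker(root: str = "dataset_raw") -> dict[str, int]:
    """Count files under each {speaker_id} folder, including nested subfolders."""
    root_path = Path(root)
    if not root_path.is_dir():
        return {}
    counts: dict[str, int] = {}
    # Only directories directly under the root are treated as speakers;
    # files placed directly in the root are ignored.
    for speaker_dir in sorted(p for p in root_path.iterdir() if p.is_dir()):
        # rglob("*") walks subfolders, mirroring the ** in the layout.
        counts[speaker_dir.name] = sum(
            1 for f in speaker_dir.rglob("*") if f.is_file()
        )
    return counts


if __name__ == "__main__":
    for speaker, n_files in count_files_per_speaker().items():
        print(f"{speaker}: {n_files} file(s)")
```

An empty result, or a speaker with zero files, suggests the audio was placed at the wrong level of the folder tree.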

Notes

Further help

For more details, run svc -h or svc <subcommand> -h.

> svc -h
Usage: svc [OPTIONS] COMMAND [ARGS]...

  so-vits-svc allows any folder structure for training data.
  However, the following folder structure is recommended.
      When training: dataset_raw/{speaker_name}/**/{wav_name}.{any_format}
      When inference: configs/44k/config.json, logs/44k/G_XXXX.pth
  If the folder structure is followed, you DO NOT NEED TO SPECIFY model path, config path, etc.
  (The latest model will be automatically loaded.)
  To train a model, run pre-resample, pre-config, pre-hubert, train.
  To infer a model, run infer.

Options:
  -h, --help  Show this message and exit.

Commands:
  clean          Clean up files, only useful if you are using the default file structure
  infer          Inference
  onnx           Export model to onnx (currently not working)
  pre-classify   Classify multiple audio files into multiple files
  pre-config     Preprocessing part 2: config
  pre-hubert     Preprocessing part 3: hubert If the HuBERT model is not found, it will be...
  pre-resample   Preprocessing part 1: resample
  pre-sd         Speech diarization using pyannote.audio
  pre-split      Split audio files into multiple files
  train          Train model If D_0.pth or G_0.pth not found, automatically download from hub.
  train-cluster  Train k-means clustering
  vc             Realtime inference from microphone
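
The help text says that the latest model under `logs/44k/G_XXXX.pth` is loaded automatically. How the package picks "latest" is not stated here, so the following sketch makes an assumption: it treats the checkpoint with the highest numeric step in its filename as the latest. The function name `latest_generator_checkpoint` is hypothetical.

```python
from __future__ import annotations

import re
from pathlib import Path

# Matches generator checkpoints such as G_0.pth or G_1000.pth.
_STEP = re.compile(r"G_(\d+)\.pth")


def latest_generator_checkpoint(log_dir: str = "logs/44k") -> Path | None:
    """Return the G_XXXX.pth file with the highest step number, or None."""
    candidates = []
    for path in Path(log_dir).glob("G_*.pth"):
        match = _STEP.fullmatch(path.name)
        if match:
            # Compare steps as integers so that G_1000 ranks above G_800.
            candidates.append((int(match.group(1)), path))
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0])[1]


if __name__ == "__main__":
    print(latest_generator_checkpoint())
```

Comparing the step as an integer rather than sorting filenames as strings matters because a plain string sort would place `G_800.pth` after `G_1000.pth`.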

External Links

Video Tutorial

Contributors ✨

Thanks go to these wonderful people (emoji key):

- 34j: 💻 🤔 📖 💡 🚇 🚧 👀 ⚠️ 📣 🐛
- GarrettConway: 💻 🐛 📖 👀
- BlueAmulet: 🤔 💬 💻 🚧
- ThrowawayAccount01: 🐛
- 緋: 📖 🐛
- Lordmau5: 🐛 💻 🤔 🚧 💬 📓
- DL909: 🐛
- Satisfy256: 🐛
- Pierluigi Zagaria: 📓
- ruckusmattster: 🐛
- Desuka-art: 🐛
- heyfixit: 📖
- Nerdy Rodent: 📹
- 谢宇: 📖
- ColdCawfee: 🐛
- sbersier: 🤔 📓 🐛
- Meldoner: 🐛 🤔 💻
- mmodeusher: 🐛
- AlonDan: 🐛
- Likkkez: 🐛
- Duct Tape Games: 🐛
- Xianglong He: 🐛
- 75aosu: 🐛
- tonyco82: 🐛
- yxlllc: 🤔 💻
- outhipped: 🐛
- escoolioinglesias: 🐛 📓 📹
- Blacksingh: 🐛
- Mgs. M. Thoyib Antarnusa: 🐛
- Exosfeer: 🐛 💻
- guranon: 🐛 🤔 💻
- Alexander Koumis: 💻
- acekagami: 🌍
- Highupech: 🐛
- Scorpi: 💻
- Maximxls: 💻
- Star3Lord: 🐛 💻
- Forkoz: 🐛 💻
- Zerui Chen: 💻 🤔
- Roee Shenberg: 📓 🤔 💻
- Justas: 🐛 💻
- Onako2: 📖
- 4ll0w3v1l: 💻
- j5y0V6b: 🛡️
- marcellocirelli: 🐛
- Priyanshu Patel: 💻
- Anna Gorshunova: 🐛 💻

This project follows the all-contributors specification. Contributions of any kind welcome!