Airis-VtuberAI is an open-source attempt to recreate the popular VTuber "Neuro Sama". The project uses no APIs and can run entirely locally, with no need for an internet connection or considerable VRAM.
The project can transcribe the user's voice, generate a response, and synthesize text-to-speech output with as little latency as reasonably possible while sacrificing as little quality as possible.
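A minimal sketch of that three-stage pipeline, assuming the faster-whisper library (which matches the int8_float16 compute type in the benchmark tables below); `generate_reply` and `synthesize` are hypothetical placeholders for the language-model and OpenVoice TTS stages, not the project's actual functions:

```python
# Hypothetical pipeline sketch: speech-to-text -> language model -> text-to-speech.
from faster_whisper import WhisperModel

# Load the transcription model on the GPU (model name and compute type as in the tables below).
stt = WhisperModel("tiny", device="cuda", compute_type="int8_float16")

def generate_reply(text: str) -> str:
    # Placeholder: the real project would call the local language model here
    # (e.g. Phi-3-mini-4k-instruct).
    return f"You said: {text}"

def synthesize(text: str) -> None:
    # Placeholder: the real project would call into the OpenVoice TTS stage here.
    print(f"[TTS] {text}")

def respond(wav_path: str) -> str:
    # 1. Transcribe the user's voice.
    segments, _ = stt.transcribe(wav_path)
    user_text = " ".join(segment.text for segment in segments)
    # 2. Generate a response.
    reply = generate_reply(user_text)
    # 3. Speak it.
    synthesize(reply)
    return reply
```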
Tutorial (now outdated, but it may still help some)
First, clone this repository, then clone the OpenVoice TTS repository inside it:
```bash
git clone https://github.com/neurokitti/AIRIS-VtuberAI.git
cd AIRIS-VtuberAI
git clone https://github.com/myshell-ai/OpenVoice.git
```
Next, create a virtual environment, activate it, and install the `requirements.txt` from this repo (not the one in the OpenVoice repo).
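A minimal sketch of the environment setup, assuming Python 3 is on your PATH (the activate command below is for bash; on Windows use `.venv\Scripts\activate`):

```bash
python -m venv .venv
source .venv/bin/activate
```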
```bash
pip install -r requirements.txt
```
Next, install PyTorch from https://pytorch.org. Then you can delete all the files (not the folders) in the OpenVoice folder and drag the files from the VTuber project into the OpenVoice repository. Don't drag the system prompt files into the repo, though.
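For example, one possible install command for a CUDA 12.1 build (the CUDA version used in the benchmarks below); check the selector on pytorch.org for the command matching your OS and GPU:

```bash
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu121
```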
Finally, install obs-websocket (https://github.com/obsproject/obs-websocket) and set the WebSocket password to be the same as the one in the `startup_scripts.py` file.
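As a quick way to check the passwords match, here's a minimal sketch using the obsws-python client (an assumption for illustration; the project may connect differently). It assumes obs-websocket's default port 4455, and the password string is a hypothetical placeholder:

```python
import obsws_python as obs

# Connect with the same host/port/password configured in startup_scripts.py.
# This raises an error if the password does not match OBS's WebSocket settings.
client = obs.ReqClient(host="localhost", port=4455, password="your_password_here")
print(client.get_version().obs_version)  # prints the OBS version on success
```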
To run this project, you can simply run the main file. To run interview mode instead, just uncomment it:
```python
from startup_scripts import main_chat, main_interview

if __name__ == "__main__":
    main_chat()        # chat mode: interacts with the stream chat but will not respond to your voice
    # main_interview() # interview mode: ignores chat and responds to anyone on the stream over the mic
```
You may also want to edit the project to better suit your needs; in that case, navigate to the `startup_scripts.py` file.
Finally, to run the project, run the `main.py` file with the mode you want uncommented.
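With the virtual environment activated, that's simply:

```bash
python main.py
```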
UPDATE: I tested this on a GTX 745 (4 GB of VRAM) and measured about 7 seconds of delay. The metrics in this section cover the full project, including the overhead from running OBS and VTube Studio. All of these tests ran on the GPU and used Microsoft's Phi-3-mini-4k-instruct model.
NOTE: Because I haven't fully benchmarked everything yet, for reference the response time is between 1 and 2 seconds.
| Whisper Model | Precision | Language Model | Quantization | Max. GPU Memory | Response Time |
|---|---|---|---|---|---|
| tiny | int8_float16 | Phi-3-mini-4k-instruct | 4-bit | tbd | tbd |
| tiny | int8_float16 | Phi-3-mini-4k-instruct | 8-bit | tbd | tbd |
| tiny | int8_float16 | Phi-3-mini-4k-instruct | full | tbd | tbd |
| distil-large-v3 | int8_float16 | Phi-3-mini-4k-instruct | 4-bit | tbd | tbd |
| distil-large-v3 | int8_float16 | Phi-3-mini-4k-instruct | 8-bit | tbd | tbd |
| distil-large-v3 | int8_float16 | Phi-3-mini-4k-instruct | full | tbd | tbd |
Executed with CUDA 12.1 on an NVIDIA RTX 4080 Laptop GPU with 12 GB of VRAM.
I don't know how to set up a license, but all the projects used here are MIT-licensed, so as far as I'm concerned you can do whatever you want. Go nuts.
neurokitti42@gmail.com