Add new state-of-the-art voice separation models (UVR)

jhj0517 / Whisper-WebUI

A Web UI for easy subtitle generation using the Whisper model.

Apache License 2.0

1.39k stars, 195 forks

Open montvid opened 5 hours ago

montvid commented 5 hours ago

Thanks for showing that accurate transcription needs a voice separation model (UVR) and a silence-cutting model (VAD) to run first, with Whisper transcribing only afterwards. I found a list of recent state-of-the-art voice separation models here: https://github.com/ZFTurbo/Music-Source-Separation-Training?tab=readme-ov-file#vocal-models I'm experimenting with this one now: https://github.com/KimberleyJensen/Mel-Band-Roformer-Vocal-Model
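
For anyone following along, the ordering described above (separate vocals, cut silence, then transcribe) can be sketched roughly as below. This is only an illustration, not Whisper-WebUI's actual code: `preprocess_and_transcribe`, `separate_vocals`, `detect_speech`, and `transcribe` are hypothetical names, and `energy_vad` is a toy energy threshold standing in for a real VAD model. A real setup would plug in a separation model such as one from the list above, a trained VAD, and Whisper.

```python
from typing import Callable, List, Sequence, Tuple

Segment = Tuple[int, int]  # (start, end) sample indices, end exclusive


def energy_vad(samples: Sequence[float], frame: int = 4,
               threshold: float = 0.01) -> List[Segment]:
    """Toy stand-in for a VAD model (assumption, not a real VAD):
    keep frames whose mean squared amplitude exceeds `threshold`."""
    segments: List[Segment] = []
    for start in range(0, len(samples), frame):
        chunk = samples[start:start + frame]
        energy = sum(x * x for x in chunk) / len(chunk)
        if energy > threshold:
            end = start + len(chunk)
            # Merge with the previous segment when the two are adjacent.
            if segments and segments[-1][1] == start:
                segments[-1] = (segments[-1][0], end)
            else:
                segments.append((start, end))
    return segments


def preprocess_and_transcribe(
    samples: Sequence[float],
    separate_vocals: Callable[[Sequence[float]], List[float]],
    detect_speech: Callable[[Sequence[float]], List[Segment]],
    transcribe: Callable[[List[float]], str],
) -> List[Tuple[int, int, str]]:
    """Run the three stages in order; each stage is a pluggable callable."""
    vocals = separate_vocals(samples)          # 1. voice separation (UVR-style)
    results = []
    for start, end in detect_speech(vocals):   # 2. silence cutting (VAD)
        text = transcribe(vocals[start:end])   # 3. transcription (e.g. Whisper)
        results.append((start, end, text))
    return results
```

Passing the stages in as callables keeps the order fixed while letting each model be swapped out, for example to try a different vocal model without touching the VAD or transcription steps.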

jhj0517 commented 4 hours ago

Thanks for noticing, I'll definitely take a look at it when I have time!