montvid opened this issue 5 hours ago (status: Open)
Thanks for showing that exact transcription requires running a voice separation model (UVR) and a silence-cutting model (VAD) first, and only then transcribing with Whisper. I found a list of recent state-of-the-art voice separation models here: https://github.com/ZFTurbo/Music-Source-Separation-Training?tab=readme-ov-file#vocal-models I'm experimenting with this one now: https://github.com/KimberleyJensen/Mel-Band-Roformer-Vocal-Model
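For anyone following along, here is a rough sketch of the order of operations described above (separate vocals, cut silence, then transcribe). It is not this repo's code. `separate_vocals` and `transcribe` are hypothetical callables you would back with a real separation model and Whisper, and `energy_vad` is a crude energy threshold standing in for a real VAD model; the frame length and threshold are arbitrary assumptions.

```python
from typing import Callable, List, Sequence, Tuple

Samples = Sequence[float]
Segment = Tuple[int, int]  # [start, end) sample indices


def energy_vad(samples: Samples, frame_len: int = 160,
               threshold: float = 0.01) -> List[Segment]:
    """Crude energy-based stand-in for a real VAD model (assumption).

    Marks a frame as speech when its mean squared amplitude reaches
    `threshold`, and merges consecutive speech frames into one segment.
    """
    segments: List[Segment] = []
    start = None
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(x * x for x in frame) / len(frame)
        if energy >= threshold:
            if start is None:
                start = i  # a speech run begins at this frame
        elif start is not None:
            segments.append((start, i))  # speech run ended at this frame
            start = None
    if start is not None:
        segments.append((start, len(samples)))
    return segments


def transcribe_pipeline(
    mix: Samples,
    separate_vocals: Callable[[Samples], Samples],  # hypothetical: e.g. a UVR-style model
    transcribe: Callable[[Samples], str],           # hypothetical: e.g. a Whisper wrapper
    vad: Callable[[Samples], List[Segment]] = energy_vad,
) -> List[Tuple[int, int, str]]:
    """Run the stages in order: separation -> VAD -> transcription."""
    vocals = separate_vocals(mix)
    return [(s, e, transcribe(vocals[s:e])) for s, e in vad(vocals)]


if __name__ == "__main__":
    # Synthetic input: silence, a loud burst, then silence again.
    mix = [0.0] * 320 + [0.5] * 320 + [0.0] * 160
    result = transcribe_pipeline(
        mix,
        separate_vocals=lambda x: x,                   # placeholder: no real separation
        transcribe=lambda seg: f"{len(seg)} samples",  # placeholder: no real ASR
    )
    print(result)
```

Passing the stages in as callables keeps the ordering explicit and makes it easy to swap one separation model for another while comparing transcription quality.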
Thanks for noticing, I'll definitely take a look at it when I have time!