I have tested the video https://www.youtube.com/watch?v=1NfFIpZocWs against the following models:
1) Large-v3
2) Medium model with VAD and Demucs (since dialogue and music are somewhat mixed together):
stable-ts test.mp4 --model medium --model_dir D:***els\ --output testmedium.srt --language Japanese --vad True --vad_threshold 0.35 --demucs True --refine
3) Medium model with just:
stable-ts test.mp4 --model medium --model_dir D:***els\ --output testmedium.srt --language Japanese
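For reference, the same medium-model settings can also be expressed through stable-ts's Python API. This is a minimal sketch assuming a recent stable-ts 2.x release; the model directory below is a placeholder, since the real path in the commands above is censored:

import stable_whisper

# Placeholder model directory; the actual path is censored in the commands above.
model = stable_whisper.load_model('medium', download_root='D:/models')
# Same options as command no. 2: voice activity detection with a 0.35
# threshold, plus Demucs vocal isolation before transcription.
result = model.transcribe('test.mp4', language='Japanese',
                          vad=True, vad_threshold=0.35, demucs=True)
# The --refine flag corresponds to a follow-up refinement pass over the result.
model.refine('test.mp4', result)
result.to_srt_vtt('testmedium.srt')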
The results: A) Large-v3 wins, not only in quality but also in sync; B) second place, strangely, goes to command no. 3 (the plain command produces better output); and C) last is option 2, despite that command taking more time.
The transcriptions were done in the native language, i.e. Japanese, and then converted to English for comparison using Google API Version 1.