Hello, everyone. Recently, I got stumped by this seemingly small thing.
I wanted to transcribe a German video in its original language, but noticed that stable-ts was auto-detecting the language as English. More specifically, it translates everything rather than transcribing it. Whisper itself seems to detect the language accurately somehow.
I played around with it a bit: specifying the task, switching between "--language de" and "--language German", and changing the model from v2 to v1, all with no success, until I noticed the cause is the silent audio at the start, which runs longer than 30 seconds. So I did a test: I ripped the first few lines of dialogue from the video, and sure enough, it was detected as German.
My command was simply:
!stable-ts "Test.mp4" -o "Test.srt" --task transcribe --language German --model large-v2 --word_level=False
Is there something that makes stable-ts fall back to auto-detecting the language, even when it has been specified, if there is no audio in the first 30 seconds? Or am I missing something here?
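In case it helps anyone hitting the same thing, here is a rough sketch of the workaround I have in mind: cut the silent lead-in before transcribing, then shift the subtitle timestamps back afterwards. It assumes the silence really is the cause, which I have only observed and not confirmed. The file names, the 35-second skip, and the helper names build_trim_command and shift_srt are made up for illustration; the only external tool assumed is ffmpeg with its standard -ss, -i and -vn options.

```python
import re


def build_trim_command(src, dst, skip_seconds):
    """Build an ffmpeg argument list that drops the first `skip_seconds`.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    return [
        "ffmpeg", "-y",
        "-ss", str(skip_seconds),  # seek past the silent lead-in (before -i)
        "-i", src,
        "-vn",                     # keep audio only, drop the video stream
        dst,
    ]


# Matches SRT timestamps of the form HH:MM:SS,mmm
_TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def shift_srt(text, offset_seconds):
    """Add `offset_seconds` to every HH:MM:SS,mmm timestamp in SRT text."""
    offset_ms = round(offset_seconds * 1000)

    def bump(match):
        h, m, s, ms = (int(g) for g in match.groups())
        total = ((h * 60 + m) * 60 + s) * 1000 + ms + offset_ms
        h, rem = divmod(total, 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    return _TIMESTAMP.sub(bump, text)


if __name__ == "__main__":
    # Illustrative values only: skip an assumed 35 s of leading silence.
    print(" ".join(build_trim_command("Test.mp4", "Test_trimmed.wav", 35)))
    print(shift_srt("00:00:01,500 --> 00:00:03,250", 35))
```

The idea would be to run the printed ffmpeg command, transcribe the trimmed file with the same stable-ts command as above, and then pass the resulting SRT text through shift_srt with the same offset so the subtitles line up with the original video again.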