Closed: kanjieater closed this issue 1 year ago
1.zip Managed to reproduce easily with a smaller test file:
~~ Transcribing VAD chunk: (19:18.106 --> 19:41.140) ~~
[00:00.000 --> 00:09.660] ในใฏใผใซใง้ฃในใใใใซไฝใฃใใๅผๅฝใฏใใใๅฎถใง้ฃในใใฎใญ ใใ็ฝฎใใฆใใใใ้ฃในใใใใใชใ้ฃในใฆ
[00:09.660 --> 00:15.680] ๅฟใฎ็ฎใ่ฆใ ่ชๅใฎๆใฎๆฏๅบฆใๅงใใ
[00:15.680 --> 00:22.200] ใ็ถใใใใใฆใใใใ ๅฐใใฏใใฐใฃใฆใใใใใใใใชใใฎใซ
[00:22.200 --> 00:24.200] ่ฆใใใชใฃใใ
~~ Transcribing VAD chunk: (19:43.385 --> 20:12.747) ~~
[00:00.000 --> 00:15.840] ๅๅใใฎไธก่ฆชใฎใใกใใ็ถใใใฎไผ็คพใฎๆนใ้ๅคใใใฎใซ้ ใใใใใใฎๅๆใๆฉใใๅฟใ่ตทใใ้ ใซใฏใใใใใชใใใจใใปใจใใฉใ ใ
[00:16.960 --> 00:23.840] ใใฎใพใพใงใใใจๆใใใใใใใใชใใใใ้ปใฃใใพใพ้ๆฎตใไธใใ
[00:23.840 --> 00:29.840] ่ๅพใใ่ฟฝใ่จใกใฎใใใซใใๆฏใ่ใใใใ
Performing alignment...
Failed to align segment ("ใฟใใชใฎ็ฅใใชใใจใใใงใ็งใใกใฏใใใใๅ้ใ็งใซใ็นๅฅใชใใจใไฝใซใใชใใฆใใ็งใใ้ๅ็ฅ็ตใ็นๅฅ่ฏใใชใใฆใใ้ ญใ่ฏใใชใใฆใใ็งใซใใฟใใชใ็พจใพใใใใใใช้ทๆใใๆฌๅฝใซไฝใซใใชใใฆใ"): no characters in this segment found in model dictionary, resorting to original...
Failed to align segment ("ๅฝ้ๆฒฟใใฎในใผใใผใพใงใฏ่ท้ขใใใฃใฆใ่ปใใชใใใฐใชใใชใ่กใใชใใใใใๅฟใฎๅฐใใ้ ใใใ้ฑใซไธๅบฆใใใกใฎ่ฃใซใใๅ
ฌๅใซไธๆฒณ่ฃฝ่ใฎ่ปใใใฃใฆใใใ่ฟๆใซไฝใใๅนดๅฏใใใๅฐใใชๅญไพใ้ฃใใใๆฏใใใใใใฎๆฒใ่ใใฆ่ฒทใ็ฉใซใใฃใฆใใใ"): no characters in this segment found in model dictionary, resorting to original...
Failed to align segment ("ๅคงใใช้ณๆฅฝใ้ฟใใใในใใผใซใผใใใใใใจ่ฆๆ
ใ่จใไบบใใใฆใ้จ้ณๅ้กใซใชใฃใฆใใใใจใใ้จ้ณโฆใจใพใงใฏๆใใชใใใฉใๅฟใใใฎ้ณใ่ใใจใๅฑ
ๅใชใใไปใๅนณๆฅใฎๆผ้ใ ใจใใใใจใๆ่ญใใใๆ่ญใใใใใฆใใพใใ"): no characters in this segment found in model dictionary, resorting to original...
Failed to align segment ("ๅญไพใ็ฌใๅฃฐใ่ใใใใๅนณๆฅๅๅไธญใฎๅไธๆใจใใใฎใใใใใใๆ้ใชใใ ใจใใใใจใๅฟใฏใๅญฆๆ กใไผใใใใซใชใฃใฆๅใใฆ็ฅใฃใใไธๆฒณๆๅฎถใฎ่ปใฏๅฟใซใจใฃใฆๅฐๅญฆๆ กใฎ้ ใใๅคไผใฟใๅฌไผใฟใซ่ฆใใใใใฎใ ใฃใใ"): no characters in this segment found in model dictionary, resorting to original...
Failed to align segment ("ใใใช้ขจใซใซใผใใณใๆทใใฆใ้จๅฑใง่บซใๅบใใใฆใใๅนณๆฅใซ่ฆใใใฎใงใฏใชใใฃใใๅปๅนดใพใงใฏใๅฟใฏๆฏใๆฎบใใฆใ้ณใ็ตใฃใใใฌใใ่ฆใชใใใใใฎๆใใใๅคใซๆผใใฆใใชใใใฐใใใชใ"): no characters in this segment found in model dictionary, resorting to original...
Failed to align segment ("ไธๆฒณ่็ซใๆฅใชใใฆใใๅฟใฎ้จๅฑใฎๅใใใซ่ฆใใๅ
ฌๅใซใฏใใใคใ่ฟๆใฎ่ฅใใๆฏใใใใกใๅญไพใ้ใฐใใซๆฅใฆใใใ่ฒใจใใฉใใฎใใใฐใใใณใใซใฎใจใใใซใใใใใใผใซใผใใใณใใฎใใฐใซไธฆใใงใใใฎใ่ฆใใจใใใๅๅไธญใใใจใกใใฃใจใ ใใจๆใใ"): no characters in this segment found in model dictionary, resorting to original...
Failed to align segment ("ๅๆใใๅไธๆใใใใซใใใฆ้ใพใๅงใใ่ฆชๅญใใกใฏใๅไบๆใซใฏใๆผใ้ฃฏใฎใใใซใฟใใชไธๆฆใใใใใใชใใชใใใใใใใใๅฐใใซใผใใณใ้ใใใใใใซใผใใณใฎๅธๅฐใฎๆทกใใชใฌใณใธ่ฒใ้ใใๆผใงใใใใใ ใใใซใชใฃใ้จๅฑใฏใใฃใจ้ใใใฆใใใจใ"): no characters in this segment found in model dictionary, resorting to original...
Failed to align segment ("ๅ
จ้จใใใใใใปใใใใ็็ฑใใใกใใจใใใๆใฏใซใผใใณใ้ใใชใใใใ ใจใใๅญฆๆ กใซใฏๅญไพใฏใฟใใช่กใใชใใใฐใชใใชใใใ ใจใใใใจใจใใใๆฏใใใจ่ฆๅญฆใซ่กใฃใในใฏใผใซใซใไปๆฅใใๆฌๅฝใซ่กใใๆฐใใใฆใใใใ ใใฉโฆ"): no characters in this segment found in model dictionary, resorting to original...
Failed to align segment ("ๆ่ตทใใใใใกใ ใฃใใใใคใใฎใใใซใ่
นใ็ใใใใณใใใใใชใใๆฌๅฝใซ็ใใใฉใใใฆใใใใใชใใฃใใๆใๅญฆๆ กใซ่กใๆ้ใซใชใใจใใใณใใใใใชใใฎใซใๆฌๅฝใซใ่
นใๆใซใฏ้ ญใ็ใใชใใฎใ ใ็ก็ใใชใใฆใใใ"): no characters in this segment found in model dictionary, resorting to original...
Failed to align segment ("ใใใใใซใฏใจใใผในใใ็จๆใใฆใใใๆฏใใใใๅฟใฎๅฃฐใ่ใใฆ้ฒ้ชจใซ่กจๆ
ใใชใใใใ้ปใฃใใๅฟใ่ฆใชใใใพใใงๅฟใฎๅฃฐใ่ใใใชใใฃใใใใซไฟฏใใฆใๆนฏๆฐใ็ซใฆใใใฐใซใใใ้ฃๅใซ้ใถใใใฎใพใพใใใใใใใใใใชๅฃฐใโฆ"): no characters in this segment found in model dictionary, resorting to original...
Failed to align segment ("ในใฏใผใซใฏๅญฆๆ กใใใชใใฎใใๆฏๆฅใใใชใใใๆฅใฆใไบบๆฐใๅญฆๆ กใใๅฐใชใใใๅ
็ใ่ฏใไบบใใใ ใฃใใงใใใใ่กใใฃใฆๅฟใ่จใฃใใใงใใใใใฉใใใใฎ?่กใใชใใฎ?ใใคใใฎใใคใซ่ฒฌใใใใใใใซ่จใใใใจใใใใใๆฏใใใฏ่จใฃใฆๆฌฒใใใใ ใใจใใใใใ ใใฉใ้ใใ่กใใใใชใใใใใชใใฎใ"): no characters in this segment found in model dictionary, resorting to original...
Failed to align segment ("ใใณใใใใใชใใใปใใจใใซใใชใใใใใใใใใใใใใใชใใงใใใจใใใใใใใใใใใใใใใใซใใใ
ใใซใจใใใใซใใ ใใใใฏใใใใใใใชใใใใใใจใใใใใกใใใใใใฉใใใใฎ?ใ่ถณใๅบใพใฃใใใใซใชใฃใฆๅใใชใ"): no characters in this segment found in model dictionary, resorting to original...
Failed to align segment ("ไปๆฅใฏ่กใใชใใใฉใๆฌกใซในใฏใผใซใใใๆฅใซใพใใ่
นใ็ใใชใใใฉใใใชใใฆใใใใชใใใใณใใใใใชใใฆใๆฌๅฝใซ็ใใใใใ ่กใใชใใ ใใชใฎใซใใใใช็ไธๅฐฝใชใใจใ่ใใใใชใใฆใใจๆฒใใใชใฃใฆใใใ็ญใใชใใพใพใๆฏใใใ่ฆใฆใใใจใใๆฏใใใ"): no characters in this segment found in model dictionary, resorting to original...
Failed to align segment ("ๅฐๆใซใใซใฏใฎๆนฏๆฐใใตใใฃใจๅคงใใไธใใฃใฆใใใใซๆฐด้ณใจใจใใซๆถใใใๆฌๅฝใฏๅพใง้ฃในใใใจๆใฃใฆใใใใฉใ็ญใใๆใใชใใฃใใใใขใฎๅใงใใธใฃใๅงฟใฎใพใพๅใใชใๅฟใ็ก่ฆใใใใใซใใกใใฃใจใฉใใฆใใใจ้ใๆใใใๆฏใใใๅฅฅใฎใชใใณใฐใซๆถใใใ"): no characters in this segment found in model dictionary, resorting to original...
Failed to align segment ("ใใใซใฉใใใซ้ป่ฉฑใใๅฃฐใ่ใใใฆใใใใใใใใพใใใๅฎ่ฅฟใงใใใใฉใใใจใใใใพใงใฎไธๆฉๅซใๅฏใใใๆญใฃใใใใชใใใใใใฎๅฃฐใ่ใใใฆใใใใใใใใชใใงใใใ่
นใ็ใใจ่จใๅบใใฆ"): no characters in this segment found in model dictionary, resorting to original...
Failed to align segment ("็ณใ่จณใใใพใใใ่ฆๅญฆใฎๆใซใฏใใใฎๅญใฎๆนใ่กใใใใฃใฆไนใๆฐใ ใฃใใใงใใใฉใใฏใใใฏใใๆฌๅฝใซใ่ฟทๆใใใใใฆใใๆฏใใใใณใณใญใ้ฃใใฆ่กใฃใฆใใใในใฏใผใซใฏใใณใณใญใฎๆๅฎคใจใใใจใใใ ใฃใใๅ
ฅใๅฃใซๆใใฃใ็ๆฟใฎไธใซใๅญไพ่ฒๆๆฏๆดใ"): no characters in this segment found in model dictionary, resorting to original...
Failed to align segment ("ๅฟใฎ็ฌใใใณใจๆผใใใใใฎๅๅใใๅฟใฎๆๅฎคใชใฎใใใชใใ ใ็ณใ่จณใชใใฃใใๅฟใจๅใๅๅใใๆฏใใใ ใฃใฆๆฐใฅใใฆใใใ ใใใใๆฏใใใฏใใใใซ่ชๅใ้ฃใใฆใใใใใซใๅจใซใใฎๅๅใใคใใใใใใใชใใฎใซใ"): no characters in this segment found in model dictionary, resorting to original...
Failed to align segment ("่ธใใฎใฅใใจ็ใใ ใไธ็ปๆ กใจๅผใฐใใๅญไพใใๅญฆๆ กใฎไปใซ้ใๅ ดๆใใใใจใใใใจใใใณใณใญใฏ่ชๅใใใใชใฃใฆๅใใฆ็ฅใฃใใๅฐๅญฆๆ กใฎ้ ใใณใณใญใใกใฎใฏใฉในใงๅญฆๆ กใซๆฅใชใๅญใฏไธไบบใใใชใใฃใใ"): no characters in this segment found in model dictionary, resorting to original...
Failed to align segment ("ใฟใใชใๅคๅฐใฎใบใซไผใฟใฏ1ๆฅใ2ๆฅใใฆใใใใใใใชใใใฉใใจใซใใใใใใซๆฅใใใใชๅญใฏไธไบบใใใชใใฃใใในใฏใผใซใง่ฟใใฆใใใๅ
็ใใกใใใฟใใช่ชๅใใกใฎๅฟใฎๆๅฎคใในใฏใผใซใจๅผใใงใใใ"): no characters in this segment found in model dictionary, resorting to original...
Failed to align segment ("ๅญไพ็ช็ตใฎๆญใฎใๅงใใใฎใใใช้ฐๅฒๆฐใฎไบบใ ใฃใใ่ธใซใคใใใฒใพใใๅใฎๅๆญใซใ่ชฐใๅญไพใๆธใใใใใๅฝผๅฅณใฎไผผ้ก็ตตใจใๅๅณถใใจใใๅๅใๆธใใฆใใใใใฏใใใจ็ญใใๅฃฐใใๆใชใใๅฐใใไธๆ็ญใ ใฃใใ"): no characters in this segment found in model dictionary, resorting to original...
Failed to align segment ("ใใใซใ็ฎใใจใฆใๅชใใใๅฅฝๆใๆใฃใใใฉใใใฎไบบใไปใฏใใๅๆฅญใใฆใใใฎๅญฆๆ กใฎไธญๅญฆ็ใงใชใใใจใ้้ใใซ็พจใพใใใฃใใๅฟใฏใ้ชๅ้้ไธญใซ้ใฃใฆใใชใใฆใจใฆใ่จใใชใใใพใ ใๅ
ฅๅญฆใใใฐใใใ"): no characters in this segment found in model dictionary, resorting to original...
Failed to align segment ("ๅๅใใฎไธก่ฆชใฎใใกใใ็ถใใใฎไผ็คพใฎๆนใ้ๅคใใใฎใซ้ ใใใใใใฎๅๆใๆฉใใๅฟใ่ตทใใ้ ใซใฏใใใใใชใใใจใใปใจใใฉใ ใใใฎใพใพใงใใใจๆใใใใใใใใชใใใใ้ปใฃใใพใพ้ๆฎตใไธใใ่ๅพใใ่ฟฝใ่จใกใฎใใใซใใๆฏใ่ใใใใ"): no characters in this segment found in model dictionary, resorting to original...
Traceback (most recent call last):
File "/home/ke/.pyenv/versions/subgen/bin/whisperx", line 8, in <module>
sys.exit(cli())
File "/home/ke/.pyenv/versions/3.9.9/envs/subgen/lib/python3.9/site-packages/whisperx/transcribe.py", line 723, in cli
write_vtt(result_aligned["segments"], file=vtt)
File "/home/ke/.pyenv/versions/3.9.9/envs/subgen/lib/python3.9/site-packages/whisperx/utils.py", line 59, in write_vtt
f"{format_timestamp(segment['start'])} --> {format_timestamp(segment['end'])}\n"
File "/home/ke/.pyenv/versions/3.9.9/envs/subgen/lib/python3.9/site-packages/whisperx/utils.py", line 34, in format_timestamp
assert seconds >= 0, "non-negative timestamp expected"
AssertionError: non-negative timestamp expected
And it still outputs these vtt & txt files which seem broken (too short, misaligned, missing the whisper outputs, etc): ใใใฟใฎๅญคๅ.zip
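For anyone debugging the same crash, here is a minimal sketch of a check that could be run on the aligned segments before they are written out. It assumes, based on the traceback above, that each segment is a dict with numeric 'start' and 'end' values in seconds. find_bad_segments is a hypothetical helper, not part of whisperx, and the sample data is made up for illustration.

```python
from typing import Dict, List, Tuple


def find_bad_segments(segments: List[Dict]) -> List[Tuple[int, str]]:
    """Return (index, reason) for segments a subtitle writer would reject.

    Hypothetical helper, not part of whisperx. Assumes each segment is a
    dict with numeric 'start' and 'end' values in seconds.
    """
    bad = []
    for i, seg in enumerate(segments):
        start, end = seg.get("start"), seg.get("end")
        if start is None or end is None:
            bad.append((i, "missing timestamp"))
        elif start < 0 or end < 0:
            # This is the case that trips the assert in format_timestamp.
            bad.append((i, "negative timestamp"))
        elif end < start:
            bad.append((i, "end before start"))
    return bad


# Made-up sample data, only to show the shape of the input.
segments = [
    {"start": 0.0, "end": 9.66, "text": "ok"},
    {"start": -1.5, "end": 2.0, "text": "negative start"},
    {"start": 5.0, "end": 4.0, "text": "inverted"},
]
for index, reason in find_bad_segments(segments):
    print(index, reason)
```

This only reports which segments are suspicious. It does not explain why the alignment step produced them, and it does not repair them.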
I am pretty sure this still happens because this project, instead of building on the official Whisper repo, carries outdated copies of code from the official Whisper project.
The problem seems to have been solved upstream, as described in #810 and #914.
You can take a look at the fix here: Fix infinite loop caused by incorrect timestamp tokens prediction
That fix is not present in WhisperX's code -> whisperx/decoding.py
WhisperX's code should be refactored to follow the original code base to avoid this type of problem.
I managed to get past this issue for this file using the suggestion in https://github.com/m-bain/whisperX/issues/84: use a language code instead of the full language name that Whisper accepts, i.e. --language ja, not --language Japanese.
I will verify that the original file can complete once I see where #84 lands regarding "Failed to align segment".
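In a wrapper script, the workaround could be applied by normalising the language argument before it is passed to whisperx. The sketch below assumes a small hand-written table. LANGUAGE_CODES and to_language_code are hypothetical names, and the table is an illustrative subset, not the list that Whisper or whisperx uses.

```python
# Illustrative subset only; a hypothetical table, not whisperx's own.
LANGUAGE_CODES = {
    "japanese": "ja",
    "korean": "ko",
    "english": "en",
}


def to_language_code(language: str) -> str:
    """Map a full language name or an existing code to a lowercase code."""
    key = language.strip().lower()
    if key in LANGUAGE_CODES.values():
        return key  # already a code such as "ja"
    if key in LANGUAGE_CODES:
        return LANGUAGE_CODES[key]
    raise ValueError(f"unknown language: {language!r}")


print(to_language_code("Japanese"))
print(to_language_code("ja"))
```

The resulting code would then be passed as --language in place of the full name.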
the broken vtt it produces
Getting the same assertion error. Windows 10, latest whisperx installed via pipx but I merged the changes from the existing PR #91 on top. Python 3.9.9, Torch for CUDA 11.7
Command: whisperx --vad_filter --parallel_bs 2 --model medium --language Korean --align_model wav2vec2-xls-r-300m-korean --output_dir "update_test/" korean-convo-lingo.mp3
note I am using VAD here
Align Model: https://huggingface.co/w11wo/wav2vec2-xls-r-300m-korean
Audio Clip: https://www.youtube.com/watch?v=PcysuLjtTeo
yt-dlp https://www.youtube.com/watch?v=PcysuLjtTeo --extract-audio --audio-format mp3 -o "korean-convo-lingo.mp3"
[14:27.295 --> 14:29.795] ์ด ํผ์ ๋ง์์ต๋๋ค.
[14:29.795 --> 14:31.795] ์์ฐ, ์ ๋ง ๋๋์ต๋๋ค.
[14:31.795 --> 14:34.955] ์ ๋ ์ด ํผ์ ์ ์ผ ๋ง์๋ค๊ณ ์๊ฐํฉ๋๋ค.
Performing alignment...
Traceback (most recent call last):
File "C:\Python39\lib\runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Python39\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "c:\users\infinitay\.local\bin\whisperx.exe\__main__.py", line 7, in <module>
File "C:\Users\infinitay\.local\pipx\venvs\whisperx\lib\site-packages\whisperx\transcribe.py", line 739, in cli
write_vtt(result_aligned["segments"], file=vtt)
File "C:\Users\infinitay\.local\pipx\venvs\whisperx\lib\site-packages\whisperx\utils.py", line 59, in write_vtt
f"{format_timestamp(segment['start'])} --> {format_timestamp(segment['end'])}\n"
File "C:\Users\infinitay\.local\pipx\venvs\whisperx\lib\site-packages\whisperx\utils.py", line 34, in format_timestamp
assert seconds >= 0, "non-negative timestamp expected"
AssertionError: non-negative timestamp expected
EDIT: While testing other audio clips I realized that the behavior differs depending on the align model I am using. With the same audio clip and command as above, I switched the align model to wav2vec2-large-xls-r-1b-korean-sample5 and it ran without any errors.
The same behavior, with one model working and the other failing, occurs with another audio clip, this time of a song.
Audio Clip: https://www.youtube.com/watch?v=lxPndeAzfwI
Model that failed: wav2vec2-large-xls-r-1b-korean-sample5 (model was linked above)
Model that passed: wav2vec2-xls-r-300m-korean (model was linked above)
Found another example of one model working and another resulting in "non-negative timestamp expected":
Audio Clip: https://www.youtube.com/watch?v=Z_NaYKUR3sM
Model that failed: wav2vec2-large-xls-r-1b-korean-sample5 (model was linked above)
Model that passed: wav2vec2-xls-r-300m-korean (model was linked above)
The Whisper fix and VAD filtering mean this can't happen any more, in theory :')
I'm not sure how a negative timestamp could have been generated, but I seem to have done it.
When I run
whisperx "/mnt/d/Editing/Audiobooks/a7/a7.wav" --language Japanese --output_dir "/mnt/d/Editing/Audiobooks/a7/" --model large-v2 --vad_filter --align_model WAV2VEC2_ASR_LARGE_LV60K_960H --hf_token some_token
I get
I'm happy to share the 6-hour file for testing purposes on request. I tried cutting the audio down to a small clip and tested on https://github.com/m-bain/whisperX/issues/84, but unfortunately ended up with other errors.