linto-ai / whisper-timestamped

Multilingual Automatic Speech Recognition with word-level timestamps and confidence
GNU Affero General Public License v3.0

Inconsistent number of segments: whisper_segments (1352) != timestamped_word_segments (1350) #59

Closed: jeremymatt closed this issue 1 year ago

jeremymatt commented 1 year ago

Hi, thanks for making this package publicly available, and thanks in advance for your help!
Unfortunately, I'm still running into this issue: https://github.com/linto-ai/whisper-timestamped/issues/24

It seems to be specific to at least two of the files in the dataset I'm trying to transcribe. Some of the files work fine, but when the script gets to one of two specific files I get the following errors. NOTE: these errors were produced with the medium.en model. I've added error handling that tries successively smaller models and writes the tracebacks to a file. I'll update this when I know more (e.g., how many files have problems, and whether all models are affected or just a few). Unfortunately, I cannot share the audio due to privacy/research restrictions.

The Python function calls to Whisper:

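import whisper_timestamped as whisper  # inferred from the traceback: whisper.transcribe resolves to whisper_timestamped

model_name = 'medium'
model = whisper.load_model(model_name, device="cpu")
result = whisper.transcribe(model, audio, language="en")  # audio: the file being processed (defined elsewhere in the script)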
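
The fallback described above (trying successively smaller models and writing tracebacks to a file) is not shown in the report. The following is only a minimal sketch of how it might look, not the actual script. The helper name `transcribe_with_fallback`, its parameters, and the default log file name are hypothetical; the loader and transcriber are passed in as arguments so the helper does not depend on any particular package.

```python
import traceback
from pathlib import Path


def transcribe_with_fallback(audio, model_names, load_model, transcribe,
                             log_path="transcribe_errors.log"):
    """Try each model in order; return (model_name, result) for the first success.

    load_model and transcribe are injected callables (hypothetical design),
    expected to accept the same arguments as the calls shown in this issue.
    """
    for name in model_names:
        try:
            model = load_model(name, device="cpu")
            return name, transcribe(model, audio, language="en")
        except Exception:
            # AssertionError is a subclass of Exception, so the
            # "Inconsistent number of segments" failure is caught here.
            # Append the full traceback to the log and try the next model.
            with Path(log_path).open("a", encoding="utf-8") as log:
                log.write(f"--- model {name} failed ---\n")
                log.write(traceback.format_exc())
                log.write("\n")
    raise RuntimeError(f"All models failed: {list(model_names)}")
```

With whisper-timestamped imported as `whisper`, a call could look like `transcribe_with_fallback(audio, ["medium", "small", "base"], whisper.load_model, whisper.transcribe)`; the order of model names is an assumption.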

Error message for the first file:

Processing file: D:\github\PAT_data\audio\MOH 110-006 V2 02Dec2020 Part 2.wav
100%|██████████| 233676/233676 [55:47<00:00, 69.80frames/s] 
Inconsistent number of segments: whisper_segments (1352) != timestamped_word_segments (1350)
Traceback (most recent call last):

  File ~\Anaconda3\envs\stt\lib\site-packages\spyder_kernels\py3compat.py:356 in compat_exec
    exec(code, globals, locals)

  File d:\github\pat_private\driver_whisper_speech_to_text.py:56
    result = whisper.transcribe(model, audio, language="en")

  File ~\Anaconda3\envs\stt\lib\site-packages\whisper_timestamped\transcribe.py:259 in transcribe_timestamped
    (transcription, words) = _transcribe_timestamped_efficient(model, audio,

  File ~\Anaconda3\envs\stt\lib\site-packages\whisper_timestamped\transcribe.py:851 in _transcribe_timestamped_efficient
    assert l1 == l2 or l1 == 0, f"Inconsistent number of segments: whisper_segments ({l1}) != timestamped_word_segments ({l2})"

AssertionError: Inconsistent number of segments: whisper_segments (1352) != timestamped_word_segments (1350)

Error message for the second file:

Processing file: D:\github\PAT_data\audio\MOH1 110-004 V5 Part 1 of 2 13NOV2020.wav
100%|██████████| 289939/289939 [57:12<00:00, 84.47frames/s] 
Inconsistent number of segments: whisper_segments (862) != timestamped_word_segments (861)
Traceback (most recent call last):

  File ~\Anaconda3\envs\stt\lib\site-packages\spyder_kernels\py3compat.py:356 in compat_exec
    exec(code, globals, locals)

  File d:\github\pat_private\driver_whisper_speech_to_text.py:56
    result = whisper.transcribe(model, audio, language="en")

  File ~\Anaconda3\envs\stt\lib\site-packages\whisper_timestamped\transcribe.py:259 in transcribe_timestamped
    (transcription, words) = _transcribe_timestamped_efficient(model, audio,

  File ~\Anaconda3\envs\stt\lib\site-packages\whisper_timestamped\transcribe.py:851 in _transcribe_timestamped_efficient
    assert l1 == l2 or l1 == 0, f"Inconsistent number of segments: whisper_segments ({l1}) != timestamped_word_segments ({l2})"

AssertionError: Inconsistent number of segments: whisper_segments (862) != timestamped_word_segments (861)

System and versions:

Windows 10 Enterprise (OS build: 19043.2364)
Running on a CPU (Intel(R) Xeon(R) CPU E5-2643 v3 @ 3.40GHz 3.40 GHz)
Running using Python with Conda as my package manager (installed package versions below)

Packages in the environment:

# Name                    Version                   Build  Channel
absl-py                   1.4.0                    pypi_0    pypi
aiohttp                   3.8.4                    pypi_0    pypi
aiosignal                 1.3.1                    pypi_0    pypi
alabaster                 0.7.12             pyhd3eb1b0_0
alembic                   1.10.2                   pypi_0    pypi
antlr4-python3-runtime    4.9.3                    pypi_0    pypi
arrow                     1.2.3           py310haa95532_1
asteroid-filterbanks      0.4.0                    pypi_0    pypi
astroid                   2.14.2          py310haa95532_0
asttokens                 2.0.5              pyhd3eb1b0_0
async-timeout             4.0.2                    pypi_0    pypi
atomicwrites              1.4.0                      py_0
attrs                     22.1.0          py310haa95532_0
audioread                 3.0.0                    pypi_0    pypi
autopep8                  1.6.0              pyhd3eb1b0_1
babel                     2.11.0          py310haa95532_0
backcall                  0.2.0              pyhd3eb1b0_0
backports-cached-property 1.0.2                    pypi_0    pypi
bcrypt                    3.2.0           py310h2bbff1b_1
beautifulsoup4            4.11.1          py310haa95532_0
binaryornot               0.4.4              pyhd3eb1b0_1
black                     22.6.0          py310haa95532_0
blas                      1.0                         mkl
bleach                    4.1.0              pyhd3eb1b0_0
bottleneck                1.3.5           py310h9128911_0
brotlipy                  0.7.0           py310h2bbff1b_1002
bzip2                     1.0.8                he774522_0
ca-certificates           2023.01.10           haa95532_0
cachetools                5.3.0                    pypi_0    pypi
certifi                   2022.12.7       py310haa95532_0
cffi                      1.15.1          py310h2bbff1b_3
chardet                   4.0.0           py310haa95532_1003
charset-normalizer        2.0.4              pyhd3eb1b0_0
click                     8.0.4           py310haa95532_0
cloudpickle               2.0.0              pyhd3eb1b0_0
cmaes                     0.9.1                    pypi_0    pypi
colorama                  0.4.6           py310haa95532_0
coloredlogs               15.0.1                   pypi_0    pypi
colorlog                  6.7.0                    pypi_0    pypi
comm                      0.1.2           py310haa95532_0
commonmark                0.9.1                    pypi_0    pypi
contourpy                 1.0.7                    pypi_0    pypi
cookiecutter              1.7.3              pyhd3eb1b0_0
cryptography              39.0.1          py310h21b164f_0
cycler                    0.11.0                   pypi_0    pypi
cython                    0.29.33                  pypi_0    pypi
debugpy                   1.5.1           py310hd77b12b_0
decorator                 5.1.1              pyhd3eb1b0_0
defusedxml                0.7.1              pyhd3eb1b0_0
diff-match-patch          20200713           pyhd3eb1b0_0
dill                      0.3.6           py310haa95532_0
docopt                    0.6.2                    pypi_0    pypi
docstring-to-markdown     0.11            py310haa95532_0
docutils                  0.18.1          py310haa95532_3
dtw-python                1.3.0                    pypi_0    pypi
einops                    0.3.2                    pypi_0    pypi
entrypoints               0.4             py310haa95532_0
executing                 0.8.3              pyhd3eb1b0_0
ffmpeg                    4.3.1                ha925a31_0    conda-forge
ffmpeg-python             0.2.0                    pypi_0    pypi
filelock                  3.9.0                    pypi_0    pypi
flake8                    6.0.0           py310haa95532_0
flatbuffers               23.3.3                   pypi_0    pypi
flit-core                 3.6.0              pyhd3eb1b0_0
fonttools                 4.39.0                   pypi_0    pypi
frozenlist                1.3.3                    pypi_0    pypi
fsspec                    2023.3.0                 pypi_0    pypi
future                    0.18.3                   pypi_0    pypi
giflib                    5.2.1                h8cc25b3_3
git                       2.34.1               haa95532_0
glib                      2.69.1               h5dc1a3c_2
google-auth               2.16.2                   pypi_0    pypi
google-auth-oauthlib      0.4.6                    pypi_0    pypi
greenlet                  2.0.2                    pypi_0    pypi
grpcio                    1.51.3                   pypi_0    pypi
gst-plugins-base          1.18.5               h9e645db_0
gstreamer                 1.18.5               hd78058f_0
hmmlearn                  0.2.8                    pypi_0    pypi
huggingface-hub           0.13.1                   pypi_0    pypi
humanfriendly             10.0                     pypi_0    pypi
hyperpyyaml               1.1.0                    pypi_0    pypi
icu                       58.2                 ha925a31_3
idna                      3.4             py310haa95532_0
imagesize                 1.4.1           py310haa95532_0
importlib-metadata        4.11.3          py310haa95532_0
importlib_metadata        4.11.3               hd3eb1b0_0
inflection                0.5.1           py310haa95532_0
intel-openmp              2021.4.0          haa95532_3556
intervaltree              3.1.0              pyhd3eb1b0_0
ipykernel                 6.19.2          py310h9909e9c_0
ipython                   8.10.0          py310haa95532_0
ipython_genutils          0.2.0              pyhd3eb1b0_1
isort                     5.9.3              pyhd3eb1b0_0
jedi                      0.18.1          py310haa95532_1
jellyfish                 0.9.0           py310h2bbff1b_0
jinja2                    3.1.2           py310haa95532_0
jinja2-time               0.2.0              pyhd3eb1b0_3
joblib                    1.2.0                    pypi_0    pypi
jpeg                      9e                   h2bbff1b_1
jsonschema                4.17.3          py310haa95532_0
julius                    0.2.7                    pypi_0    pypi
jupyter_client            7.4.9           py310haa95532_0
jupyter_core              5.2.0           py310haa95532_0
jupyterlab_pygments       0.1.2                      py_0
keyring                   23.4.0          py310haa95532_0
kiwisolver                1.4.4                    pypi_0    pypi
lazy-object-proxy         1.6.0           py310h2bbff1b_0
lerc                      3.0                  hd77b12b_0
libclang                  12.0.0          default_h627e005_2
libdeflate                1.17                 h2bbff1b_0
libffi                    3.4.2                hd77b12b_6
libiconv                  1.16                 h2bbff1b_2
libogg                    1.3.5                h2bbff1b_1
libpng                    1.6.39               h8cc25b3_0
librosa                   0.9.2                    pypi_0    pypi
libsodium                 1.0.18               h62dcd97_0
libspatialindex           1.9.3                h6c2663c_0
libtiff                   4.5.0                h6c2663c_2
libvorbis                 1.3.7                he774522_0
libwebp                   1.2.4                hbc33d0d_1
libwebp-base              1.2.4                h2bbff1b_1
libxml2                   2.9.14               h0ad7f3c_0
libxslt                   1.1.35               h2bbff1b_0
llvmlite                  0.39.1                   pypi_0    pypi
lxml                      4.9.1           py310h1985fb9_0
lz4-c                     1.9.4                h2bbff1b_0
mako                      1.2.4                    pypi_0    pypi
markdown                  3.4.1                    pypi_0    pypi
markupsafe                2.1.1           py310h2bbff1b_0
matplotlib                3.7.1                    pypi_0    pypi
matplotlib-inline         0.1.6           py310haa95532_0
mccabe                    0.7.0              pyhd3eb1b0_0
mistune                   0.8.4           py310h2bbff1b_1000
mkl                       2021.4.0           haa95532_640
mkl-service               2.4.0           py310h2bbff1b_0
mkl_fft                   1.3.1           py310ha0764ea_0
mkl_random                1.2.2           py310h4ed8f06_0
more-itertools            9.1.0                    pypi_0    pypi
mpmath                    1.3.0                    pypi_0    pypi
multidict                 6.0.4                    pypi_0    pypi
mypy_extensions           0.4.3           py310haa95532_1
nbclient                  0.5.13          py310haa95532_0
nbconvert                 6.5.4           py310haa95532_0
nbformat                  5.7.0           py310haa95532_0
nest-asyncio              1.5.6           py310haa95532_0
networkx                  2.8.8                    pypi_0    pypi
numba                     0.56.4                   pypi_0    pypi
numexpr                   2.8.4           py310hd213c9f_0
numpy                     1.23.5          py310h60c9a35_0
numpy-base                1.23.5          py310h04254f7_0
numpydoc                  1.5.0           py310haa95532_0
oauthlib                  3.2.2                    pypi_0    pypi
omegaconf                 2.3.0                    pypi_0    pypi
onnxruntime               1.14.1                   pypi_0    pypi
openai-whisper            20230308                 pypi_0    pypi
openssl                   1.1.1t               h2bbff1b_0
optuna                    3.1.0                    pypi_0    pypi
packaging                 22.0            py310haa95532_0
pandas                    1.5.3           py310h4ed8f06_0
pandocfilters             1.5.0              pyhd3eb1b0_0
paramiko                  2.8.1              pyhd3eb1b0_0
parso                     0.8.3              pyhd3eb1b0_0
pathspec                  0.10.3          py310haa95532_0
pcre                      8.45                 hd77b12b_0
pexpect                   4.8.0              pyhd3eb1b0_3
pickleshare               0.7.5           pyhd3eb1b0_1003
pillow                    9.4.0                    pypi_0    pypi
pip                       23.0.1          py310haa95532_0
platformdirs              2.5.2           py310haa95532_0
pluggy                    1.0.0           py310haa95532_1
ply                       3.11            py310haa95532_0
pooch                     1.7.0                    pypi_0    pypi
poyo                      0.5.0              pyhd3eb1b0_0
primepy                   1.3                      pypi_0    pypi
prompt-toolkit            3.0.36          py310haa95532_0
protobuf                  3.20.1                   pypi_0    pypi
psutil                    5.9.0           py310h2bbff1b_0
ptyprocess                0.7.0              pyhd3eb1b0_2
pure_eval                 0.2.2              pyhd3eb1b0_0
pyannote-audio            2.1.1                    pypi_0    pypi
pyannote-core             4.5                      pypi_0    pypi
pyannote-database         4.1.3                    pypi_0    pypi
pyannote-metrics          3.2.1                    pypi_0    pypi
pyannote-pipeline         2.3                      pypi_0    pypi
pyasn1                    0.4.8                    pypi_0    pypi
pyasn1-modules            0.2.8                    pypi_0    pypi
pycodestyle               2.10.0          py310haa95532_0
pycparser                 2.21               pyhd3eb1b0_0
pydeprecate               0.3.2                    pypi_0    pypi
pydocstyle                6.3.0           py310haa95532_0
pydub                     0.25.1             pyhd8ed1ab_0    conda-forge
pyflakes                  3.0.1           py310haa95532_0
pygments                  2.11.2             pyhd3eb1b0_0
pylint                    2.16.2          py310haa95532_0
pylint-venv               2.3.0                    pypi_0    pypi
pyls-spyder               0.4.0              pyhd3eb1b0_0
pynacl                    1.5.0           py310h8cc25b3_0
pyopenssl                 23.0.0          py310haa95532_0
pyparsing                 3.0.9                    pypi_0    pypi
pyqt                      5.15.7          py310hd77b12b_0
pyqt5-sip                 12.11.0         py310hd77b12b_0
pyqtwebengine             5.15.7          py310hd77b12b_0
pyreadline3               3.4.1                    pypi_0    pypi
pyrsistent                0.18.0          py310h2bbff1b_0
pysocks                   1.7.1           py310haa95532_0
python                    3.10.9               h966fe2a_2
python-dateutil           2.8.2              pyhd3eb1b0_0
python-fastjsonschema     2.16.2          py310haa95532_0
python-lsp-black          1.2.1           py310haa95532_0
python-lsp-jsonrpc        1.0.0              pyhd3eb1b0_0
python-lsp-server         1.7.1           py310haa95532_0
python-slugify            5.0.2              pyhd3eb1b0_0
pytoolconfig              1.2.5           py310haa95532_1
pytorch-lightning         1.6.5                    pypi_0    pypi
pytorch-metric-learning   1.7.3                    pypi_0    pypi
pytz                      2022.7          py310haa95532_0
pywin32                   305             py310h2bbff1b_0
pywin32-ctypes            0.2.0           py310haa95532_1000
pyyaml                    6.0             py310h2bbff1b_1
pyzmq                     23.2.0          py310hd77b12b_0
qdarkstyle                3.0.2              pyhd3eb1b0_0
qstylizer                 0.2.2                    pypi_0    pypi
qt-main                   5.15.2               he8e5bd7_7
qt-webengine              5.15.9               hb9a9bb5_5
qtawesome                 1.2.2                    pypi_0    pypi
qtconsole                 5.4.0                    pypi_0    pypi
qtpy                      2.2.0           py310haa95532_0
qtwebkit                  5.212                h3ad3cdb_4
regex                     2022.10.31               pypi_0    pypi
requests                  2.28.1          py310haa95532_0
requests-oauthlib         1.3.1                    pypi_0    pypi
resampy                   0.4.2                    pypi_0    pypi
rich                      12.6.0                   pypi_0    pypi
rope                      1.7.0           py310haa95532_0
rsa                       4.9                      pypi_0    pypi
rtree                     1.0.1           py310h2eaa2aa_0
ruamel-yaml               0.17.21                  pypi_0    pypi
ruamel-yaml-clib          0.2.7                    pypi_0    pypi
scikit-learn              1.2.2                    pypi_0    pypi
scipy                     1.10.1                   pypi_0    pypi
semver                    2.13.0                   pypi_0    pypi
sentencepiece             0.1.97                   pypi_0    pypi
setuptools                65.6.3          py310haa95532_0
shellingham               1.5.0.post1              pypi_0    pypi
simplejson                3.18.3                   pypi_0    pypi
singledispatchmethod      1.0                      pypi_0    pypi
sip                       6.6.2           py310hd77b12b_0
six                       1.16.0             pyhd3eb1b0_1
snowballstemmer           2.2.0              pyhd3eb1b0_0
sortedcontainers          2.4.0              pyhd3eb1b0_0
soundfile                 0.10.3.post1             pypi_0    pypi
soupsieve                 2.3.2.post1     py310haa95532_0
speechbrain               0.5.13                   pypi_0    pypi
sphinx                    5.0.2           py310haa95532_0
sphinxcontrib-applehelp   1.0.2              pyhd3eb1b0_0
sphinxcontrib-devhelp     1.0.2              pyhd3eb1b0_0
sphinxcontrib-htmlhelp    2.0.0              pyhd3eb1b0_0
sphinxcontrib-jsmath      1.0.1              pyhd3eb1b0_0
sphinxcontrib-qthelp      1.0.3              pyhd3eb1b0_0
sphinxcontrib-serializinghtml 1.1.5              pyhd3eb1b0_0
spyder                    5.4.2           py310haa95532_0
spyder-kernels            2.4.2           py310haa95532_0
sqlalchemy                2.0.5.post1              pypi_0    pypi
sqlite                    3.40.1               h2bbff1b_0
stack_data                0.2.0              pyhd3eb1b0_0
sympy                     1.11.1                   pypi_0    pypi
tabulate                  0.9.0                    pypi_0    pypi
tensorboard               2.12.0                   pypi_0    pypi
tensorboard-data-server   0.7.0                    pypi_0    pypi
tensorboard-plugin-wit    1.8.1                    pypi_0    pypi
text-unidecode            1.3                pyhd3eb1b0_0
textdistance              4.2.1              pyhd3eb1b0_0
threadpoolctl             3.1.0                    pypi_0    pypi
three-merge               0.1.1              pyhd3eb1b0_0
tinycss2                  1.2.1           py310haa95532_0
tk                        8.6.12               h2bbff1b_0
tokenizers                0.13.2                   pypi_0    pypi
toml                      0.10.2             pyhd3eb1b0_0
tomli                     2.0.1           py310haa95532_0
tomlkit                   0.11.1          py310haa95532_0
torch                     1.13.1                   pypi_0    pypi
torch-audiomentations     0.11.0                   pypi_0    pypi
torch-pitch-shift         1.2.2                    pypi_0    pypi
torchaudio                0.13.1                   pypi_0    pypi
torchmetrics              0.11.4                   pypi_0    pypi
tornado                   6.2             py310h2bbff1b_0
tqdm                      4.65.0                   pypi_0    pypi
traitlets                 5.7.1           py310haa95532_0
transformers              4.26.1                   pypi_0    pypi
typer                     0.7.0                    pypi_0    pypi
typing-extensions         4.4.0           py310haa95532_0
typing_extensions         4.4.0           py310haa95532_0
tzdata                    2022g                h04d1e81_0
ujson                     5.4.0           py310hd77b12b_0
unidecode                 1.2.0              pyhd3eb1b0_0
urllib3                   1.26.14         py310haa95532_0
vc                        14.2                 h21ff451_1
vs2015_runtime            14.27.29016          h5e58377_2
watchdog                  2.1.6           py310haa95532_0
wcwidth                   0.2.5              pyhd3eb1b0_0
webencodings              0.5.1           py310haa95532_1
werkzeug                  2.2.3                    pypi_0    pypi
whatthepatch              1.0.2           py310haa95532_0
wheel                     0.38.4          py310haa95532_0
whisper-timestamped       1.12.1                   pypi_0    pypi
win_inet_pton             1.1.0           py310haa95532_0
wincertstore              0.2             py310haa95532_2
wrapt                     1.14.1          py310h2bbff1b_0
xz                        5.2.10               h8cc25b3_1
yaml                      0.2.5                he774522_0
yapf                      0.31.0             pyhd3eb1b0_0
yarl                      1.8.2                    pypi_0    pypi
zeromq                    4.3.4                hd77b12b_0
zipp                      3.11.0          py310haa95532_0
zlib                      1.2.13               h8cc25b3_0
zstd                      1.5.2                h19a0ad4_0

jeremymatt commented 1 year ago

For both of those files, the small model worked fine.

Jeronymous commented 1 year ago

Thank you for reporting this with such precise information. I am investigating.

Jeronymous commented 1 year ago

It should be fixed now (in version 1.12.3)

jeremymatt commented 1 year ago

Hi, thanks for your quick work on this. Sorry to ask what is probably a dumb question, but how do I get 1.12.3 to install? Is it better to just clone the repository and work from that? I'm using:

pip install git+https://github.com/linto-ai/whisper-timestamped

but this just installs 1.12.1 again. I've tried:
1) uninstalling whisper-timestamped and reinstalling
2) clearing the pip cache
3) installing whisper-timestamped in a fresh environment

Jeronymous commented 1 year ago

Oh dear, sorry for the headache, that's my fault: I was working on a temporary branch without noticing, and pushed there instead of master. It's now merged into master, so you can try again.

And I think you can do:

pip install --upgrade --no-deps --force-reinstall git+https://github.com/linto-ai/whisper-timestamped
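
After reinstalling, one way to confirm which version actually ended up in the environment is to query the package metadata from Python. This is a minimal sketch using the standard library's `importlib.metadata`. The helper names `installed_version` and `release_tuple` are hypothetical, and the distribution name `whisper-timestamped` is taken from the package listing above.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def release_tuple(version_string):
    """Turn '1.12.3' into (1, 12, 3); any non-numeric suffix is ignored."""
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version_string!r}")
    return tuple(int(part) for part in match.group().split("."))


if __name__ == "__main__":
    found = installed_version("whisper-timestamped")
    if found is None:
        print("whisper-timestamped is not installed in this environment")
    elif release_tuple(found) >= (1, 12, 3):
        print(f"Found {found}: includes the fix mentioned in this thread")
    else:
        print(f"Found {found}: older than 1.12.3, the reinstall did not take effect")
```

Run it with the same interpreter (the same Conda environment) that runs the transcription script, since each environment has its own installed packages.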