lucataco / cog-whisperspeech

Cog wrapper for collabora/WhisperSpeech
https://replicate.com/lucataco/whisperspeech-small

Download model weights from Replicate cache, accept MP3 and WAV inputs #3

Closed jd7h closed 7 months ago

jd7h commented 7 months ago

Results of reviewing whisper-speech:

lucataco commented 7 months ago

I kept getting errors about missing torchvision and

ModuleNotFoundError: No module named 'speechbrain.pretrained'

So I reverted this merge

jd7h commented 6 months ago

I'm sorry to hear that! I'll have a look with a clean git clone to figure out what happened.

jd7h commented 6 months ago

I can't reproduce the errors on my dev machine, even with a clean git clone. Can you elaborate on what happened when you tested it?

My steps:

git clone git@github.com:lucataco/cog-whisperspeech.git
cd cog-whisperspeech
git reset --hard 3f1462282255401462b8d0d1845a44f4ecd624a5 # where my PR was merged
cog build
cog predict

Output:

14:26 judith@datakami-dev-a40:~/whisper-speech/cog-whisperspeech$ cog predict
Building Docker image from environment in cog.yaml...
[+] Building 1.2s (22/22) FINISHED                                                                                                                                                   
 => [internal] load .dockerignore                                                                                                                                               0.1s
 => => transferring context: 367B                                                                                                                                               0.0s
 => [internal] load build definition from Dockerfile                                                                                                                            0.1s
 => => transferring dockerfile: 2.35kB                                                                                                                                          0.0s
 => resolve image config for docker.io/docker/dockerfile:1.4                                                                                                                    0.2s
 => CACHED docker-image://docker.io/docker/dockerfile:1.4@sha256:9ba7531bd80fb0a858632727cf7a112fbfd19b17e94c4e84ced81e24ef1a0dbc                                               0.0s
 => [internal] load metadata for docker.io/nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04                                                                                          0.2s
 => [internal] load metadata for docker.io/library/python:3.11                                                                                                                  0.3s
 => [deps 1/5] FROM docker.io/library/python:3.11@sha256:4f7a334f9b8941fc7779e17541eaa0fd6043bdb63de1f5b0ee634e7991706e63                                                       0.1s
 => => resolve docker.io/library/python:3.11@sha256:4f7a334f9b8941fc7779e17541eaa0fd6043bdb63de1f5b0ee634e7991706e63                                                            0.1s
 => [internal] load build context                                                                                                                                               0.0s
 => => transferring context: 90.74kB                                                                                                                                            0.0s
 => [stage-1 1/8] FROM docker.io/nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04@sha256:8f9dd0d09d3ad3900357a1cf7f887888b5b74056636cd6ef03c160c3cd4b1d95                            0.0s
 => CACHED [stage-1 2/8] RUN --mount=type=cache,target=/var/cache/apt set -eux; apt-get update -qq; apt-get install -qqy --no-install-recommends curl; rm -rf /var/lib/apt/lis  0.0s
 => CACHED [stage-1 3/8] RUN --mount=type=cache,target=/var/cache/apt apt-get update -qq && apt-get install -qqy --no-install-recommends  make  build-essential  libssl-dev  z  0.0s
 => CACHED [stage-1 4/8] RUN curl -s -S -L https://raw.githubusercontent.com/pyenv/pyenv-installer/master/bin/pyenv-installer | bash &&  git clone https://github.com/momo-lab  0.0s
 => CACHED [stage-1 5/8] RUN --mount=type=cache,target=/var/cache/apt apt-get update -qq && apt-get install -qqy ffmpeg && rm -rf /var/lib/apt/lists/*                          0.0s
 => CACHED [deps 2/5] COPY .cog/tmp/build4235445269/cog-0.0.1.dev-py3-none-any.whl /tmp/cog-0.0.1.dev-py3-none-any.whl                                                          0.0s
 => CACHED [deps 3/5] RUN --mount=type=cache,target=/root/.cache/pip pip install -t /dep /tmp/cog-0.0.1.dev-py3-none-any.whl                                                    0.0s
 => CACHED [deps 4/5] COPY .cog/tmp/build4235445269/requirements.txt /tmp/requirements.txt                                                                                      0.0s
 => CACHED [deps 5/5] RUN --mount=type=cache,target=/root/.cache/pip pip install -t /dep -r /tmp/requirements.txt                                                               0.0s
 => CACHED [stage-1 6/8] RUN --mount=type=bind,from=deps,source=/dep,target=/dep cp -rf /dep/* $(pyenv prefix)/lib/python*/site-packages || true                                0.0s
 => CACHED [stage-1 7/8] RUN curl -o /usr/local/bin/pget -L "https://github.com/replicate/pget/releases/download/v0.5.6/pget_linux_x86_64" && chmod +x /usr/local/bin/pget      0.0s
 => CACHED [stage-1 8/8] WORKDIR /src                                                                                                                                           0.0s
 => preparing layers for inline cache                                                                                                                                           0.0s
 => exporting to image                                                                                                                                                          0.0s
 => => exporting layers                                                                                                                                                         0.0s
 => => writing image sha256:8de91a4f1349e27053b9fc1b233ec748e54adb338ea7e941cee6d37ec6dcc7dc                                                                                    0.0s
 => => naming to docker.io/library/cog-whisperspeech-base                                                                                                                       0.0s

Starting Docker image cog-whisperspeech-base and running setup()...
{"logger": "speechbrain.utils.train_logger", "timestamp": "2024-03-04T14:26:22.455168Z", "severity": "WARNING", "message": "torchvision is not available - cannot save figures"}
torchvision is not available - cannot save figures
downloading url:  https://weights.replicate.delivery/default/whisper-speech/models.tar
downloading to:  /src/models/
2024-03-04T14:26:25Z | INFO  | [ Initiating ] dest=/src/models/ minimum_chunk_size=16M url=https://weights.replicate.delivery/default/whisper-speech/models.tar
2024-03-04T14:26:25Z | INFO  | [ Redirect ] redirect_url=https://storage.googleapis.com/replicate-weights/whisper-speech/models.tar url=https://weights.replicate.delivery/default/whisper-speech/models.tar
2024-03-04T14:26:30Z | INFO  | [ Complete ] dest=/src/models/ size="2.0 GB" total_elapsed=5.443s url=https://weights.replicate.delivery/default/whisper-speech/models.tar
downloading took:  5.571341037750244
Running prediction...
█
|----------------------------------------| 0.00% [0/749 00:00<?]
|----------------------------------------| 0.13% [1/749 00:00<00:23]
|----------------------------------------| 0.27% [2/749 00:00<00:18]
|----------------------------------------| 0.40% [3/749 00:00<00:16]
|----------------------------------------| 0.53% [4/749 00:00<00:15]
|----------------------------------------| 0.67% [5/749 00:00<00:14]
|----------------------------------------| 1.87% [14/749 00:00<00:13]
|█---------------------------------------| 3.20% [24/749 00:00<00:13]
|█---------------------------------------| 4.67% [35/749 00:00<00:12]
|██--------------------------------------| 6.14% [46/749 00:00<00:12]
|███-------------------------------------| 7.61% [57/749 00:00<00:11]
|███-------------------------------------| 9.08% [68/749 00:01<00:11]
|████------------------------------------| 10.55% [79/749 00:01<00:11]
|████------------------------------------| 12.02% [90/749 00:01<00:11]
|█████-----------------------------------| 13.48% [101/749 00:01<00:10]
|█████-----------------------------------| 14.95% [112/749 00:01<00:10]
|██████----------------------------------| 16.42% [123/749 00:02<00:10]
|███████---------------------------------| 17.89% [134/749 00:02<00:10]
|███████---------------------------------| 19.36% [145/749 00:02<00:10]
|████████--------------------------------| 20.83% [156/749 00:02<00:10]
|████████--------------------------------| 22.30% [167/749 00:02<00:09]
|█████████-------------------------------| 23.77% [178/749 00:02<00:09]
|██████████------------------------------| 25.23% [189/749 00:03<00:09]
|██████████------------------------------| 26.70% [200/749 00:03<00:09]
|███████████-----------------------------| 28.17% [211/749 00:03<00:09]
|███████████-----------------------------| 29.64% [222/749 00:03<00:08]
|████████████----------------------------| 31.11% [233/749 00:03<00:08]
|█████████████---------------------------| 32.58% [244/749 00:04<00:08]
█
|----------------------------------------| 0.00% [0/752 00:00<?]
|----------------------------------------| 0.13% [1/752 00:00<00:21]
|----------------------------------------| 0.27% [2/752 00:00<00:21]
|----------------------------------------| 0.40% [3/752 00:00<00:21]
|----------------------------------------| 0.53% [4/752 00:00<00:21]
|----------------------------------------| 0.66% [5/752 00:00<00:21]
|----------------------------------------| 1.46% [11/752 00:00<00:21]
|----------------------------------------| 2.26% [17/752 00:00<00:21]
|█---------------------------------------| 3.06% [23/752 00:00<00:20]
|█---------------------------------------| 3.86% [29/752 00:00<00:20]
|█---------------------------------------| 4.65% [35/752 00:01<00:20]
|██--------------------------------------| 5.45% [41/752 00:01<00:20]
|██--------------------------------------| 6.25% [47/752 00:01<00:20]
|██--------------------------------------| 7.05% [53/752 00:01<00:20]
|███-------------------------------------| 7.85% [59/752 00:01<00:19]
|███-------------------------------------| 8.64% [65/752 00:01<00:19]
|███-------------------------------------| 9.44% [71/752 00:02<00:19]
|████------------------------------------| 10.24% [77/752 00:02<00:19]
|████------------------------------------| 11.04% [83/752 00:02<00:19]
|████------------------------------------| 11.84% [89/752 00:02<00:18]
|█████-----------------------------------| 12.63% [95/752 00:02<00:18]
|█████-----------------------------------| 13.43% [101/752 00:02<00:18]
|█████-----------------------------------| 14.23% [107/752 00:03<00:18]
|██████----------------------------------| 15.03% [113/752 00:03<00:18]
|██████----------------------------------| 15.82% [119/752 00:03<00:18]
|██████----------------------------------| 16.62% [125/752 00:03<00:17]
|██████----------------------------------| 17.42% [131/752 00:03<00:17]
|███████---------------------------------| 18.22% [137/752 00:03<00:17]
|███████---------------------------------| 19.02% [143/752 00:04<00:17]
|███████---------------------------------| 19.81% [149/752 00:04<00:17]
|████████--------------------------------| 20.61% [155/752 00:04<00:17]
|████████--------------------------------| 21.41% [161/752 00:04<00:16]
|████████--------------------------------| 22.21% [167/752 00:04<00:16]
|█████████-------------------------------| 23.01% [173/752 00:04<00:16]
|█████████-------------------------------| 23.80% [179/752 00:05<00:16]
|█████████-------------------------------| 24.60% [185/752 00:05<00:16]
|██████████------------------------------| 25.40% [191/752 00:05<00:16]
|██████████------------------------------| 26.20% [197/752 00:05<00:15]
|██████████------------------------------| 27.13% [204/752 00:05<00:15]
|███████████-----------------------------| 28.06% [211/752 00:06<00:15]
|███████████-----------------------------| 28.99% [218/752 00:06<00:15]
|███████████-----------------------------| 29.92% [225/752 00:06<00:14]
|████████████----------------------------| 30.85% [232/752 00:06<00:14]
|████████████----------------------------| 31.78% [239/752 00:06<00:14]
|█████████████---------------------------| 32.71% [246/752 00:06<00:14]
|█████████████---------------------------| 33.64% [253/752 00:07<00:14]
|█████████████---------------------------| 34.57% [260/752 00:07<00:13]
|██████████████--------------------------| 35.51% [267/752 00:07<00:13]
|██████████████--------------------------| 36.44% [274/752 00:07<00:13]
|██████████████--------------------------| 37.37% [281/752 00:07<00:13]
|███████████████-------------------------| 38.30% [288/752 00:08<00:13]
|███████████████-------------------------| 39.23% [295/752 00:08<00:12]
|████████████████------------------------| 40.16% [302/752 00:08<00:12]
|████████████████------------------------| 41.09% [309/752 00:08<00:12]
|████████████████------------------------| 42.02% [316/752 00:08<00:12]
|█████████████████-----------------------| 42.95% [323/752 00:09<00:12]
|█████████████████-----------------------| 43.88% [330/752 00:09<00:12]
|█████████████████-----------------------| 44.81% [337/752 00:09<00:11]
|██████████████████----------------------| 45.74% [344/752 00:09<00:11]
|██████████████████----------------------| 46.68% [351/752 00:09<00:11]
|███████████████████---------------------| 47.61% [358/752 00:10<00:11]
|███████████████████---------------------| 48.54% [365/752 00:10<00:11]
|███████████████████---------------------| 49.47% [372/752 00:10<00:10]
|████████████████████--------------------| 50.40% [379/752 00:10<00:10]
|████████████████████--------------------| 51.33% [386/752 00:10<00:10]
|████████████████████--------------------| 52.26% [393/752 00:11<00:10]
|█████████████████████-------------------| 53.19% [400/752 00:11<00:10]
|█████████████████████-------------------| 54.12% [407/752 00:11<00:09]
|██████████████████████------------------| 55.05% [414/752 00:11<00:09]
|██████████████████████------------------| 55.98% [421/752 00:12<00:09]
|██████████████████████------------------| 56.91% [428/752 00:12<00:09]
|███████████████████████-----------------| 57.85% [435/752 00:12<00:09]
|███████████████████████-----------------| 58.78% [442/752 00:12<00:08]
|███████████████████████-----------------| 59.71% [449/752 00:12<00:08]
|████████████████████████----------------| 60.64% [456/752 00:12<00:08]
|████████████████████████----------------| 61.57% [463/752 00:13<00:08]
|█████████████████████████---------------| 62.50% [470/752 00:13<00:08]
|█████████████████████████---------------| 63.43% [477/752 00:13<00:07]
|█████████████████████████---------------| 64.36% [484/752 00:13<00:07]
|██████████████████████████--------------| 65.29% [491/752 00:13<00:07]
|██████████████████████████--------------| 66.22% [498/752 00:14<00:07]
|██████████████████████████--------------| 67.15% [505/752 00:14<00:07]
|███████████████████████████-------------| 68.09% [512/752 00:14<00:06]
|███████████████████████████-------------| 69.02% [519/752 00:14<00:06]
|███████████████████████████-------------| 69.95% [526/752 00:14<00:06]
|████████████████████████████------------| 70.88% [533/752 00:15<00:06]
|████████████████████████████------------| 71.81% [540/752 00:15<00:06]
|█████████████████████████████-----------| 72.74% [547/752 00:15<00:05]
|█████████████████████████████-----------| 73.67% [554/752 00:15<00:05]
|█████████████████████████████-----------| 74.60% [561/752 00:15<00:05]
|██████████████████████████████----------| 75.53% [568/752 00:16<00:05]
|██████████████████████████████----------| 76.46% [575/752 00:16<00:05]
|██████████████████████████████----------| 77.39% [582/752 00:16<00:04]
|███████████████████████████████---------| 78.32% [589/752 00:16<00:04]
|███████████████████████████████---------| 79.26% [596/752 00:16<00:04]
|████████████████████████████████--------| 80.19% [603/752 00:17<00:04]
|████████████████████████████████--------| 81.12% [610/752 00:17<00:04]
|████████████████████████████████--------| 82.05% [617/752 00:17<00:03]
|█████████████████████████████████-------| 82.98% [624/752 00:17<00:03]
|█████████████████████████████████-------| 83.91% [631/752 00:18<00:03]
|█████████████████████████████████-------| 84.84% [638/752 00:18<00:03]
|██████████████████████████████████------| 85.77% [645/752 00:18<00:03]
|██████████████████████████████████------| 86.70% [652/752 00:18<00:02]
|███████████████████████████████████-----| 87.63% [659/752 00:18<00:02]
|███████████████████████████████████-----| 88.56% [666/752 00:19<00:02]
|███████████████████████████████████-----| 89.36% [672/752 00:19<00:02]
|████████████████████████████████████----| 90.16% [678/752 00:19<00:02]
|████████████████████████████████████----| 90.96% [684/752 00:19<00:01]
|████████████████████████████████████----| 91.76% [690/752 00:19<00:01]
|█████████████████████████████████████---| 92.55% [696/752 00:19<00:01]
|█████████████████████████████████████---| 93.35% [702/752 00:20<00:01]
|█████████████████████████████████████---| 94.15% [708/752 00:20<00:01]
|█████████████████████████████████████---| 94.95% [714/752 00:20<00:01]
|██████████████████████████████████████--| 95.74% [720/752 00:20<00:00]
|██████████████████████████████████████--| 96.54% [726/752 00:20<00:00]
|██████████████████████████████████████--| 97.47% [733/752 00:20<00:00]
|███████████████████████████████████████-| 98.40% [740/752 00:21<00:00]
|███████████████████████████████████████-| 99.34% [747/752 00:21<00:00]
|████████████████████████████████████████| 100.00% [752/752 00:21<00:00]
Written output to output.wav

Output.wav: https://github.com/lucataco/cog-whisperspeech/assets/690008/78e0cc14-dcd1-4c50-b0c7-19549987afd1

lucataco commented 6 months ago

Sure thing. Following your steps, this is my output from cog build:

Building Docker image from environment in cog.yaml as cog-whisperspeech...
[+] Building 1.9s (23/23) FINISHED docker:default
 => [internal] load build definition from Dockerfile 0.3s
 => => transferring dockerfile: 2.36kB 0.0s
 => [internal] load .dockerignore 0.3s
 => => transferring context: 367B 0.0s
 => resolve image config for docker.io/docker/dockerfile:1.4 0.3s
 => CACHED docker-image://docker.io/docker/dockerfile:1.4@sha256:9ba7531bd80fb0a858632727cf7a112fbfd19b17e94c4e84ced81e24ef1a0dbc 0.0s
 => [internal] load metadata for docker.io/nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04 0.0s
 => [internal] load metadata for docker.io/library/python:3.11 0.0s
 => [internal] load build context 0.1s
 => => transferring context: 95.05kB 0.0s
 => [deps 1/5] FROM docker.io/library/python:3.11 0.0s
 => [stage-1 1/9] FROM docker.io/nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04 0.0s
 => CACHED [stage-1 2/9] RUN --mount=type=cache,target=/var/cache/apt set -eux; apt-get update -qq; apt-get install -qqy --no-install-recommends cu 0.0s
 => CACHED [stage-1 3/9] RUN --mount=type=cache,target=/var/cache/apt apt-get update -qq && apt-get install -qqy --no-install-recommends make bui 0.0s
 => CACHED [stage-1 4/9] RUN curl -s -S -L https://raw.githubusercontent.com/pyenv/pyenv-installer/master/bin/pyenv-installer | bash && git clone 0.0s
 => CACHED [stage-1 5/9] RUN --mount=type=cache,target=/var/cache/apt apt-get update -qq && apt-get install -qqy ffmpeg && rm -rf /var/lib/apt/list 0.0s
 => CACHED [deps 2/5] COPY .cog/tmp/build1863467073/cog-0.0.1.dev-py3-none-any.whl /tmp/cog-0.0.1.dev-py3-none-any.whl 0.0s
 => CACHED [deps 3/5] RUN --mount=type=cache,target=/root/.cache/pip pip install -t /dep /tmp/cog-0.0.1.dev-py3-none-any.whl 0.0s
 => CACHED [deps 4/5] COPY .cog/tmp/build1863467073/requirements.txt /tmp/requirements.txt 0.0s
 => CACHED [deps 5/5] RUN --mount=type=cache,target=/root/.cache/pip pip install -t /dep -r /tmp/requirements.txt 0.0s
 => CACHED [stage-1 6/9] RUN --mount=type=bind,from=deps,source=/dep,target=/dep cp -rf /dep/* $(pyenv prefix)/lib/python*/site-packages || true 0.0s
 => CACHED [stage-1 7/9] RUN curl -o /usr/local/bin/pget -L "https://github.com/replicate/pget/releases/download/v0.5.6/pget_linux_x86_64" && chmod 0.0s
 => CACHED [stage-1 8/9] WORKDIR /src 0.0s
 => [stage-1 9/9] COPY . /src 0.2s
 => preparing layers for inline cache 0.2s
 => exporting to image 0.0s
 => => exporting layers 0.0s
 => => writing image sha256:d72b0a7fe3e77273529c7a292d6ecf925fecd739f2f2cfe301cf88683d71e95c 0.0s
 => => naming to docker.io/library/cog-whisperspeech 0.0s
Validating model schema...

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/root/.pyenv/versions/3.11.8/lib/python3.11/site-packages/cog/command/openapi_schema.py", line 21, in <module>
    raise CogError(app.state.setup_result.logs)
cog.errors.CogError: Error while loading predictor:

Traceback (most recent call last):
  File "/root/.pyenv/versions/3.11.8/lib/python3.11/site-packages/cog/server/http.py", line 129, in create_app
    predictor = load_predictor_from_ref(predictor_ref)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.8/lib/python3.11/site-packages/cog/predictor.py", line 184, in load_predictor_from_ref
    spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/src/predict.py", line 16, in <module>
    from speechbrain.pretrained import EncoderClassifier
ModuleNotFoundError: No module named 'speechbrain.pretrained'

jd7h commented 6 months ago

Since the cog build is failing, I suspect we run different versions of cog.

$ cog --version
cog version 0.9.4 (built 2024-01-24T22:16:49Z)

lucataco commented 6 months ago

I'm also on 0.9.4.

jd7h commented 6 months ago

I've managed to reproduce the error with cog build --no-cache. I think one of the dependencies of the WhisperSpeech package (listed in cog.yaml) changed on PyPI between the time I developed this and the time it was merged. :grin:
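To compare two environments when a cached build and a --no-cache build behave differently, it can help to print the installed versions of the suspect packages inside each container. The following is only a minimal sketch: the helper name installed_versions is hypothetical, and the package names are simply the ones mentioned in this thread.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in distributions:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    # Package names taken from this thread; adjust to whatever cog.yaml lists.
    suspects = ["speechbrain", "WhisperSpeech", "torchvision"]
    for name, ver in installed_versions(suspects).items():
        print(f"{name}: {ver or 'not installed'}")
```

Running this in both the cached image and a freshly built one should show whether a transitive dependency resolved to a different version.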

lucataco commented 6 months ago

The repo builds fine without your changes, so I don't think it's the Python packages. I noticed that you added the "models" folder to the .gitignore file. Maybe you were actually caching that folder on your dev machine?

jd7h commented 6 months ago

SpeechBrain recently published a major release (1.0.0). If I pin SpeechBrain to the version I had during development, cog build completes successfully. WhisperSpeech doesn't pin its dependencies, so its requirements resolved to a SpeechBrain version that breaks the import in predict.py.
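Besides pinning, another way to tolerate a module that moved between releases is a fallback import. The sketch below is an illustration, not what this PR does: the helper import_first_available is hypothetical, and the idea that SpeechBrain 1.0 exposes EncoderClassifier under speechbrain.inference is an assumption that is not confirmed anywhere in this thread.

```python
import importlib


def import_first_available(module_names, attribute):
    """Return `attribute` from the first listed module that provides it."""
    errors = []
    for name in module_names:
        try:
            module = importlib.import_module(name)
            return getattr(module, attribute)
        except (ImportError, AttributeError) as exc:
            # Remember why this candidate failed and try the next one.
            errors.append(f"{name}: {exc}")
    raise ImportError(
        f"could not import {attribute!r}; tried: " + "; ".join(errors)
    )


# Hypothetical use in predict.py (module paths are assumptions):
# EncoderClassifier = import_first_available(
#     ["speechbrain.inference", "speechbrain.pretrained"],
#     "EncoderClassifier",
# )
```

Pinning remains the simpler fix for a reproducible image; a fallback like this only matters if the code has to run against both old and new releases.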

jd7h commented 6 months ago

Thanks for supplying your outputs to help debug this. :)