rhasspy / rhasspy3

An open source voice assistant toolkit for many human languages
MIT License

Installing Wyoming Whisper without Docker looks impossible: 404 error when downloading model #16

Open Nardol opened 1 year ago

Nardol commented 1 year ago

I'm posting this here because the Python Package Index (PyPI) page lists this repository as the project's home page, so first of all, sorry if this is the wrong place.

I would like to test Whisper using Wyoming. I use a Home Assistant Core installation, so I don't use Docker for anything. Installing Docker just for this one thing doesn't seem reasonable to me, so I'm trying to install Wyoming Whisper manually.

I looked at the add-on code to see how it installs Wyoming Whisper and did the following on my side:

mkdir -p wyoming-whisper/data
cd wyoming-whisper
python3.11 -m venv venv
source venv/bin/activate
pip install wheel
pip install wyoming-faster-whisper==0.0.3
python3 -m wyoming_faster_whisper --uri 'tcp://0.0.0.0:10300' --model medium --beam-size "1" --language "fr" --data-dir ./data --download-dir ./data

But when I run the last command, I get the following:

WARNING:wyoming_faster_whisper.download:Model hashes do not match
WARNING:wyoming_faster_whisper.download:Expected: {'config.json': 'e5a2f85afc17f73960204cad2b002633', 'model.bin': '5f852c3335fbd24002ffbb965174e3d7', 'vocabulary.txt': 'c1120a13c94a8cbb132489655cdd1854'}
WARNING:wyoming_faster_whisper.download:Got: {'model.bin': '', 'config.json': '', 'vocabulary.txt': ''}
INFO:__main__:Downloading FasterWhisperModel.MEDIUM to ./data
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/pzajda/wyoming-whisper/venv/lib/python3.11/site-packages/wyoming_faster_whisper/__main__.py", line 135, in <module>
    asyncio.run(main())
  File "/home/pzajda/.pyenv/versions/3.11.2/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/home/pzajda/.pyenv/versions/3.11.2/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pzajda/.pyenv/versions/3.11.2/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/home/pzajda/wyoming-whisper/venv/lib/python3.11/site-packages/wyoming_faster_whisper/__main__.py", line 75, in main
    model_dir = download_model(model, args.download_dir)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pzajda/wyoming-whisper/venv/lib/python3.11/site-packages/wyoming_faster_whisper/download.py", line 90, in download_model
    with urlopen(model_url) as response:
         ^^^^^^^^^^^^^^^^^^
  File "/home/pzajda/.pyenv/versions/3.11.2/lib/python3.11/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pzajda/.pyenv/versions/3.11.2/lib/python3.11/urllib/request.py", line 525, in open
    response = meth(req, response)
               ^^^^^^^^^^^^^^^^^^^
  File "/home/pzajda/.pyenv/versions/3.11.2/lib/python3.11/urllib/request.py", line 634, in http_response
    response = self.parent.error(
               ^^^^^^^^^^^^^^^^^^
  File "/home/pzajda/.pyenv/versions/3.11.2/lib/python3.11/urllib/request.py", line 563, in error
    return self._call_chain(*args)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pzajda/.pyenv/versions/3.11.2/lib/python3.11/urllib/request.py", line 496, in _call_chain
    result = func(*args)
             ^^^^^^^^^^^
  File "/home/pzajda/.pyenv/versions/3.11.2/lib/python3.11/urllib/request.py", line 643, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
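The three warnings are probably not the real failure: every "Got:" digest is an empty string, which is what a first run would show if the model files are simply not on disk yet, and the next log line is the download attempt that ends in the 404. A minimal sketch of that kind of check, assuming the digests are MD5 (they are 32 hexadecimal characters) and using hypothetical helper names (file_md5, hashes_match); it is not the actual wyoming_faster_whisper code:

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict


def file_md5(path: Path) -> str:
    """MD5 hex digest of a file, or '' if the file does not exist."""
    if not path.is_file():
        return ""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so a large model.bin is not loaded into memory at once.
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hashes_match(model_dir: Path, expected: Dict[str, str]) -> bool:
    """Hash every expected file in model_dir and compare with the table."""
    actual = {name: file_md5(model_dir / name) for name in expected}
    return actual == expected


# Expected digests copied from the "Expected:" warning above.
EXPECTED = {
    "config.json": "e5a2f85afc17f73960204cad2b002633",
    "model.bin": "5f852c3335fbd24002ffbb965174e3d7",
    "vocabulary.txt": "c1120a13c94a8cbb132489655cdd1854",
}

with tempfile.TemporaryDirectory() as tmp:
    # An empty directory: every digest is '', as in the "Got:" line,
    # so the check fails and a download would be attempted next.
    print(hashes_match(Path(tmp), EXPECTED))  # False
```

If that reading is right, the hash warning is expected on a first run, and the error to chase is the 404 from the download URL.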

How can this work for the add-on but not manually? And how can I solve it? I also posted a topic on the Home Assistant community forum, but it looks like I'm the only one running this kind of setup :slightly_frowning_face:

synesthesiam commented 1 year ago

The "medium" model is not available on the v1.0 release page (https://github.com/rhasspy/models/releases/tag/v1.0). It was quite large, so I didn't upload it or the large model.

You can create it yourself by following these steps: https://github.com/guillaumekln/faster-whisper#model-conversion

Nardol commented 1 year ago

Before trying the procedure you linked to, I tested with the smaller tiny-int8 model, but I got the same 404 error. What am I doing wrong?
python3 -m wyoming_faster_whisper --uri 'tcp://0.0.0.0:10300' --model tiny-int8 --beam-size "1" --language "fr" --data-dir ./data --download-dir ./data

synesthesiam commented 1 year ago

Weird. Can you get the full URL it's trying for the model?

Nardol commented 1 year ago

I added a print(model_url) before the urlopen call, which gave me the following:
https://github.com/rhasspy/models/releases/download/v1.0/asr_faster-whisper-FasterWhisperModel.TINY_INT8.tar.gz
When model is used instead of model.value, the URL contains the enum member's name (FasterWhisperModel.TINY_INT8) instead of its value (tiny-int8). I'm on Python 3.11.2. I haven't found where the source code is, so I couldn't make a PR.

synesthesiam commented 1 year ago

Ok, so this must be a difference with enums and Python 3.11. Thanks!

antlarr commented 1 year ago

Hi, I had the same problem as @Nardol. @synesthesiam, as you said, this is a difference in Python 3.11: in download.py (in download_model) you now have to replace model_url = URL_FORMAT.format(model=model) with model_url = URL_FORMAT.format(model=model.value).
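The change described above can be sketched in isolation. This is a minimal, self-contained sketch, not the actual download.py: build_model_url is a hypothetical helper name, the enum is trimmed to two members, and the URL template is the one from mweinelt's reproducer in this thread.

```python
from enum import Enum

# URL template as quoted in the reproducer in this thread.
URL_FORMAT = (
    "https://github.com/rhasspy/models/releases/download/v1.0/"
    "asr_faster-whisper-{model}.tar.gz"
)


class FasterWhisperModel(str, Enum):
    """Trimmed-down model enum, enough for the example."""

    TINY = "tiny"
    TINY_INT8 = "tiny-int8"


def build_model_url(model: FasterWhisperModel) -> str:
    # Hypothetical helper: pass .value explicitly so the URL contains
    # "tiny-int8" rather than "FasterWhisperModel.TINY_INT8", whichever
    # way the running Python version formats enum members.
    return URL_FORMAT.format(model=model.value)


print(build_model_url(FasterWhisperModel.TINY_INT8))
```

Using .value explicitly avoids relying on Enum.__format__ at all, which is why it behaves the same on every Python version.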

Like @Nardol, I also couldn't find the source code to make a PR. Could you point us to it? Even if you've probably already fixed this, it would be nice to know where it is, in case someone wants to submit another fix or improvement.

Thanks!

taha-yassine commented 1 year ago


Answering @antlarr's question about where the source code is, since no one else did: it is in the v0.1.0 branch, here: https://github.com/rhasspy/rhasspy3/blob/v0.1.0/programs/asr/faster-whisper/script/download.py

mweinelt commented 1 year ago

Quoting @synesthesiam: "Ok, so this must be a difference with enums and Python 3.11."

That's spot on; I tested it with the following reproducer (enumtest.py). download.py should be patched to use the value accessor, as @antlarr indicated. The code is on the wyoming-v1 branch.

from enum import Enum

URL_FORMAT = "https://github.com/rhasspy/models/releases/download/v1.0/asr_faster-whisper-{model}.tar.gz"

class FasterWhisperModel(str, Enum):
    """Available faster-whisper models."""

    TINY = "tiny"
    TINY_INT8 = "tiny-int8"
    BASE = "base"
    BASE_INT8 = "base-int8"
    SMALL = "small"
    SMALL_INT8 = "small-int8"
    MEDIUM = "medium"
    MEDIUM_INT8 = "medium-int8"

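tiny = FasterWhisperModel.TINY

# Formats the member itself: the result depends on the Python version.
print(URL_FORMAT.format(model=tiny))
# Formats the member's value, "tiny", on every version.
print(URL_FORMAT.format(model=tiny.value))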
$ python3.8 enumtest.py
https://github.com/rhasspy/models/releases/download/v1.0/asr_faster-whisper-tiny.tar.gz
https://github.com/rhasspy/models/releases/download/v1.0/asr_faster-whisper-tiny.tar.gz
$ python3.9 enumtest.py
https://github.com/rhasspy/models/releases/download/v1.0/asr_faster-whisper-tiny.tar.gz
https://github.com/rhasspy/models/releases/download/v1.0/asr_faster-whisper-tiny.tar.gz
$ python3.10 enumtest.py
https://github.com/rhasspy/models/releases/download/v1.0/asr_faster-whisper-tiny.tar.gz
https://github.com/rhasspy/models/releases/download/v1.0/asr_faster-whisper-tiny.tar.gz
$ python3.11 enumtest.py
https://github.com/rhasspy/models/releases/download/v1.0/asr_faster-whisper-FasterWhisperModel.TINY.tar.gz
https://github.com/rhasspy/models/releases/download/v1.0/asr_faster-whisper-tiny.tar.gz
$ python3.12 enumtest.py
https://github.com/rhasspy/models/releases/download/v1.0/asr_faster-whisper-FasterWhisperModel.TINY.tar.gz
https://github.com/rhasspy/models/releases/download/v1.0/asr_faster-whisper-tiny.tar.gz
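Besides calling .value at each call site, another option (a sketch, not what the project adopted) is to make the enum itself format as its value. On Python 3.11+ only, enum.StrEnum already does this. A portable variant of the reproducer's enum overrides __str__, since the output above shows that 3.11+ formats a (str, Enum) member through its str() form, while 3.8 to 3.10 already formatted it by value:

```python
from enum import Enum

URL_FORMAT = "https://github.com/rhasspy/models/releases/download/v1.0/asr_faster-whisper-{model}.tar.gz"


class FasterWhisperModel(str, Enum):
    """Available faster-whisper models (subset)."""

    TINY = "tiny"
    TINY_INT8 = "tiny-int8"

    def __str__(self) -> str:
        # On 3.11+, str.format() on this member goes through str(member),
        # which would otherwise be "FasterWhisperModel.TINY". Returning the
        # value keeps the URL correct; 3.8-3.10 produce "tiny" either way.
        return self.value


tiny = FasterWhisperModel.TINY
print(URL_FORMAT.format(model=tiny))
print(URL_FORMAT.format(model=tiny.value))
```

With this override, both lines print the same asr_faster-whisper-tiny.tar.gz URL regardless of version, so existing format(model=model) calls keep working.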