myshell-ai / OpenVoice

Instant voice cloning by MIT and MyShell.
https://research.myshell.ai/open-voice
MIT License

Windows + python3.9 + OpenVoice v2 = not possible without CUDA? #188

Open dusekdan opened 5 months ago

dusekdan commented 5 months ago

Hi,

I followed the Windows installation guide and tried both the latest Python 3.12 and Python 3.9.12 (the guide recommends Python 3.9).

When I try to run the v2 example from demo_part3.ipynb, I get an error originating from the following line:

target_se, audio_name = se_extractor.get_se(reference_speaker, tone_color_converter, vad=False)

This is the error:

Traceback (most recent call last):
  File "C:\Users\user\Source\VoiceCloningTests\OpenVoice\demov2_.py", line 23, in <module>
    target_se, audio_name = se_extractor.get_se(reference_speaker, tone_color_converter, vad=False)
  File "C:\Users\user\Source\VoiceCloningTests\OpenVoice\openvoice\se_extractor.py", line 146, in get_se
    wavs_folder = split_audio_whisper(audio_path, target_dir=target_dir, audio_name=audio_name)
  File "C:\Users\user\Source\VoiceCloningTests\OpenVoice\openvoice\se_extractor.py", line 22, in split_audio_whisper
    model = WhisperModel(model_size, device="cuda", compute_type="float16")
  File "C:\Users\user\Source\VoiceCloningTests\OpenVoice\env39\lib\site-packages\faster_whisper\transcribe.py", line 128, in __init__
    self.model = ctranslate2.models.Whisper(
RuntimeError: CUDA failed with error CUDA driver version is insufficient for CUDA runtime version

At the beginning of my script I follow the demo and set up my device variable the same way, so that it falls back to cpu. But when I open the se_extractor.py file from the traceback above, I see that the device is hardcoded to "cuda", and I end up with the error above.

My machine uses integrated Intel graphics, so as far as I know it is not even CUDA-capable. Does this mean I cannot run OpenVoice v2 without an NVIDIA GPU?

This is the code from the library (se_extractor.py) with the hardcoded "cuda" string that raises the issue:

def split_audio_whisper(audio_path, audio_name, target_dir='processed'):
    global model
    if model is None:
        model = WhisperModel(model_size, device="cuda", compute_type="float16")
    # ... 
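A minimal, hypothetical patch for this (not the project's official fix) would be to pick the device from runtime CUDA availability instead of hardcoding it. The `pick_whisper_backend` helper name below is my own invention; inside OpenVoice one would feed it `torch.cuda.is_available()`:

```python
# Hypothetical helper (not part of OpenVoice): choose a (device, compute_type)
# pair that CTranslate2 can actually run. float16 is only useful on CUDA;
# float32 is the safe CPU default, as confirmed later in this thread.
def pick_whisper_backend(cuda_available: bool):
    if cuda_available:
        return "cuda", "float16"
    return "cpu", "float32"

# Sketch of how split_audio_whisper could use it (assumes torch and
# faster_whisper are importable, as they are in an OpenVoice environment):
#   import torch
#   device, compute_type = pick_whisper_backend(torch.cuda.is_available())
#   model = WhisperModel(model_size, device=device, compute_type=compute_type)

print(pick_whisper_backend(False))  # ('cpu', 'float32')
```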
jicka commented 5 months ago

Hello,

I just ran into the same issue. I replaced the line in question with `model = WhisperModel(model_size, device="cpu", compute_type="float32")` and it works. Hope it helps you too!

dusekdan commented 5 months ago

Lol, just came back to say I iterated towards the same solution and got it working. Thanks for the tip.

For anyone who comes to this issue looking for the same thing, here's how I iterated towards the solution:

  1. I looked through the files raising the error. In the traceback above, you can see it comes from WhisperModel.
  2. I located WhisperModel in my environment (installed by pip into the virtualenv) at venv/Lib/site-packages/faster_whisper. I know it lives in the faster_whisper module because WhisperModel is imported from that module at the very beginning of the example.
  3. The error came from transcribe.py, so that's the file I opened to look for the class definition.
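Step 2 above can also be done programmatically instead of browsing the virtualenv by hand. A small sketch, stdlib only (the printed path is whatever your own environment reports):

```python
# Locate an installed package's source on disk, as in step 2 above.
import importlib.util

spec = importlib.util.find_spec("faster_whisper")
if spec is not None and spec.origin:
    # e.g. ...\env39\Lib\site-packages\faster_whisper\__init__.py;
    # transcribe.py sits in the same directory as this file.
    print(spec.origin)
else:
    print("faster_whisper is not installed in this environment")
```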

There are two parameters of interest there: device and compute_type. When I previously tried just hardcoding cpu for the device, I was told that float16 is unsupported. So my line of thinking was to look up which other types are supported and try combinations of them.

Line 91 in transcribe.py contains a really long docstring; here are the most important parts:

"""
Initializes the Whisper model.

        Args:
          [...]
          device: Device to use for computation ("cpu", "cuda", "auto").
          compute_type: Type to use for computation.
            See https://opennmt.net/CTranslate2/quantization.html.
          [...]
"""

The quantization link provides a reference table of implicit type conversions on load, in which I could look up what float16 maps to on CPU for my architecture (Intel, x64: float32).

I changed the corresponding line to hardcode cpu for the device variable and float32 for compute_type, and got a result on the output.

Reference table for your convenience: (screenshot of the conversion table from the quantization page; on a plain x86-64 CPU, float16 is converted to float32)
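For the case in this thread, the table boils down to a single rule: on a plain x86-64 CPU, a float16 request is converted to float32 at load time. Here is a toy sketch of just that one row (my own simplification, not the library's API; consult the linked quantization page for the full table):

```python
# Simplified sketch of one row of the CTranslate2 load-time conversion table:
# on a CPU without float16 support, a model requested as float16 is loaded
# as float32. This dict covers only the case discussed in this thread.
CPU_CONVERSIONS = {"float16": "float32"}

def effective_cpu_compute_type(requested: str) -> str:
    """Type CTranslate2 would actually use on a plain x86-64 CPU (sketch)."""
    return CPU_CONVERSIONS.get(requested, requested)

print(effective_cpu_compute_type("float16"))  # float32
```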

jicka commented 5 months ago

haha you went about it much more professionally than I did :) Happy we both found a solution.

mambari commented 5 months ago

Hello, I have the same error even with the fix: (screenshot)

hanarotg commented 2 weeks ago

> Hello, I have the same error even with the fix: (screenshot)

Hi @mambari, did you find a solution to this problem? The same error occurs for me.