matatonic / openedai-speech

An OpenAI API compatible text to speech server using Coqui AI's xtts_v2 and/or piper tts as the backend.
GNU Affero General Public License v3.0
453 stars 58 forks

How to use deepspeed for XTTS #59

Open thiswillbeyourgithub opened 1 month ago

thiswillbeyourgithub commented 1 month ago

Hi,

(As per that request) Deepspeed seems to be a library that speeds up AI-related code that supports it.

XTTS supports it.

On a non-Windows computer it seems to be straightforward: just pip install deepspeed, then use the appropriate XTTS argument. But issues seem to arise when we're inside a docker container. There's also an issue with deepspeed increasing the container size above the threshold allowed by ghcr.io.
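For readers wondering what "the appropriate XTTS argument" might look like, here is a minimal sketch. It assumes the Coqui TTS Python package is installed and that the argument in question is the `use_deepspeed` flag of `Xtts.load_checkpoint`; the helper names `deepspeed_available` and `load_xtts` and both path parameters are hypothetical, and this is not necessarily how openedai-speech itself loads the model.

```python
import importlib.util


def deepspeed_available() -> bool:
    """Return True if the deepspeed package is importable in this environment."""
    return importlib.util.find_spec("deepspeed") is not None


def load_xtts(config_path: str, checkpoint_dir: str):
    """Load an XTTS checkpoint, enabling deepspeed only when it is installed.

    Hypothetical helper: config_path and checkpoint_dir must point at an
    already-downloaded xtts_v2 model.
    """
    # Imported lazily so that this file can be imported without Coqui TTS.
    from TTS.tts.configs.xtts_config import XttsConfig
    from TTS.tts.models.xtts import Xtts

    config = XttsConfig()
    config.load_json(config_path)
    model = Xtts.init_from_config(config)
    model.load_checkpoint(
        config,
        checkpoint_dir=checkpoint_dir,
        use_deepspeed=deepspeed_available(),
    )
    model.cuda()  # deepspeed inference expects a CUDA device
    return model
```

Falling back to `use_deepspeed=False` when the package is missing keeps the same code path usable in containers where the install fails.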

If you could give pointers to help users try to get deepspeed working on their end it would be awesome! I'm a Linux-only person. pip install works perfectly outside of docker, but when I tried it from a bash shell inside the container I got this error:

pip install deepspeed
Collecting deepspeed
  Using cached deepspeed-0.15.1.tar.gz (1.4 MB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [9 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-oefym5jg/deepspeed_51914280e6a94d08ba3b952b5df14105/setup.py", line 108, in <module>
          cuda_major_ver, cuda_minor_ver = installed_cuda_version()
                                           ^^^^^^^^^^^^^^^^^^^^^^^^
        File "/tmp/pip-install-oefym5jg/deepspeed_51914280e6a94d08ba3b952b5df14105/op_builder/builder.py", line 51, in installed_cuda_version
          raise MissingCUDAException("CUDA_HOME does not exist, unable to compile CUDA op(s)")
      op_builder.builder.MissingCUDAException: CUDA_HOME does not exist, unable to compile CUDA op(s)
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

matatonic commented 1 month ago

There are a few complex parts to this.

matatonic commented 1 month ago

PS. You might try an older deepspeed version, like 0.13. IIRC it was more compatible at the time.

matatonic commented 1 month ago

On Linux, you need to add the CUDA development toolkit, or at least switch to using a CUDA dev image. It probably also needs additional dependencies.
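The `CUDA_HOME does not exist` error in the traceback above suggests the container has no CUDA toolkit that the build can find. The sketch below is a quick way to check that before retrying the install. It only approximates the usual lookup order (environment variable, then `nvcc` on the PATH, then the conventional `/usr/local/cuda` directory) and is an assumption, not a copy of the logic deepspeed or PyTorch actually run; `find_cuda_home` is a hypothetical helper name.

```python
import os
import shutil
from typing import Callable, Mapping, Optional


def find_cuda_home(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
    default_dir: str = "/usr/local/cuda",
) -> Optional[str]:
    """Guess where the CUDA toolkit lives, or return None if it is not found."""
    # 1. An explicitly set environment variable wins.
    for var in ("CUDA_HOME", "CUDA_PATH"):
        value = env.get(var)
        if value:
            return value
    # 2. Otherwise derive it from nvcc, which normally sits in <cuda_home>/bin/.
    nvcc = which("nvcc")
    if nvcc:
        return os.path.dirname(os.path.dirname(nvcc))
    # 3. Finally fall back to the conventional install directory.
    if os.path.isdir(default_dir):
        return default_dir
    return None


if __name__ == "__main__":
    home = find_cuda_home()
    if home is None:
        print("No CUDA toolkit found; building deepspeed ops will likely fail.")
    else:
        print(f"CUDA toolkit candidate: {home}")
```

If this prints that no toolkit was found inside the container, that would be consistent with the image shipping only the CUDA runtime and not the development toolkit.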

thiswillbeyourgithub commented 1 month ago

Thanks. But does deepspeed only improve the checkpoint loading time or is it faster overall?

matatonic commented 1 month ago

It should allow running in lower VRAM with good performance (but probably not better than fully loaded). At its core, I think it's essentially efficient layer swapping to RAM, but I'm not really sure; it may do more than that. deepspeed didn't make any difference at all for me when loading xtts in sufficient VRAM.
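Whether deepspeed helps on a given machine is easiest to settle by measuring. The helper below is a generic timing sketch, not part of openedai-speech or deepspeed: `median_seconds` is a hypothetical name, and it assumes the callable you pass in blocks until the audio is actually produced (GPU work that returns before it finishes would make the numbers misleading).

```python
import statistics
import time
from typing import Callable


def median_seconds(fn: Callable[[], object], repeats: int = 5, warmup: int = 1) -> float:
    """Return the median wall-clock time of fn() over `repeats` timed runs."""
    # Warm-up runs are not timed; they absorb one-off costs such as lazy init.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```

To compare, call it once with a function that synthesizes a fixed sentence using a model loaded with deepspeed and once with the same model loaded without it, then compare the two medians.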

thiswillbeyourgithub commented 1 month ago

Alright, thank you very much for all this clarification. I've decided not to spend even more time trying. Fish quantization and piper gpu seem a safer bet for lower latency and a better speed tradeoff. As far as I'm concerned you can close this. Thanks again!