How to use deepspeed for XTTS

thiswillbeyourgithub commented 1 month ago

Hi,

(As per that request.) Deepspeed seems to be a library that speeds up AI-related code that supports it.

XTTS supports it.

On a non-Windows computer it seems to be straightforward: just pip install deepspeed, then use the appropriate XTTS argument. But issues seem to arise inside a Docker container. There's also the problem that deepspeed increases the container size beyond the threshold allowed by ghcr.io.
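A minimal sketch of "use the appropriate XTTS argument" without breaking setups where deepspeed is missing. It assumes the model is loaded through Coqui TTS's Xtts.load_checkpoint, which takes a use_deepspeed keyword; the helper names deepspeed_available and xtts_load_kwargs are hypothetical, and this is not how the openedai-speech server itself wires the option.

```python
import importlib.util


def deepspeed_available() -> bool:
    """Return True if the deepspeed package can be found in this environment."""
    # find_spec only locates the package; it does not import or compile anything.
    return importlib.util.find_spec("deepspeed") is not None


def xtts_load_kwargs(want_deepspeed: bool = True) -> dict:
    """Build keyword arguments for loading XTTS (hypothetical helper).

    Assumption: the model is loaded with Coqui TTS's Xtts.load_checkpoint,
    which accepts a use_deepspeed flag. DeepSpeed is only requested when it
    is both wanted and installed, so a missing package never breaks loading.
    """
    return {"use_deepspeed": want_deepspeed and deepspeed_available()}


if __name__ == "__main__":
    kwargs = xtts_load_kwargs()
    print(f"deepspeed installed: {deepspeed_available()}, load kwargs: {kwargs}")
    # e.g. model.load_checkpoint(config, checkpoint_dir="/path/to/xtts/", **kwargs)
```

Falling back silently keeps one code path for both the plain container and a locally built one; if you would rather fail loudly when deepspeed is expected, raise instead of returning False.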

If you could give pointers to help users get deepspeed working on their end, that would be awesome! I'm a Linux-only person. pip install works perfectly outside of Docker, but when I tried it in a bash shell inside the container I got this error:

pip install deepspeed
Collecting deepspeed
  Using cached deepspeed-0.15.1.tar.gz (1.4 MB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [9 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-oefym5jg/deepspeed_51914280e6a94d08ba3b952b5df14105/setup.py", line 108, in <module>
          cuda_major_ver, cuda_minor_ver = installed_cuda_version()
                                           ^^^^^^^^^^^^^^^^^^^^^^^^
        File "/tmp/pip-install-oefym5jg/deepspeed_51914280e6a94d08ba3b952b5df14105/op_builder/builder.py", line 51, in installed_cuda_version
          raise MissingCUDAException("CUDA_HOME does not exist, unable to compile CUDA op(s)")
      op_builder.builder.MissingCUDAException: CUDA_HOME does not exist, unable to compile CUDA op(s)
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

matatonic commented 1 month ago

There are a few complex parts to this:

There are no pre-built wheels that I could find.
It needs to be compiled, and it needs the CUDA dev kit available. Done simply, this makes the image about 6GB larger, which is too big for ghcr.io.
Setting up a dev environment with the right CUDA version is beyond what I will support. If you can do this on your own, great: the deepspeed option is available in the server once you get it installed.
I can't test anything on Windows, so Windows users are on their own to sort this out for now; a PR or simple docs may be OK, though.

matatonic commented 1 month ago

PS: You might try an older deepspeed version, like 0.13; IIRC it was more compatible at the time.
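If you follow the suggestion to try the 0.13 series, a small check like the one below can confirm which version actually ended up installed. This is a sketch using only the standard library; the helper names are hypothetical, and pinning with something like pip install "deepspeed==0.13.*" is an assumption about how you would request that series, not a guarantee that it avoids the CUDA build requirement.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def parse_major_minor(ver: str) -> Tuple[int, int]:
    """Turn a version string like '0.13.5' or '0.15.1+cu121' into (major, minor)."""
    m = re.match(r"(\d+)\.(\d+)", ver)
    if m is None:
        raise ValueError(f"unrecognised version string: {ver!r}")
    return int(m.group(1)), int(m.group(2))


def installed_deepspeed_version() -> Optional[str]:
    """Return the installed deepspeed version, or None if it is not installed."""
    try:
        return version("deepspeed")
    except PackageNotFoundError:
        return None


def is_suggested_series(ver: str, series: Tuple[int, int] = (0, 13)) -> bool:
    """True if ver belongs to the given major.minor series (0.13 by default)."""
    return parse_major_minor(ver) == series


if __name__ == "__main__":
    ver = installed_deepspeed_version()
    if ver is None:
        print("deepspeed is not installed")
    else:
        print(f"deepspeed {ver}, in 0.13 series: {is_suggested_series(ver)}")
```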

matatonic commented 1 month ago

On Linux, you need to add the CUDA development toolkit, or (at least) switch to a CUDA dev image; it probably also needs additional dependencies.
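Before running pip install deepspeed inside the container, a pre-flight check can tell you whether a CUDA toolkit is visible at all, which is what the "CUDA_HOME does not exist" error above is complaining about. The sketch below only approximates the lookup PyTorch's cpp_extension performs (which the DeepSpeed build relies on): environment variables first, then nvcc on PATH, then a conventional install directory. The function name find_cuda_home and the lookup order are assumptions, not DeepSpeed's exact code.

```python
import os
import shutil
from pathlib import Path
from typing import Callable, Mapping, Optional


def find_cuda_home(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
    default: str = "/usr/local/cuda",
) -> Optional[str]:
    """Approximate the CUDA toolkit lookup a DeepSpeed source build depends on.

    Order (an approximation, not DeepSpeed's exact logic):
      1. the CUDA_HOME or CUDA_PATH environment variable, if set and non-empty,
      2. the directory two levels above an nvcc found on PATH,
      3. a conventional default install location, if it exists.
    """
    for var in ("CUDA_HOME", "CUDA_PATH"):
        if env.get(var):
            return env[var]
    nvcc = which("nvcc")
    if nvcc:
        # e.g. /usr/local/cuda/bin/nvcc -> /usr/local/cuda
        return os.path.dirname(os.path.dirname(nvcc))
    if Path(default).is_dir():
        return default
    return None


if __name__ == "__main__":
    home = find_cuda_home()
    if home is None:
        print("No CUDA toolkit found: install the CUDA development toolkit "
              "(or use a CUDA dev base image) before pip install deepspeed.")
    else:
        print(f"CUDA toolkit candidate: {home}")
```

In NVIDIA's CUDA container images, the "devel" variants ship nvcc while the "runtime" variants do not, which is one reason a dev image is so much larger.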

thiswillbeyourgithub commented 1 month ago

Thanks. But does deepspeed only improve checkpoint loading time, or is it faster overall?

matatonic commented 1 month ago

It should allow running in lower VRAM with good performance (but probably not better than fully loaded). At its core, I think it's essentially an efficient layer swap space to RAM, though I'm not really sure; it may do more than that. Deepspeed didn't make any difference at all for me when loading XTTS in sufficient VRAM.
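To check an observation like this on your own hardware, one hedged approach is to time the same synthesis call with and without deepspeed. The sketch below times any zero-argument callable and reports the median; synthesize_with and synthesize_without in the usage comment are hypothetical placeholders for your own XTTS calls.

```python
import statistics
import time
from typing import Callable


def median_latency(fn: Callable[[], object], runs: int = 5, warmup: int = 1) -> float:
    """Return the median wall-clock time of fn() in seconds.

    Warm-up calls are discarded so one-time costs (kernel compilation,
    CUDA context setup, caches) do not skew the comparison. If fn launches
    asynchronous GPU work, make it block before returning (for example by
    calling torch.cuda.synchronize()), or the timings will be too low.
    """
    if runs < 1:
        raise ValueError("runs must be >= 1")
    for _ in range(warmup):
        fn()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


if __name__ == "__main__":
    # Demo with a stand-in workload; replace it with real synthesis calls, e.g.
    # t_ds = median_latency(lambda: synthesize_with("Hello world"))
    # t_plain = median_latency(lambda: synthesize_without("Hello world"))
    print(f"{median_latency(lambda: time.sleep(0.01)):.4f} s")
```

The median is used rather than the mean so a single slow run (garbage collection, another process on the GPU) does not dominate the result.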

thiswillbeyourgithub commented 1 month ago

Alright, thank you very much for all this clarification. I've decided, then, not to spend any more time trying. Fish quantization and Piper GPU seem a safer bet for lower latency and a better speed tradeoff. As far as I'm concerned you can close this. Thanks again!

matatonic / openedai-speech

How to use deepspeed for XTTS #59