microsoft / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
https://www.deepspeed.ai/
Apache License 2.0

Install errors on Windows #5638

Closed. xalteropsx closed this issue 2 weeks ago

xalteropsx commented 2 weeks ago



C:\Users\haide>py -m pip install deepspeed==0.14.1
Collecting deepspeed==0.14.1
  Downloading deepspeed-0.14.1.tar.gz (1.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.3 MB 4.4 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [28 lines of output]
      [2024-06-11 12:18:43,296] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
      [2024-06-11 12:18:43,630] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
      test.c
      LINK : fatal error LNK1181: cannot open input file 'aio.lib'
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "C:\Users\haide\AppData\Local\Temp\pip-install-raag0ja_\deepspeed_634873663f3f4cd79636ab15bca9392a\setup.py", line 37, in <module>
          from op_builder import get_default_compute_capabilities, OpBuilder
        File "C:\Users\haide\AppData\Local\Temp\pip-install-raag0ja_\deepspeed_634873663f3f4cd79636ab15bca9392a\op_builder\__init__.py", line 18, in <module>
          import deepspeed.ops.op_builder  # noqa: F401 # type: ignore
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "C:\Users\haide\AppData\Local\Temp\pip-install-raag0ja_\deepspeed_634873663f3f4cd79636ab15bca9392a\deepspeed\__init__.py", line 25, in <module>
          from . import ops
        File "C:\Users\haide\AppData\Local\Temp\pip-install-raag0ja_\deepspeed_634873663f3f4cd79636ab15bca9392a\deepspeed\ops\__init__.py", line 15, in <module>
          from ..git_version_info import compatible_ops as __compatible_ops__
        File "C:\Users\haide\AppData\Local\Temp\pip-install-raag0ja_\deepspeed_634873663f3f4cd79636ab15bca9392a\deepspeed\git_version_info.py", line 29, in <module>
          op_compatible = builder.is_compatible()
                          ^^^^^^^^^^^^^^^^^^^^^^^
        File "C:\Users\haide\AppData\Local\Temp\pip-install-raag0ja_\deepspeed_634873663f3f4cd79636ab15bca9392a\op_builder\fp_quantizer.py", line 29, in is_compatible
          sys_cuda_major, _ = installed_cuda_version()
                              ^^^^^^^^^^^^^^^^^^^^^^^^
        File "C:\Users\haide\AppData\Local\Temp\pip-install-raag0ja_\deepspeed_634873663f3f4cd79636ab15bca9392a\op_builder\builder.py", line 50, in installed_cuda_version
          raise MissingCUDAException("CUDA_HOME does not exist, unable to compile CUDA op(s)")
      op_builder.builder.MissingCUDAException: CUDA_HOME does not exist, unable to compile CUDA op(s)
       [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
       [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
       [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
WARNING: There was an error checking the latest version of pip.

loadams commented 2 weeks ago

Hi @xalteropsx - looks like you are trying to install on Windows. Have you followed the Windows batch script here?

CC: @costin-eseanu

xalteropsx commented 2 weeks ago

Does this work on the RX 7900 XTX? I haven't tried the Windows batch script yet.

loadams commented 2 weeks ago

> Does this work on the RX 7900 XTX? I haven't tried the Windows batch script yet.

I'm not sure. We have only officially tested on AMD MI100 and MI200. It may work, depending on ROCm support. If you need to install on Windows, you'll need to either use WSL or run the Windows batch script to ensure the correct ops are built.
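
For anyone hitting the same `MissingCUDAException`, a quick pre-install check can show whether a GPU toolkit is visible at all. The sketch below is only an approximation and is not DeepSpeed's own detection logic: it assumes the toolkit is located through the `CUDA_HOME`/`CUDA_PATH` or `ROCM_HOME`/`ROCM_PATH` environment variables, or through `nvcc`/`hipcc` on `PATH`. The helper names `find_toolkit` and `diagnose` are hypothetical and exist only for this illustration.

```python
import os
import platform
import shutil


def find_toolkit(env_vars, compiler, environ=None):
    """Guess a toolkit root from environment variables or a compiler on PATH.

    Hypothetical helper: it approximates how a build might locate CUDA or
    ROCm, and is not the logic DeepSpeed or PyTorch actually runs.
    """
    environ = os.environ if environ is None else environ
    for name in env_vars:
        value = environ.get(name)
        if value:
            return value
    compiler_path = shutil.which(compiler, path=environ.get("PATH"))
    if compiler_path:
        # <root>/bin/<compiler> -> <root>
        return os.path.dirname(os.path.dirname(compiler_path))
    return None


def diagnose(environ=None):
    """Collect the facts that matter for the error in this thread."""
    return {
        "platform": platform.system(),
        "cuda_home": find_toolkit(["CUDA_HOME", "CUDA_PATH"], "nvcc", environ),
        "rocm_home": find_toolkit(["ROCM_HOME", "ROCM_PATH"], "hipcc", environ),
    }


if __name__ == "__main__":
    report = diagnose()
    for key, value in report.items():
        print(f"{key}: {value}")
    if report["cuda_home"] is None and report["rocm_home"] is None:
        print("No CUDA or ROCm toolkit found; CUDA ops cannot be compiled.")
    if report["platform"] == "Windows":
        print("On Windows, use WSL or the Windows batch script from the repo.")
```

If both `cuda_home` and `rocm_home` come back as `None`, the `CUDA_HOME does not exist` failure in the log above is the expected outcome of a source build.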

xalteropsx commented 2 weeks ago

Sorry for the late response, @loadams. It seems this is not going to work on Windows without WSL, but I don't like WSL.


So Triton is still Linux-only and not available on Windows.

loadams commented 1 week ago

@xalteropsx - could you share more info on what errors you are hitting when using the updated Windows batch script install? That way we can work on fixing them. CC: @costin-eseanu