microsoft / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
https://www.deepspeed.ai/
Apache License 2.0

[BUG] pip install DeepSpeed Error #3145

Closed ElbekJK closed 1 year ago

ElbekJK commented 1 year ago

When I try to install DeepSpeed with pip, I keep running into this error. I would appreciate your help!

Collecting deepspeed
  Using cached deepspeed-0.8.3.tar.gz (765 kB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [15 lines of output]
      test.c
      LINK : fatal error LNK1181: cannot open input file 'aio.lib'
      Traceback (most recent call last):
        File "", line 2, in
        File "", line 34, in
        File "C:\Users\Elbek\AppData\Local\Temp\pip-install-yduslt_a\deepspeed_eacd64e4be3945869afa52e220c455fa\setup.py", line 156, in
          abort(f"Unable to pre-compile {op_name}")
        File "C:\Users\Elbek\AppData\Local\Temp\pip-install-yduslt_a\deepspeed_eacd64e4be3945869afa52e220c455fa\setup.py", line 48, in abort
          assert False, msg
      AssertionError: Unable to pre-compile async_io
      DS_BUILD_OPS=1
      [WARNING] async_io requires the dev libaio .so object and headers but these were not found.
      [WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
      [WARNING] One can disable async_io with DS_BUILD_AIO=0
      [ERROR] Unable to pre-compile async_io
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

satpalsr commented 1 year ago

@ElbekJK Are you on Windows? If so, check this

ElbekJK commented 1 year ago

I have tried it. But I am facing this error:

DS_BUILD_OPS=1
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] One can disable async_io with DS_BUILD_AIO=0
[ERROR] Unable to pre-compile async_io

loadams commented 1 year ago

@ElbekJK - are you building on Windows or Linux? And can you share the command you are using? You should be able to pip install deepspeed, and then we can use ds_report to see whether your system is able to run/pre-compile AIO.

ElbekJK commented 1 year ago

I am using Windows, but I cannot build a wheel, so I cannot run ds_report.

loadams commented 1 year ago

Are you building ops as well in the command, or just running pip install deepspeed? You could also try running set DS_BUILD_OPS=0 first and then running the pip install, so that the ops are not pre-compiled.

ElbekJK commented 1 year ago

I was building ops. I did what you suggested but it didn't work either.

set DS_BUILD_OPS=0
PS C:\Users\Elbek\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\deepspeed-0.8.3> python setup.py bdist_wheel
DS_BUILD_OPS=1
test.c
LINK : fatal error LNK1181: cannot open input file 'aio.lib'
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] One can disable async_io with DS_BUILD_AIO=0
[ERROR] Unable to pre-compile async_io

loadams commented 1 year ago

Ah, if you are using PowerShell you'll want to set the environment variable differently, I assumed that was cmd.exe. For PowerShell can you try Set-Item Env:\DS_BUILD_OPS 0
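Since the syntax for setting an environment variable differs between cmd.exe and PowerShell, one way to avoid the mismatch is to set the variable from Python and launch pip from there. The following is only a minimal sketch under that assumption: the helper names `build_env`, `pip_install_command`, and `install_without_prebuilt_ops` are made up for illustration, and only `DS_BUILD_OPS` comes from the logs above.

```python
import os
import subprocess
import sys


def build_env(flags, base=None):
    """Return a copy of the environment with the given build flags added.

    `flags` maps variable names (e.g. "DS_BUILD_OPS") to values; values are
    converted to strings because environment variables are always text.
    """
    env = dict(os.environ if base is None else base)
    for name, value in flags.items():
        env[name] = str(value)
    return env


def pip_install_command(package="deepspeed"):
    """Build the pip command for the interpreter that runs this script."""
    return [sys.executable, "-m", "pip", "install", package]


def install_without_prebuilt_ops():
    """Hypothetical helper: run pip with DS_BUILD_OPS=0 in the child process."""
    env = build_env({"DS_BUILD_OPS": 0})
    subprocess.run(pip_install_command(), env=env, check=True)
```

Nothing runs until `install_without_prebuilt_ops()` is called. Because the variable is passed directly to the child process, it does not matter which shell the script is started from.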

ElbekJK commented 1 year ago

Yeah, it worked. Thanks a lot. But I don't know what to do from here, because I don't know whether DeepSpeed is working.

loadams commented 1 year ago

All we did there was disable pre-compiling the ops; they should still be supported via JIT compilation. You should be able to see the status of your DeepSpeed install by running ds_report. Can you paste that output here?

It's possible that AIO cannot be installed (the error we got above). If you do not need that feature, you can simply disable it.

I'd also recommend using WSL if possible, since not all features are supported on Windows.
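As a rough first check before running ds_report by hand, a short Python script can tell whether the package and the ds_report command are visible at all. This is a sketch, not part of DeepSpeed: the helper names `deepspeed_status` and `run_ds_report` are hypothetical, and it assumes the ds_report script is on PATH after installation.

```python
import importlib.util
import shutil
import subprocess


def deepspeed_status():
    """Report whether the deepspeed package and the ds_report script are visible."""
    return {
        # find_spec returns None when the top-level package is not installed.
        "package_importable": importlib.util.find_spec("deepspeed") is not None,
        # which returns the full path of the script, or None if it is not on PATH.
        "ds_report_path": shutil.which("ds_report"),
    }


def run_ds_report():
    """Run ds_report and return its text output, or None if it is not on PATH."""
    path = shutil.which("ds_report")
    if path is None:
        return None
    result = subprocess.run([path], capture_output=True, text=True)
    return result.stdout + result.stderr


status = deepspeed_status()
print(status)
```

If `ds_report_path` is not `None`, `run_ds_report()` returns the text that can be pasted into the issue.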

GalaxyHe2023 commented 1 year ago

I reinstalled the OS, re-downloaded CUDA and cuDNN, and made sure my CUDA and cuDNN are both certified.

set DS_BUILD_AIO=0

set DS_BUILD_SPARSE_ATTN=0

Then the error changed.

No CUDA runtime is found, using CUDA_HOME='E:\00-chatGPT\CUDA'
The system can not find the file specified.
The system can not find the file specified.
The system can not find the file specified.
The system can not find the file specified.
fatal: not a git repository (or any of the parent directories): .git
Traceback (most recent call last):
  File "", line 2, in
  File "", line 34, in
  File "C:\Users\Administrator\AppData\Local\Temp\pip-install-nqtc3o4j\deepspeed_d3894dbfb2fe4c1984b1cd4a0b06e0f7\setup.py", line 198, in
    create_dir_symlink('....\csrc', '.\deepspeed\ops\csrc')
  File "C:\Users\Administrator\AppData\Local\Temp\pip-install-nqtc3o4j\deepspeed_d3894dbfb2fe4c1984b1cd4a0b06e0f7\setup.py", line 190, in create_dir_symlink
    os.remove(dest)
PermissionError: [WinError 5] Access is denied. : '.\deepspeed\ops\csrc'

Are there other solutions?

loadams commented 1 year ago

Hi @GalaxyHe2023 - that looks like a different error. Are you trying to install DeepSpeed from source or via a published pip package? I ask because I also see this in your logs: fatal: not a git repository (or any of the parent directories): .git.

If it is failing to symlink, make sure you are following these Windows directions as well. Though as before, I'd recommend using WSL here if you can, as more features are supported there.
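The traceback above fails inside create_dir_symlink. On Windows, creating symlinks usually requires an elevated prompt or Developer Mode, so a quick check of whether the current process may create directory symlinks can help narrow things down. The sketch below does not reproduce the logic in setup.py; the helper name `can_create_dir_symlink` is hypothetical, and it only probes one possible cause of the PermissionError.

```python
import os
import tempfile


def can_create_dir_symlink():
    """Try to create a directory symlink in a temp folder; return True on success."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "target_dir")
        link = os.path.join(tmp, "link_dir")
        os.mkdir(target)
        try:
            # target_is_directory only matters on Windows; it is ignored elsewhere.
            os.symlink(target, link, target_is_directory=True)
        except (OSError, NotImplementedError):
            # Typically raised on Windows when the privilege is missing.
            return False
        return os.path.islink(link)


print("directory symlinks allowed:", can_create_dir_symlink())
```

If this prints `False` on Windows, running the install from an administrator prompt or enabling Developer Mode is worth trying before anything else.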

GalaxyHe2023 commented 1 year ago

I used WSL to fix the issue.

loadams commented 1 year ago

Thanks for clarifying, @GalaxyHe2023, glad that works for you, and it should have more features available as well. @ElbekJK, is WSL something you can use in your setup or not? If not, from where you are when you build without pre-compiling ops, can you paste your output from ds_report?

ElbekJK commented 1 year ago

Yeah, it worked. I am using it on WSL. Thank you a lot for everything!

lsm03624 commented 1 year ago

> Ah, if you are using PowerShell you'll want to set the environment variable differently, I assumed that was cmd.exe. For PowerShell can you try Set-Item Env:\DS_BUILD_OPS 0

After setting it up this way, the compilation didn't take place and the directly generated .whl file couldn't be installed.

loadams commented 12 months ago

@lsm03624 - Are you using PowerShell on Windows here? We recommend WSL if possible since it allows use of all features like async_io. But for PowerShell, what errors did you see that explain why compilation didn't take place?

ccu1tn commented 9 months ago

@loadams Hello. I have the same problem and I don't know how to fix it. I am building on Windows with VS Code. DeepSpeed was cloned with git from https://github.com/microsoft/DeepSpeed, and (speech_env_one) is a virtual environment.

(speech_env_one) D:\Attack\SemCom\SPEECH_TO_TEXT\DeepSC-ST_demonstration\DeepSpeed>python setup.py bdist_wheel
DS_BUILD_OPS=1
test.c
LINK : fatal error LNK1181: cannot open input file 'aio.lib'
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] One can disable async_io with DS_BUILD_AIO=0
[ERROR] Unable to pre-compile async_io
Traceback (most recent call last):
  File "setup.py", line 182, in
    abort(f"Unable to pre-compile {op_name}")
  File "setup.py", line 52, in abort
    assert False, msg
AssertionError: Unable to pre-compile async_io

(speech_env_one) D:\Attack\SemCom\SPEECH_TO_TEXT\DeepSC-ST_demonstration\DeepSpeed>Set-Item Env:\DS_BUILD_OPS 0
'Set-Item' is not recognized as an internal or external command, operable program or batch file.

loadams commented 9 months ago

Hi @ccu1tn - the failure to set the environment variable suggests that you are not in a PowerShell environment, since the PowerShell command is not recognized. Can you confirm whether you're able to open a new PowerShell window and run it there?
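For reference, the same variable is set with different syntax in each shell, which is what tripped up the command above. The small lookup below is only an illustrative sketch: the names `SET_ENV_SYNTAX` and `set_env_command` are made up, the cmd.exe and PowerShell forms are the ones used earlier in this thread, and the bash form is included for WSL users.

```python
# Command templates for setting an environment variable in each shell.
SET_ENV_SYNTAX = {
    "cmd": "set {name}={value}",
    "powershell": r"Set-Item Env:\{name} {value}",
    "bash": "export {name}={value}",
}


def set_env_command(shell, name, value):
    """Return the command that sets `name` to `value` in the given shell."""
    try:
        template = SET_ENV_SYNTAX[shell]
    except KeyError:
        raise ValueError(f"unknown shell: {shell!r}") from None
    return template.format(name=name, value=value)


for shell in SET_ENV_SYNTAX:
    print(f"{shell:<11}{set_env_command(shell, 'DS_BUILD_OPS', 0)}")
```

A prompt that starts with `PS` (as in the earlier log) indicates PowerShell; a bare `D:\...>` prompt indicates cmd.exe.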

ccu1tn commented 9 months ago

Hi @loadams, thank you for your support. It worked when I ran it in PowerShell. The results look like this. Is this correct? [image]

I also get another error, shown in this picture. My setup is TensorFlow 2.3.1, CUDA 10.1, Python 3.7. I tried to fix it by reinstalling TensorFlow 2.2, but it still does not work. [image]

melMass commented 7 months ago

I just went down the rabbit hole of #4729 and #4669, and I'm stuck on the issue described here and hesitant to open a new one:

[image]

as text for SEO

```sh
❯ pip install deepspeed --pre
Collecting deepspeed
  Using cached deepspeed-0.12.6.tar.gz (1.2 MB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [36 lines of output]
      test.c
      LINK : fatal error LNK1181: cannot open input file 'aio.lib'
      The system cannot find the file specified.
      The system cannot find the file specified.
      The system cannot find the file specified.
      The system cannot find the file specified.
      File "", line 2, in
      File "", line 34, in
      File "C:\Users\User\AppData\Local\Temp\pip-install-zl23ifx7\deepspeed_74d1c763267644b39987945a8d63c94f\setup.py", line 224, in
        create_dir_symlink('..\\..\\csrc', '.\\deepspeed\\ops\\csrc')
      File "C:\Users\User\AppData\Local\Temp\pip-install-zl23ifx7\deepspeed_74d1c763267644b39987945a8d63c94f\setup.py", line 216, in create_dir_symlink
        os.remove(dest)
      PermissionError: [WinError 5] Access is denied: '.\\deepspeed\\ops\\csrc'
      DS_BUILD_OPS=1
      [WARNING] async_io requires the dev libaio .so object and headers but these were not found.
      [WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
      [WARNING] cpu_adam requires the 'lscpu' command, but it does not exist!
      [WARNING] cpu_adam attempted to query 'lscpu' after failing to use py-cpuinfo to detect the CPU architecture. 'lscpu' does not appear to exist on your system, will fall back to use -march=native and non-vectorized execution.
      [WARNING] cpu_adam requires the 'lscpu' command, but it does not exist!
      [WARNING] cpu_adam attempted to query 'lscpu' after failing to use py-cpuinfo to detect the CPU architecture. 'lscpu' does not appear to exist on your system, will fall back to use -march=native and non-vectorized execution.
      [WARNING] cpu_adagrad requires the 'lscpu' command, but it does not exist!
      [WARNING] cpu_adagrad attempted to query 'lscpu' after failing to use py-cpuinfo to detect the CPU architecture. 'lscpu' does not appear to exist on your system, will fall back to use -march=native and non-vectorized execution.
      [WARNING] cpu_adagrad requires the 'lscpu' command, but it does not exist!
      [WARNING] cpu_adagrad attempted to query 'lscpu' after failing to use py-cpuinfo to detect the CPU architecture. 'lscpu' does not appear to exist on your system, will fall back to use -march=native and non-vectorized execution.
      [WARNING] cpu_lion requires the 'lscpu' command, but it does not exist!
      [WARNING] cpu_lion attempted to query 'lscpu' after failing to use py-cpuinfo to detect the CPU architecture. 'lscpu' does not appear to exist on your system, will fall back to use -march=native and non-vectorized execution.
      [WARNING] cpu_lion requires the 'lscpu' command, but it does not exist!
      [WARNING] cpu_lion attempted to query 'lscpu' after failing to use py-cpuinfo to detect the CPU architecture. 'lscpu' does not appear to exist on your system, will fall back to use -march=native and non-vectorized execution.
      [WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
      [WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.0
      [WARNING] please install triton==1.0.0 if you want to use sparse attention
      Install Ops={'async_io': False, 'fused_adam': 1, 'cpu_adam': 1, 'cpu_adagrad': 1, 'cpu_lion': 1, 'evoformer_attn': False, 'fused_lamb': 1, 'fused_lion': 1, 'inference_core_ops': 1, 'cutlass_ops': False, 'quantizer': 1, 'ragged_device_ops': False, 'ragged_ops': 1, 'random_ltd': 1, 'sparse_attn': False, 'spatial_inference': 1, 'transformer': 1, 'stochastic_transformer': 1, 'transformer_inference': 1}
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
```

melMass commented 7 months ago

It works if I clone the repo. I published wheels for Windows here:

https://github.com/melMass/DeepSpeed/releases/tag/v0.12.7

szriru commented 6 months ago

@melMass Thanks :)

XiangPiIi commented 6 months ago

> It works if I clone the repo, I published wheels here for windows
>
> https://github.com/melMass/DeepSpeed/releases/tag/v0.12.7

nice work!

nicotie commented 6 months ago

> It works if I clone the repo, I published wheels here for windows
>
> https://github.com/melMass/DeepSpeed/releases/tag/v0.12.7

Thanks!

teanhow commented 3 months ago

> It works if I clone the repo, I published wheels here for windows
>
> https://github.com/melMass/DeepSpeed/releases/tag/v0.12.7

Thanks :)

xieliaing commented 1 month ago

> It works if I clone the repo, I published wheels here for windows
>
> https://github.com/melMass/DeepSpeed/releases/tag/v0.12.7

Thanks !