microsoft / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
https://www.deepspeed.ai/
Apache License 2.0

[BUG] AssertionError: Unable to pre-compile ops without torch installed. #3329

Open JerryAllison opened 1 year ago

JerryAllison commented 1 year ago

Issue with installing DeepSpeed, "pip install deepspeed" resulted in the following error:

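System info (please complete the following information):

  • OS: [Windows 11 22H2]
  • GPU count and types [GTX 1060 6GB]
  • Python 3.10.5 torch 2.0.0+cu118 torchaudio 2.0.1+cu118 torchvision 0.15.1+cu118

Please help me.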

JerryAllison commented 1 year ago

Addendum: My PyTorch can be imported normally, but when I try to pip install DeepSpeed, it always reports that it cannot import torch.
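
One way to narrow this down is to check whether torch is visible to the exact interpreter that runs the build, since pip may build in a different environment from the one where `import torch` succeeds. The following is a minimal sketch; the helper names `module_visible` and `report` are hypothetical and are not part of DeepSpeed or pip.

```python
import importlib.util
import sys


def module_visible(name):
    """Return True if `name` can be located by the current interpreter."""
    return importlib.util.find_spec(name) is not None


def report(name="torch"):
    """Describe whether `name` is importable from the running interpreter."""
    status = "visible" if module_visible(name) else "NOT visible"
    return f"{name} is {status} to {sys.executable}"


if __name__ == "__main__":
    print(report("torch"))
```

Running this with the same Python that pip uses shows which environment is actually being checked.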

mrwyattii commented 1 year ago

@JerryAllison I suspect you are using pip>=23.1? A recent change in pip makes it so the default behavior is to build in an isolated environment. This means DeepSpeed will not find torch. You can fix this by doing pip install . --no-build-isolation. We will update our docs to reflect this.
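
As a rough illustration of the command above, here is a small Python sketch that assembles the same pip invocation for the current interpreter. The function name `pip_install_command` is hypothetical; only `--no-build-isolation` comes from this thread.

```python
import sys


def pip_install_command(target=".", build_isolation=False):
    """Return the argv list for installing `target` with this interpreter's pip."""
    cmd = [sys.executable, "-m", "pip", "install", target]
    if not build_isolation:
        # Build in the current environment so the build can import torch.
        cmd.append("--no-build-isolation")
    return cmd


if __name__ == "__main__":
    print(" ".join(pip_install_command()))
```

Calling pip through `sys.executable -m pip` makes sure the install targets the same environment in which torch is installed.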

JerryAllison commented 1 year ago

Thanks, but other warnings appeared.

AngelTs commented 1 year ago

pip install . --no-build-isolation

Same error here

KbKev78 commented 1 year ago

Can confirm. I had the same torch error, then used --no-build-isolation to get past it. The deepspeed install then couldn't find libaio, so I installed that. Now I get the same error as JerryAllison.

For me, pip -V returns "pip 23.1.1"

mrwyattii commented 1 year ago

@JerryAllison It looks like you are trying to install on Windows? It can be a little tricky to get DeepSpeed installed on Windows (but it is possible). We highly recommend using WSL and installing DeepSpeed in that environment.

However, if you don't want to use WSL: The error you are seeing now is related to libaio not being available for Windows. You must disable pre-compilation of these features with set DS_BUILD_AIO=0.

mrwyattii commented 1 year ago

@AngelTs and @KbKev78 can you please provide some additional information about your environments? Are you also trying to install on Windows? Thanks

KbKev78 commented 1 year ago

Correct. In my case I am installing on Windows. Where does the "set DS_BUILD_AIO=0" option go? Is it an environment variable?

KbKev78 commented 1 year ago

I found a resource elsewhere with this syntax: $env:DS_BUILD_OPS = 0, which appeared to do the trick.

This has got me to the next issue: sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.0

So I'm off to try and arrange that.
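
For reference, the bound in that message (>= 1.5 and < 2.0) can be checked against an installed version string with a few lines of Python. This is only a sketch that mirrors the numbers in the error message; it is not DeepSpeed's actual compatibility check, and the function names are hypothetical.

```python
import re


def major_minor(version):
    """Extract (major, minor) from a version string such as '2.0.0+cu118'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def sparse_attn_compatible(torch_version):
    """True if the version falls in the range quoted by the error: >= 1.5 and < 2.0."""
    return (1, 5) <= major_minor(torch_version) < (2, 0)


if __name__ == "__main__":
    for version in ("1.13.1", "2.0.0+cu118"):
        print(version, sparse_attn_compatible(version))
```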

AngelTs commented 1 year ago

Windows 10 Pro [22H2] [19045.2846], GTX 1060 6GB, pip 23.1, python 3.10.150.0

alnrott commented 1 year ago

I have the same problem. This seems to be a bug with the latest versions of the dependencies?

mrwyattii commented 1 year ago

@KbKev78 If you don't need sparse attention for your install, you can also disable that with DS_BUILD_SPARSE_ATTN=0 (similar to what you did with DS_BUILD_AIO=0). You can find the full list of environment variables that change installation behavior here: https://www.deepspeed.ai/tutorials/advanced-install/
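
These flags are ordinary environment variables, so they must be set in the process that runs the install. Earlier comments in this thread use `set DS_BUILD_AIO=0` (cmd) and the `$env:NAME = 0` form (PowerShell). The Python sketch below does the same thing in a shell-independent way; the names `DISABLED_OPS`, `build_env`, and `install_from_source` are hypothetical, and only the two DS_BUILD_* variables and `--no-build-isolation` come from this thread.

```python
import os
import subprocess
import sys

# Flags mentioned in this thread; the advanced-install page lists the rest.
DISABLED_OPS = {"DS_BUILD_AIO": "0", "DS_BUILD_SPARSE_ATTN": "0"}


def build_env(overrides, base=None):
    """Return a copy of the environment with `overrides` applied."""
    env = dict(os.environ if base is None else base)
    env.update(overrides)
    return env


def install_from_source(source_dir="."):
    """Run pip in `source_dir` with the ops above disabled; return the exit code."""
    cmd = [sys.executable, "-m", "pip", "install", source_dir, "--no-build-isolation"]
    return subprocess.run(cmd, env=build_env(DISABLED_OPS), check=False).returncode
```

Passing the variables through `env=` limits them to the single pip process, so nothing is left behind in the shell session.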

mrwyattii commented 1 year ago

@alnrott and @AngelTs can you please try setting the following environment variables and try installing again?

DS_BUILD_AIO=0
DS_BUILD_SPARSE_ATTN=0

AngelTs commented 1 year ago

After installing CUDA 11.7.0 (May 2022) instead of the newest CUDA 12.1.1 (April 2023) and running "python setup.py bdist_wheel", I get these errors:

csrc/transformer/inference/csrc/pt_binding.cpp(536): error C2398: Element '1': conversion from 'size_t' to '_Ty' requires a narrowing conversion with [ _Ty=int64_t ]
csrc/transformer/inference/csrc/pt_binding.cpp(1809): note: see reference to function template instantiation 'std::vector<at::Tensor,std::allocator> ds_softmax_context(at::Tensor &,at::Tensor &,int,bool,bool,int,float,bool,bool,int,bool,unsigned int,unsigned int,at::Tensor &)' being compiled
csrc/transformer/inference/csrc/pt_binding.cpp(537): error C2398: Element '2': conversion from 'size_t' to '_Ty' requires a narrowing conversion with [ _Ty=int64_t ]
csrc/transformer/inference/csrc/pt_binding.cpp(545): error C2398: Element '1': conversion from 'size_t' to '_Ty' requires a narrowing conversion with [ _Ty=int64_t ]
csrc/transformer/inference/csrc/pt_binding.cpp(546): error C2398: Element '2': conversion from 'size_t' to '_Ty' requires a narrowing conversion with [ _Ty=int64_t ]
error: command 'C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\bin\HostX86\x64\cl.exe' failed with exit code 2

C:\DeepSpeed-master>

AngelTs commented 1 year ago

A not-so-good but working solution to the errors above is to use the already built deepspeed-0.8.3+6eca037c-cp310-cp310-win_amd64.whl. With it I succeeded in installing DeepSpeed on Windows 10 without WSL, Anaconda, Miniconda, or similar tools:

pip install deepspeed-0.8.3+6eca037c-cp310-cp310-win_amd64.whl

AngelTs commented 1 year ago

Here is a quick tutorial on how to compile on a clean Windows 10 install, without WSL, conda, Docker, or similar tools:

1. Copy DeepSpeed-master.zip to the root of the C drive.
2. In the file pt_binding.cpp, replace these four lines:

{hidden_dim * InferenceContext::Instance().GetMaxTokenLenght(),
k * InferenceContext::Instance().GetMaxTokenLenght(),
{hidden_dim * InferenceContext::Instance().GetMaxTokenLenght(),
k * InferenceContext::Instance().GetMaxTokenLenght(),

with:

{static_cast(hidden_dim * InferenceContext::Instance().GetMaxTokenLenght()),
static_cast(k * InferenceContext::Instance().GetMaxTokenLenght()),
{static_cast(hidden_dim * InferenceContext::Instance().GetMaxTokenLenght()),
static_cast(k * InferenceContext::Instance().GetMaxTokenLenght()),

3. Run build_win.bat.
4. Go to the dist directory and install the newly created .whl file. In my case the command is: pip install deepspeed-0.9.2+unknown-cp310-cp310-win_amd64.whl

GalaxyHe2023 commented 1 year ago

DeepSpeed does not support Windows; please use WSL. I got the same error, and it was very easy to fix by using WSL. https://docs.microsoft.com/en-us/windows/wsl/install-win10

CCodeInspect commented 9 months ago

Thanks, I have downloaded and installed WSL on Windows. I hope WSL works.

CCodeInspect commented 9 months ago

I have already installed WSL. How can I use WSL to install DeepSpeed? Thank you.

CCodeInspect commented 9 months ago

DS_BUILD_AIO=0 DS_BUILD_SPARSE_ATTN=0

Where should I set these two parameters?

CCodeInspect commented 9 months ago

Issue with installing DeepSpeed, "pip install deepspeed" resulted in the following error:

System info (please complete the following information):

  • OS: [Windows 11 22H2]
  • GPU count and types [GTX 1060 6GB]
  • Python 3.10.5 torch 2.0.0+cu118 torchaudio 2.0.1+cu118 torchvision 0.15.1+cu118

Please help me.

I was able to solve this with the following command: pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

maxbrunet commented 5 months ago

I think this might be fixable by adding torch as a build requirement to DeepSpeed by following PEP-518. Concretely, adding a pyproject.toml to this repo with:

[build-system]
requires = [
    "setuptools",
    "torch",
]
build-backend = "setuptools.build_meta"

See also https://setuptools.pypa.io/en/latest/userguide/dependency_management.html#build-system-requirement

I have not tested this yet.

oldgithubman commented 3 months ago

pip install . --no-build-isolation works for me