pinokiofactory / cogstudio

254 stars 15 forks

Manual install: Can't install deepspeed without torch installed first (Windows) #3

Open LuckyNES opened 1 month ago

LuckyNES commented 1 month ago

I'm using Windows 11. If I try to install from requirements.txt, deepspeed will not install because it says torch needs to be installed first, so maybe the instructions are out of order. Here is what happens when I try to install deepspeed on its own:

```
(.venv) PS D:\GitRepos\CogVideo> pip install deepspeed
Collecting deepspeed
  Using cached deepspeed-0.15.1.tar.gz (1.4 MB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [9 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "C:\Users\myusername\AppData\Local\Temp\pip-install-ryve96yo\deepspeed_08f2b14bcce04e279c8dd9b5572ff4a1\setup.py", line 155, in <module>
          assert torch_available, "Unable to pre-compile ops without torch installed. Please install torch before attempting to pre-compile ops."
      AssertionError: Unable to pre-compile ops without torch installed. Please install torch before attempting to pre-compile ops.
      [WARNING] Unable to import torch, pre-compiling ops will be disabled. Please visit https://pytorch.org/ to see how to properly install torch on your system.
      [WARNING] unable to import torch, please install it if you want to pre-compile any deepspeed ops. DS_BUILD_OPS=1
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
```
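The assertion in that log comes from deepspeed's setup.py importing torch while pip generates metadata, so torch has to finish installing in a separate, earlier pip invocation. A minimal pre-flight check (a sketch using only the standard library; the helper name is ours, not part of any repo):

```python
import importlib.util


def module_installed(name: str) -> bool:
    """True if a module can be found in the current environment without importing it."""
    return importlib.util.find_spec(name) is not None


# deepspeed's setup.py asserts torch is importable at metadata-build time,
# so `pip install torch` must complete as its own step before running
# `pip install deepspeed` or `pip install -r requirements.txt`.
if not module_installed("torch"):
    print("torch is missing: install it before attempting deepspeed.")
```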

LuckyNES commented 1 month ago

If I try to install deepspeed after installing torch, I get new errors:

```
(.venv) PS D:\GitRepos\CogVideo> pip install deepspeed
Collecting deepspeed
  Using cached deepspeed-0.15.1.tar.gz (1.4 MB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [44 lines of output]
      test.c
      LINK : fatal error LNK1181: cannot open input file 'aio.lib'
      test.c
      LINK : fatal error LNK1181: cannot open input file 'cufile.lib'
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "C:\Users\myusername\AppData\Local\Temp\pip-install-u77g4rw2\deepspeed_25b0c4669b0f474291d506154c28470a\setup.py", line 198, in <module>
          ext_modules.append(builder.builder())
                             ^^^^^^^^^^^^^^^^^
        File "C:\Users\myusername\AppData\Local\Temp\pip-install-u77g4rw2\deepspeed_25b0c4669b0f474291d506154c28470a\op_builder\builder.py", line 720, in builder
          extra_link_args=self.strip_empty_entries(self.extra_ldflags()))
                                                   ^^^^^^^^^^^^^^^^^^^^
        File "C:\Users\myusername\AppData\Local\Temp\pip-install-u77g4rw2\deepspeed_25b0c4669b0f474291d506154c28470a\op_builder\inference_cutlass_builder.py", line 74, in extra_ldflags
          import dskernels
      ModuleNotFoundError: No module named 'dskernels'
      DS_BUILD_OPS=1
      [WARNING] Skip pre-compile of incompatible async_io; One can disable async_io with DS_BUILD_AIO=0
      Installed CUDA version 12.2 does not match the version torch was compiled with 12.1 but since the APIs are compatible, accepting this combination
      Installed CUDA version 12.2 does not match the version torch was compiled with 12.1 but since the APIs are compatible, accepting this combination
      [WARNING] cpu_adam requires the 'lscpu' command, but it does not exist!
      [WARNING] cpu_adam attempted to query 'lscpu' after failing to use py-cpuinfo to detect the CPU architecture. 'lscpu' does not appear to exist on your system, will fall back to use -march=native and non-vectorized execution.
      [WARNING] cpu_adam requires the 'lscpu' command, but it does not exist!
      [WARNING] cpu_adam attempted to query 'lscpu' after failing to use py-cpuinfo to detect the CPU architecture. 'lscpu' does not appear to exist on your system, will fall back to use -march=native and non-vectorized execution.
      Installed CUDA version 12.2 does not match the version torch was compiled with 12.1 but since the APIs are compatible, accepting this combination
      Installed CUDA version 12.2 does not match the version torch was compiled with 12.1 but since the APIs are compatible, accepting this combination
      [WARNING] cpu_adagrad requires the 'lscpu' command, but it does not exist!
      [WARNING] cpu_adagrad attempted to query 'lscpu' after failing to use py-cpuinfo to detect the CPU architecture. 'lscpu' does not appear to exist on your system, will fall back to use -march=native and non-vectorized execution.
      [WARNING] cpu_adagrad requires the 'lscpu' command, but it does not exist!
      [WARNING] cpu_adagrad attempted to query 'lscpu' after failing to use py-cpuinfo to detect the CPU architecture. 'lscpu' does not appear to exist on your system, will fall back to use -march=native and non-vectorized execution.
      Installed CUDA version 12.2 does not match the version torch was compiled with 12.1 but since the APIs are compatible, accepting this combination
      Installed CUDA version 12.2 does not match the version torch was compiled with 12.1 but since the APIs are compatible, accepting this combination
      [WARNING] cpu_lion requires the 'lscpu' command, but it does not exist!
      [WARNING] cpu_lion attempted to query 'lscpu' after failing to use py-cpuinfo to detect the CPU architecture. 'lscpu' does not appear to exist on your system, will fall back to use -march=native and non-vectorized execution.
      [WARNING] cpu_lion requires the 'lscpu' command, but it does not exist!
      [WARNING] cpu_lion attempted to query 'lscpu' after failing to use py-cpuinfo to detect the CPU architecture. 'lscpu' does not appear to exist on your system, will fall back to use -march=native and non-vectorized execution.
      Installed CUDA version 12.2 does not match the version torch was compiled with 12.1 but since the APIs are compatible, accepting this combination
      [WARNING] Skip pre-compile of incompatible evoformer_attn; One can disable evoformer_attn with DS_BUILD_EVOFORMER_ATTN=0
      [WARNING] Skip pre-compile of incompatible fp_quantizer; One can disable fp_quantizer with DS_BUILD_FP_QUANTIZER=0
      Installed CUDA version 12.2 does not match the version torch was compiled with 12.1 but since the APIs are compatible, accepting this combination
      Installed CUDA version 12.2 does not match the version torch was compiled with 12.1 but since the APIs are compatible, accepting this combination
      [WARNING] Skip pre-compile of incompatible gds; One can disable gds with DS_BUILD_GDS=0
      Installed CUDA version 12.2 does not match the version torch was compiled with 12.1 but since the APIs are compatible, accepting this combination
      Installed CUDA version 12.2 does not match the version torch was compiled with 12.1 but since the APIs are compatible, accepting this combination
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
```
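The log's own hints suggest a possible workaround: deepspeed's build reads DS_BUILD_* environment variables (the failing run above had DS_BUILD_OPS=1), so setting DS_BUILD_OPS=0 should skip pre-compiling the ops that fail to link on Windows. A sketch of launching pip with that variable set (untested on this setup; the function names are ours):

```python
import os
import subprocess
import sys


def env_with(overrides: dict) -> dict:
    """Copy of the current environment with the given variables applied."""
    return {**os.environ, **overrides}


def install_deepspeed_without_ops() -> None:
    # DS_BUILD_OPS=0 asks deepspeed's setup.py not to pre-compile any ops,
    # deferring them to JIT compilation at first use instead.
    subprocess.check_call(
        [sys.executable, "-m", "pip", "install", "deepspeed"],
        env=env_with({"DS_BUILD_OPS": "0"}),
    )
```

Whether the JIT path then works on Windows is a separate question; this only avoids the build-time link errors shown above.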

LuckyNES commented 1 month ago

It seems deepspeed has no Windows support?

https://github.com/microsoft/DeepSpeed/issues/1769

LuckyNES commented 1 month ago

It seems this deepspeed comes from SwissArmyTransformer.

I see this in the output when installing requirements.txt:

```
Collecting deepspeed (from SwissArmyTransformer>=0.4.12->-r requirements.txt (line 8))
  Downloading deepspeed-0.15.1.tar.gz (1.4 MB)
```
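That chain can also be confirmed from the package metadata directly; a quick standard-library sketch for listing which dependencies an installed package declares (the helper name is ours):

```python
from importlib.metadata import PackageNotFoundError, requires


def declared_deps(package: str) -> list:
    """Dependency specifiers a package declares, or [] if none / not installed."""
    try:
        return requires(package) or []
    except PackageNotFoundError:
        return []


# With SwissArmyTransformer installed,
#   any(dep.startswith("deepspeed") for dep in declared_deps("SwissArmyTransformer"))
# would confirm the "Collecting deepspeed (from SwissArmyTransformer...)" line above.
```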

LuckyNES commented 1 month ago

I just found out that the order of entries in requirements.txt does not guarantee the install order. Might be a clue, might not.

LuckyNES commented 1 month ago

SwissArmyTransformer has depended on deepspeed since its initial commit. Do we need SwissArmyTransformer? If deepspeed is not Windows-compatible, this is a dead end.

LuckyNES commented 1 month ago

In the end, this program will work without SwissArmyTransformer on Windows. Maybe we just need a special requirements_windows.txt.
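One way to sketch that idea (the filename and the filtering rule are assumptions, not something the repo provides): derive requirements_windows.txt from requirements.txt by dropping the SwissArmyTransformer line, since that is the entry that pulls in deepspeed.

```python
from pathlib import Path


def write_windows_requirements(src: str = "requirements.txt",
                               dst: str = "requirements_windows.txt") -> list:
    """Copy src to dst, skipping any line that mentions SwissArmyTransformer."""
    kept = [line for line in Path(src).read_text().splitlines()
            if "SwissArmyTransformer" not in line]
    Path(dst).write_text("\n".join(kept) + "\n")
    return kept
```

`pip install -r requirements_windows.txt` would then avoid the deepspeed build entirely, assuming nothing else in the file depends on SwissArmyTransformer.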

GarbageHaus commented 2 weeks ago

I've been running into this issue too, mostly in the form you've described. However, even the main repo's requirements.txt doesn't work and hits the same problem. For this to work, we'd need to completely cut SwissArmyTransformer out somehow.