Closed josephrocca closed 1 year ago
Hi @josephrocca, thanks for using DeepSpeed. Could you try pre-compiling and let me know the outcome? To do so:
pip uninstall -y deepspeed
git clone https://github.com/microsoft/DeepSpeed.git && cd DeepSpeed
DS_BUILD_OPS=1 pip install .
or DS_BUILD_UTILS=1 pip install .
(read more here: https://www.deepspeed.ai/tutorials/advanced-install/#pre-install-deepspeed-ops)
Hi @mrwyattii, I tried both the DS_BUILD_OPS option and the DS_BUILD_UTILS option on a fresh Lambda Cloud machine, and both gave errors. Please see here for the full error logs of both attempts: https://gist.github.com/josephrocca/8417c4665cbfef89ba85e439c17500da
Solution?
I see this error message in the gist log. Can you confirm that pybind11 is installed?
This looks to have been pybind11-related. If you are still having issues with this, please re-open.
sudo apt install python3-pybind11
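To confirm whether pybind11 is actually visible to the Python interpreter that runs DeepSpeed, a quick check along these lines may help. This is only a sketch: the helper name `is_importable` is made up for illustration, and checking `ninja` as well is an assumption based on the original report mentioning that ninja had to be installed. A module being importable does not guarantee the extension build will succeed.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be found by the current interpreter."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Module names checked here are assumptions taken from this thread.
    for name in ("pybind11", "ninja"):
        status = "found" if is_importable(name) else "MISSING"
        print(f"{name}: {status}")
```

Run it with the same virtual environment or interpreter used to launch the training script, since a system-wide install may not be visible from inside a virtual environment.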
On Windows, I downloaded the matching package from https://pypi.org/project/deepspeed/#files, extracted it, and placed it directly in my virtual environment, and that worked.
Describe the bug
As shown in this notebook, I run these commands:
This is exactly following the instructions in the readme of DeepSpeedExamples/tree/master/model_compression/gpt2 except that I had to install ninja because the machine didn't have it yet.
After some progress, the run_zero_quant.sh script throws RuntimeError: Error building extension 'utils' (please see the notebook for full logs).
To Reproduce
Steps to reproduce the behavior:
Expected behavior
There is a related issue here:
The apparent solution there was to ensure that the deepspeed wheel was built with the same CUDA version as the machine has installed. But ds_report shows that the versions match. So I guess the "expected behavior" here is that it shouldn't throw the error that I'm seeing.
ds_report output
As seen in the above-linked notebook:
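For anyone who wants to double-check the CUDA version match discussed above independently of ds_report, the comparison can be sketched roughly as follows. The helper names are hypothetical, and the script assumes `nvcc` is on the PATH and that PyTorch is installed. It only compares major and minor versions and is not a replacement for ds_report.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_nvcc_release(nvcc_output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from nvcc text such as 'release 11.7, V11.7.99'."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def parse_major_minor(version: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from a version string such as '11.7'."""
    match = re.match(r"(\d+)\.(\d+)", version.strip())
    return (int(match.group(1)), int(match.group(2))) if match else None


if __name__ == "__main__":
    # CUDA toolkit version reported by nvcc, if nvcc is available.
    system_cuda = None
    nvcc_path = shutil.which("nvcc")
    if nvcc_path:
        result = subprocess.run([nvcc_path, "--version"], capture_output=True, text=True)
        system_cuda = parse_nvcc_release(result.stdout)

    # CUDA version PyTorch was built against (None for CPU-only builds).
    try:
        import torch
        torch_cuda = parse_major_minor(torch.version.cuda or "")
    except ImportError:
        torch_cuda = None

    print(f"nvcc CUDA:  {system_cuda}")
    print(f"torch CUDA: {torch_cuda}")
    if system_cuda and torch_cuda:
        print("match" if system_cuda == torch_cuda else "MISMATCH")
```

If both lines print the same pair, a version mismatch is unlikely to be the cause, which is consistent with what was reported here.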
System info (please complete the following information):