lucidrains / DALLE-pytorch

Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch
MIT License

Share my installation of DeepSpeed #252

Open PKULiuHui opened 3 years ago

PKULiuHui commented 3 years ago

I couldn't install DeepSpeed using the instructions provided by the author, so I spent a lot of time before I managed to install DeepSpeed and triton and train DALL-E with `--attn_type full,sparse`. I'm sharing my experience below so that anyone facing the same problem can save time.

1. Install triton 0.4.0:
   `pip install triton==0.4.0`
   By default, DeepSpeed requires triton 0.2.3, which I couldn't install on my server, so I installed triton 0.4.0 instead.

2. Download the source code of the [DeepSpeed branch that supports the latest triton](https://github.com/microsoft/DeepSpeed/tree/sparse-attn/support-latest-triton) and change into that directory.

3. Edit the requirements file:
   `vi requirements/requirements-sparse_attn.txt`
   Change its content to `triton==0.4.0`.

4. Install DeepSpeed with sparse attention:
   `DS_BUILD_SPARSE_ATTN=1 pip install .`

5. Check the installation with `ds_report`. It will show:
   ![screen_cut](https://user-images.githubusercontent.com/32560313/118494657-fcb83a80-b754-11eb-9f45-fdaa8ee1851c.PNG)

Note: you may need to install llvm-9 using `sudo apt-get -y install llvm-9-dev cmake`. My server runs Ubuntu 16.04 LTS, where installing llvm-9 is troublesome, so I just used llvm and updated gcc to 6.5.0. That worked as well.

Finally, I can train DALL-E with sparse attention. I hope this helps.
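
For reference, the steps above could be collected into one script along these lines. This is an untested sketch, not the procedure exactly as it was run: the function names `pin_triton` and `install_deepspeed_sparse` are made up for illustration, and it assumes the linked DeepSpeed tree is a git branch named `sparse-attn/support-latest-triton` and that `git`, `pip`, and a working CUDA toolchain are already available.

```shell
#!/bin/sh
# Sketch only: collects steps 1-5 from this post into one script.
set -eu

# Step 3 as a function: overwrite the requirements file with a single triton pin.
# (Hypothetical helper name, not part of DeepSpeed.)
pin_triton() {
    req_file=$1
    version=$2
    printf 'triton==%s\n' "$version" > "$req_file"
}

# Steps 1-5 in order.
install_deepspeed_sparse() {
    pip install triton==0.4.0
    # Assumption: the linked tree is a branch named sparse-attn/support-latest-triton.
    git clone --branch sparse-attn/support-latest-triton \
        https://github.com/microsoft/DeepSpeed.git
    cd DeepSpeed
    pin_triton requirements/requirements-sparse_attn.txt 0.4.0
    DS_BUILD_SPARSE_ATTN=1 pip install .
    ds_report
}

# Nothing is installed until the function is called explicitly:
# install_deepspeed_sparse
```

Keeping the install steps inside a function means the file can be sourced or read through without touching the environment. Uncomment the last line to actually run it.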

afiaka87 commented 3 years ago

Thank you so much for figuring this out! We have so many issues with DeepSpeed. It's worth mentioning for anyone else who finds these instructions useful: this will (unfortunately) break the DeepSpeed ZeRO configuration for using CPU-based Adam, etc. It shouldn't really be a problem on single-GPU setups, though.

robvanvolt commented 3 years ago

Unfortunately, the installation does not work with the latest Nvidia GPUs (30XX), and triton==1.0.0.dev20210329 has been permanently deleted (https://github.com/ptillet/triton/issues/99).

Edit: commit a598fba0 (HEAD, "[DOCS] Various improvements and typo fixes") seems to work (triton branch 1.0.0).

https://github.com/lucidrains/DALLE-pytorch/wiki/Deepspeed---Installation#for-the-latest-nvidia-gpus-3090-3080-3070-3060-rtx-try-the-following

ptillet commented 3 years ago

The Triton wheel has been updated. I think `pip install triton==0.4.1` should work now.