Closed: sashasubbbb closed this issue 1 year ago.
In my experience, DeepSpeed won't run on WSL, or even via Docker. Maybe someone can prove me wrong.
I got it working! It needs a ton of memory, though: the VM has to be assigned more than enough RAM to load the whole model into system RAM before it gets dumped to VRAM.
Also, you need to jump through a ton of hoops to get the CUDA toolkit working with conda-forge on WSL (as well as, of course, fixing WSL's issues with DNS passthrough and GPU passthrough...). But... it works.
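For context, the DNS fix usually amounts to disabling WSL's auto-generated resolv.conf and supplying a resolver yourself. A minimal sketch of the commonly used workaround (not quoted from the guide below) — inside the distro, put this in /etc/wsl.conf:
[network]
generateResolvConf = false
Then run wsl --shutdown from Windows, restart the distro, and write a resolver of your choice:
echo "nameserver 1.1.1.1" | sudo tee /etc/resolv.conf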
I've written a guide on how to do this for total noobs in the context of Pygmalion over on the Pygmalion subreddit HERE.
TL;DR: WSL2 has a completely broken implementation of DNS and CUDA, and that is the issue. Oh, and the Error -9 that pops up with DeepSpeed a lot? That's the VM running out of RAM. So you have to reconfigure the amount of RAM the VM gets: DeepSpeed loads the entire model into system RAM before offloading to VRAM, and the default 8-12 GB allocation is too small.
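Concretely, the RAM cap lives in %UserProfile%\.wslconfig on the Windows side. A minimal sketch (the 32GB figure is just an example — size it to your host and model):
[wsl2]
memory=32GB
swap=16GB
Run wsl --shutdown afterwards so the VM picks up the new limits.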
I'm using Docker over WSL2, and I was able to run GPT-NeoXT-Chat-Base-20B (38.4 GB on disk) using:
python server.py --auto-devices --gpu-memory 8 --cai-chat --load-in-8bit --listen --listen-port 8888 --model=GPT-NeoXT-Chat-Base-20B
I didn't need to update the .wslconfig file. Maybe I'm missing something.
Also, I was getting error code -11, not -9.
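Worth noting (a general Linux detail, not something stated elsewhere in this thread): a negative exit code reported by a Python launcher is usually a signal number, so -9 corresponds to SIGKILL (what the kernel's OOM killer sends) and -11 to SIGSEGV. One way to check whether the WSL kernel OOM-killed the process:
sudo dmesg | grep -iE "out of memory|killed process"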
This issue has been closed due to inactivity for 30 days. If you believe it is still relevant, please leave a comment below.
I've got WSL2 Ubuntu running on Windows 11, configured to use 28 GB of RAM (verified with free -h --giga). I tried both the unsharded model and a version sharded into 1 GB chunks.
When I try to load the pyg-6b model with:
deepspeed --num_gpus=1 server.py --deepspeed --cai-chat
I get:
I've managed to load the pyg-350m model just fine with DeepSpeed. Is DeepSpeed working incorrectly on WSL? Do you have any clue?