ccozad closed this issue 4 months ago
Maybe supporting configurable backends for torch.distributed is an option? https://pytorch.org/docs/stable/distributed.html
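For illustration, a minimal sketch of what a configurable backend might look like (the `TORCH_DIST_BACKEND` env-var name is an assumption for this sketch, not an existing PyTorch or repo convention):

```python
import os
import torch.distributed as dist

# Hypothetical sketch: choose the backend from an env var instead of
# hard-coding nccl. gloo runs on CPU and Windows; nccl needs NVIDIA
# GPUs on Linux. Assumes RANK, WORLD_SIZE, MASTER_ADDR, and MASTER_PORT
# are already set by a launcher such as torchrun.
backend = os.environ.get("TORCH_DIST_BACKEND", "nccl")
dist.init_process_group(backend=backend)
```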
Hi! The example scripts in this repo are for running inference on single-GPU (for 8B) and multi-GPU (for 70B) setups using CUDA; Windows is not currently supported.
You might want to check out these examples for running Llama locally, without torch.distributed, via Hugging Face or Ollama: https://github.com/meta-llama/llama-recipes/tree/main/recipes/quickstart/Running_Llama2_Anywhere
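For reference, a minimal sketch of local inference through Hugging Face transformers (the model ID and prompt are illustrative assumptions; the gated checkpoint requires accepting the license on the Hub first):

```python
from transformers import pipeline

# Illustrative example: run a Llama 3 instruct checkpoint locally
# without torch.distributed. device_map="auto" places the model on
# a GPU if one is available, otherwise on the CPU.
pipe = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    device_map="auto",
)

result = pipe("Why does nccl not work on Windows?", max_new_tokens=100)
print(result[0]["generated_text"])
```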
@subramen Thank you for the confirmation.
I set up a Linux machine on AWS and got things to run. I put together a guide here: https://github.com/ccozad/ml-reference-designs/blob/master/llm/llama-3/hello-world/README.md
Perhaps in the future, Microsoft, Nvidia, and other vendors will open up more options for putting gaming computers to good use.
@subramen See my comment on #127; I was able to get the model to build on Windows by initializing the gloo backend before calling Llama.build().
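For anyone landing here, a minimal sketch of what that workaround might look like (the rendezvous values and checkpoint paths are placeholders, not taken from #127):

```python
import os
import torch.distributed as dist
from llama import Llama  # the Llama class from this repo

# Llama.build() initializes torch.distributed with nccl if no process
# group exists yet; nccl is unavailable on Windows, so initialize gloo
# first. These single-process rendezvous values are placeholders.
os.environ["MASTER_ADDR"] = "127.0.0.1"
os.environ["MASTER_PORT"] = "29500"
os.environ["RANK"] = "0"
os.environ["WORLD_SIZE"] = "1"

if not dist.is_initialized():
    dist.init_process_group(backend="gloo")

generator = Llama.build(
    ckpt_dir="Meta-Llama-3-8B-Instruct/",  # placeholder path
    tokenizer_path="Meta-Llama-3-8B-Instruct/tokenizer.model",
    max_seq_len=512,
    max_batch_size=4,
)
```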
I'm running from a local Jupyter Notebook on Windows. I'm attempting to port the chat example and I get errors about initializing torch.distributed (RANK not defined, MASTER_ADDR not defined, etc.). I tried following the manual nccl steps outlined here: https://stackoverflow.com/questions/56805951/valueerror-error-initializing-torch-distributed-using-env-rendezvous-enviro, but I just get a loop of failed connection attempts. It looks like the start of Llama.build() is where things are erroring out.
I'll be going through the nccl debug process and will eventually switch to Linux if needed, but first, my questions.
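As a first debugging step, a quick sanity check of what the local PyTorch build actually supports (a sketch; Windows wheels of PyTorch ship without nccl):

```python
import torch
import torch.distributed as dist

# If nccl reports False here, no amount of rendezvous configuration
# will bring up an nccl process group; fall back to gloo or to Linux.
print("CUDA available:", torch.cuda.is_available())
print("nccl available:", dist.is_nccl_available())
print("gloo available:", dist.is_gloo_available())
```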