Closed · nightsSeeker closed this 3 days ago
The error is probably related to the init_method arg you have passed. Why are you passing that in?
Ensure your machine has 8 GPUs, as that is a requirement for the 70B model in this repo (the checkpoint is sharded with model parallelism of 8). If not, you can use HF to load the 70B model instead.
Running on Windows is possible with the gloo backend. Please take a look at https://github.com/meta-llama/llama3/issues/127#issuecomment-2075800144 for how they did it.
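A minimal sketch of the kind of gloo setup the linked comment describes, assuming a single-process run for testing. The MASTER_ADDR/MASTER_PORT names are the standard torch.distributed environment variables; the port value here is arbitrary:

```python
# Hedged sketch: single-process torch.distributed init with the gloo backend
# (CPU-friendly, works on Windows). Assumes PyTorch is installed.
import os
import torch.distributed as dist

# Pin the rendezvous address to loopback so no hostname lookup is involved.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")  # arbitrary free port

dist.init_process_group(backend="gloo", rank=0, world_size=1)
print(dist.get_backend())
dist.destroy_process_group()
```

For a real multi-process run you would launch this via torchrun with the appropriate world size instead of hard-coding rank 0.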
I am trying to use the example repo to get an initial output from the 70B-Instruct Meta model; however, I am stuck on what seems to be a PyTorch issue. I isolated it down to that piece of code.
err: [W socket.cpp:697] [c10d] The client socket has failed to connect to [kubernetes.docker.internal]:18355 (system error: 10049 - The requested address is not valid in its context.).
I have CUDA 12.1 and the latest PyTorch installed. I am on Windows, hence the backend change to gloo. I have tried it on my other machines with the same issue. I disconnected the internet and it still persists. Eventually, I tried it on a friend's machine nearby, and he faced the same issue as well.
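The error above suggests the rendezvous is trying to use kubernetes.docker.internal (a hostname Docker Desktop adds to the Windows hosts file) rather than a locally bindable address; Winsock error 10049 means "address not valid in its context". A quick stdlib-only diagnostic, with can_bind as a hypothetical helper name, is to check whether a given host/port pair is actually bindable on the machine, then force MASTER_ADDR to an address that passes:

```python
# Hedged diagnostic sketch: check whether a TCP socket can bind to a given
# address. If the rendezvous hostname fails this check, overriding
# MASTER_ADDR to 127.0.0.1 (as in the linked workaround) may help.
import socket

def can_bind(host: str, port: int) -> bool:
    """Return True if a TCP socket can bind to (host, port) locally."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.bind((host, port))  # port 0 lets the OS pick any free port
        return True
    except OSError:
        return False

print(can_bind("127.0.0.1", 0))  # loopback should be bindable
```

On an affected machine, can_bind("kubernetes.docker.internal", 0) would be the interesting case to compare against loopback.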