meta-llama / llama3

The official Meta Llama 3 GitHub site

The client socket has failed to connect to [Maxim]:12355 (system error: 10049 - The requested address is not valid in its context.). #205

Closed nightsSeeker closed 3 days ago

nightsSeeker commented 3 weeks ago

I am trying to use the example repo to get an initial output from the 70B-Instruct Meta model. However, I am stuck on what seems to be a PyTorch issue. I isolated it down to this piece of code:

```python
if not torch.distributed.is_initialized():
    torch.distributed.init_process_group(
        backend='gloo',
        init_method='tcp://localhost:12355',
        rank=torch.cuda.device_count(),  # <-- this line hangs and causes the error below
        world_size=8,
    )
if not model_parallel_is_initialized():
    if model_parallel_size is None:
        model_parallel_size = int(os.environ.get("WORLD_SIZE", 8))
    initialize_model_parallel(model_parallel_size)
```

Error:

```
[W socket.cpp:697] [c10d] The client socket has failed to connect to [kubernetes.docker.internal]:18355 (system error: 10049 - The requested address is not valid in its context.).
```

I have CUDA 12.1 and the latest PyTorch installed. I am on Windows, hence the backend change to gloo. I have tried it on my other machines with the same issue. I disconnected the internet and it still persists. Eventually, I tried it on a friend's machine nearby and he also faced the same issue.
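For comparison, here is a minimal sketch of how the rendezvous is usually set up. It assumes torchrun-style `RANK`/`WORLD_SIZE`/`MASTER_ADDR`/`MASTER_PORT` environment variables (an assumption, not from this thread); the key point is that `rank` must be this process's own index in `0..world_size-1`, whereas `torch.cuda.device_count()` gives every process the same (and likely out-of-range) value, so the rendezvous can never complete:

```python
import os
import torch.distributed as dist

def init_distributed():
    # Each spawned process must pass its OWN rank (0..world_size-1).
    # torchrun sets RANK and WORLD_SIZE for each worker; defaults here
    # make the sketch runnable as a single process.
    rank = int(os.environ.get("RANK", 0))
    world_size = int(os.environ.get("WORLD_SIZE", 1))
    if not dist.is_initialized():
        dist.init_process_group(
            backend="gloo",        # gloo works on Windows; nccl does not
            init_method="env://",  # reads MASTER_ADDR / MASTER_PORT from env
            rank=rank,
            world_size=world_size,
        )
```

With `init_method="env://"` there is no hard-coded `tcp://localhost:12355` address for Windows name resolution to mangle.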

subramen commented 2 weeks ago

The error is probably related to the `init_method` arg you have passed... why are you passing that in?

Ensure your machine has 8 GPUs, as that is a requirement for running 70B with this repo. If not, you can use HF to load the 70B model.
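If going the HF route, something like the following sketch is the usual pattern (the model id and `device_map` choice are assumptions, not from this thread; the repo is gated and requires approved access on the Hub):

```python
def load_llama3_70b():
    """Sketch: load Llama 3 70B Instruct via Hugging Face Transformers.

    device_map="auto" lets accelerate shard the weights across whatever
    GPUs are available (offloading the remainder to CPU), so a strict
    8-GPU layout is not required the way it is with the reference repo.
    """
    # Imports kept local so the sketch can be pasted without pulling in
    # transformers at module import time.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Meta-Llama-3-70B-Instruct"  # assumed gated HF repo id
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",   # use the checkpoint's native dtype (bf16)
        device_map="auto",    # shard across available devices
    )
    return tokenizer, model
```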

Running on Windows is possible with gloo; please take a look at https://github.com/meta-llama/llama3/issues/127#issuecomment-2075800144 for how they did it.