microsoft / DeepSpeed-MII

MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.

RuntimeError: The server socket has failed to listen on any local network address. The server socket has failed to bind to [::]:29700 (errno: 98 - Address already in use). #383

Open Chenhzjs opened 9 months ago

Chenhzjs commented 9 months ago

When I try to start the server with deepspeed --num_gpus 2 xxx.py, this error occurs, but starting it with python3 xxx.py works fine. I want to deploy Llama-70B (roughly 140 GB of weights) on 2 A100s (80 GB each), so I need the DeepSpeed launcher to start the server. Here is the log:

[2024-01-20 10:15:26,416] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-20 10:15:26,676] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-20 10:15:26,846] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0, 1]}
[2024-01-20 10:15:26,846] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=2, node_rank=0
[2024-01-20 10:15:26,846] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1]})
[2024-01-20 10:15:26,846] [INFO] [launch.py:163:main] dist_world_size=2
[2024-01-20 10:15:26,846] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0,1
[2024-01-20 10:15:26,967] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
[2024-01-20 10:15:26,967] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
[2024-01-20 10:15:27,150] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
[2024-01-20 10:15:27,150] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
[2024-01-20 10:15:27,259] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0, 1]}
[2024-01-20 10:15:27,260] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=2, node_rank=0
[2024-01-20 10:15:27,260] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1]})
[2024-01-20 10:15:27,260] [INFO] [launch.py:163:main] dist_world_size=2
[2024-01-20 10:15:27,260] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0,1
[2024-01-20 10:15:28,970] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-20 10:15:29,041] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-20 10:15:29,083] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-20 10:15:29,117] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-20 10:15:29,509] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-01-20 10:15:29,576] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-01-20 10:15:29,576] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2024-01-20 10:15:29,804] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-01-20 10:15:29,805] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[W socket.cpp:436] [c10d] The server socket has failed to bind to [::]:29700 (errno: 98 - Address already in use).
[W socket.cpp:436] [c10d] The server socket has failed to bind to 0.0.0.0:29700 (errno: 98 - Address already in use).
[E socket.cpp:472] [c10d] The server socket has failed to listen on any local network address.
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/infer/miniconda3/lib/python3.11/site-packages/mii/launch/multi_gpu_server.py", line 105, in <module>
    main()
  File "/home/infer/miniconda3/lib/python3.11/site-packages/mii/launch/multi_gpu_server.py", line 98, in main
    inference_pipeline = async_pipeline(args.model_config)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/infer/miniconda3/lib/python3.11/site-packages/mii/api.py", line 167, in async_pipeline
    inference_engine = load_model(model_config)
                       ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/infer/miniconda3/lib/python3.11/site-packages/mii/modeling/models.py", line 14, in load_model
    init_distributed(model_config)
  File "/home/infer/miniconda3/lib/python3.11/site-packages/mii/utils.py", line 187, in init_distributed
    deepspeed.init_distributed(dist_backend="nccl", timeout=timedelta(seconds=1e9))
  File "/home/infer/miniconda3/lib/python3.11/site-packages/deepspeed/comm/comm.py", line 670, in init_distributed
    cdb = TorchBackend(dist_backend, timeout, init_method, rank, world_size)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/infer/miniconda3/lib/python3.11/site-packages/deepspeed/comm/torch.py", line 120, in __init__
    self.init_process_group(backend, timeout, init_method, rank, world_size)
  File "/home/infer/miniconda3/lib/python3.11/site-packages/deepspeed/comm/torch.py", line 146, in init_process_group
    torch.distributed.init_process_group(backend,
  File "/home/infer/miniconda3/lib/python3.11/site-packages/torch/distributed/c10d_logger.py", line 74, in wrapper
    func_return = func(*args, **kwargs)
                  ^^^^^^^^^^^^^^^^^^^^^
  File "/home/infer/miniconda3/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py", line 1141, in init_process_group
    store, rank, world_size = next(rendezvous_iterator)
                              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/infer/miniconda3/lib/python3.11/site-packages/torch/distributed/rendezvous.py", line 241, in _env_rendezvous_handler
    store = _create_c10d_store(master_addr, master_port, rank, world_size, timeout)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/infer/miniconda3/lib/python3.11/site-packages/torch/distributed/rendezvous.py", line 172, in _create_c10d_store
    return TCPStore(
           ^^^^^^^^^
RuntimeError: The server socket has failed to listen on any local network address. The server socket has failed to bind to [::]:29700 (errno: 98 - Address already in use). The server socket has failed to bind to 0.0.0.0:29700 (errno: 98 - Address already in use).
[2024-01-20 10:15:29,822] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-01-20 10:15:29,878] [INFO] [engine_v2.py:82:__init__] Building model...
[2024-01-20 10:15:29,944] [INFO] [engine_v2.py:82:__init__] Building model...
Using /home/infer/.cache/torch_extensions/py311_cu121 as PyTorch extensions root...
Using /home/infer/.cache/torch_extensions/py311_cu121 as PyTorch extensions root...
[2024-01-20 10:15:30,593] [INFO] [engine_v2.py:82:__init__] Building model...
Using /home/infer/.cache/torch_extensions/py311_cu121 as PyTorch extensions root...
[2024-01-20 10:15:30,848] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 1648350
[2024-01-20 10:15:30,848] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 1648351
[2024-01-20 10:15:31,004] [ERROR] [launch.py:321:sigkill_handler] ['/home/infer/miniconda3/bin/python', '-m', 'mii.launch.multi_gpu_server', '--deployment-name', 'llama-deployment', '--load-balancer-port', '50050', '--restful-gateway-port', '28080', '--restful-gateway-host', 'localhost', '--restful-gateway-procs', '32', '--server-port', '50051', '--zmq-port', '25555', '--model-config', 'eyJtb2RlbF9uYW1lX29yX3BhdGgiOiAiL21udC9MbGFtYS0yLTdiLWNoYXQtaGYiLCAidG9rZW5pemVyIjogIi9tbnQvTGxhbWEtMi03Yi1jaGF0LWhmIiwgInRhc2siOiAidGV4dC1nZW5lcmF0aW9uIiwgInRlbnNvcl9wYXJhbGxlbCI6IDIsICJpbmZlcmVuY2VfZW5naW5lX2NvbmZpZyI6IHsidGVuc29yX3BhcmFsbGVsIjogeyJ0cF9zaXplIjogMn0sICJzdGF0ZV9tYW5hZ2VyIjogeyJtYXhfdHJhY2tlZF9zZXF1ZW5jZXMiOiAyMDQ4LCAibWF4X3JhZ2dlZF9iYXRjaF9zaXplIjogNzY4LCAibWF4X3JhZ2dlZF9zZXF1ZW5jZV9jb3VudCI6IDUxMiwgIm1heF9jb250ZXh0IjogODE5MiwgIm1lbW9yeV9jb25maWciOiB7Im1vZGUiOiAicmVzZXJ2ZSIsICJzaXplIjogMTAwMDAwMDAwMH0sICJvZmZsb2FkIjogZmFsc2V9fSwgInRvcmNoX2Rpc3RfcG9ydCI6IDI5NzAwLCAiem1xX3BvcnRfbnVtYmVyIjogMjU1NTUsICJyZXBsaWNhX251bSI6IDEsICJyZXBsaWNhX2NvbmZpZ3MiOiBbeyJob3N0bmFtZSI6ICJsb2NhbGhvc3QiLCAidGVuc29yX3BhcmFsbGVsX3BvcnRzIjogWzUwMDUxLCA1MDA1Ml0sICJ0b3JjaF9kaXN0X3BvcnQiOiAyOTcwMCwgImdwdV9pbmRpY2VzIjogWzAsIDFdLCAiem1xX3BvcnQiOiAyNTU1NX1dLCAiZGV2aWNlX21hcCI6ICJhdXRvIiwgIm1heF9sZW5ndGgiOiBudWxsLCAiYWxsX3Jhbmtfb3V0cHV0IjogZmFsc2UsICJzeW5jX2RlYnVnIjogZmFsc2UsICJwcm9maWxlX21vZGVsX3RpbWUiOiBmYWxzZX0='] exits with return code = 1
[2024-01-20 10:15:31,968] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
[2024-01-20 10:15:31,968] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
[2024-01-20 10:15:32,151] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
[2024-01-20 10:15:32,151] [INFO] [server.py:65:_wait_until_server_is_live] waiting for server to start...
Traceback (most recent call last):
  File "/home/infer/deepspeed-fastgen/quest.py", line 26, in <module>
    client = mii.serve("/mnt/Llama-2-7b-chat-hf", deployment_name="llama-deployment", replica_num=1,   #replica_num=2 tensor_parallel=2
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/infer/miniconda3/lib/python3.11/site-packages/mii/api.py", line 124, in serve
    import_score_file(mii_config.deployment_name, DeploymentType.LOCAL).init()
  File "/tmp/mii_cache/llama-deployment/score.py", line 33, in init
    mii.backend.MIIServer(mii_config)
  File "/home/infer/miniconda3/lib/python3.11/site-packages/mii/backend/server.py", line 47, in __init__
    self._wait_until_server_is_live(processes,
  File "/home/infer/miniconda3/lib/python3.11/site-packages/mii/backend/server.py", line 62, in _wait_until_server_is_live
    raise RuntimeError(
RuntimeError: server crashed for some reason, unable to proceed
[2024-01-20 10:15:33,306] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 1647573
[2024-01-20 10:15:33,306] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 1647574
[2024-01-20 10:15:33,342] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 1648352
[2024-01-20 10:15:33,404] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 1648353
[2024-01-20 10:15:33,463] [INFO] [launch.py:324:sigkill_handler] Main process received SIGTERM, exiting
[2024-01-20 10:15:33,917] [ERROR] [launch.py:321:sigkill_handler] ['/home/infer/miniconda3/bin/python', '-u', 'quest.py', '--local_rank=1'] exits with return code = 1

At first I thought another process was simply occupying the port, so I changed it to 29700, but as you can see the problem persists. How can I fix this? The code is just like the example (but using llama-7b):

import mii
client = mii.serve("/mnt/Llama-2-7b-chat-hf", deployment_name="llama-deployment", tensor_parallel=2)
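
For reference, the port change mentioned above corresponds to the torch_dist_port field that shows up in the serialized model config in the log; a minimal sketch of how it can be set, assuming mii.serve forwards the keyword into the model config (the decoded config above does show "torch_dist_port": 29700):

import mii

# torch_dist_port appears in the serialized model config in the log,
# so passing it here should move the torch.distributed rendezvous port.
client = mii.serve("/mnt/Llama-2-7b-chat-hf",
                   deployment_name="llama-deployment",
                   tensor_parallel=2,
                   torch_dist_port=29700)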
mrwyattii commented 8 months ago

Hi @Chenhzjs, if you use mii.serve to start your server, you do not need the deepspeed launcher to take advantage of tensor parallelism. mii.serve calls the DeepSpeed launcher itself, so when you run your script with deepspeed --num_gpus 2 you are attempting to launch two inference servers on top of each other (and that is why you see the "address already in use" error).
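
In other words, a script like the one below should be started with plain python3 (a minimal sketch based on the snippet above; the generate call mirrors the MII README example):

import mii

# Start this with `python3 serve.py`, NOT `deepspeed --num_gpus 2 serve.py`;
# mii.serve spawns one worker per GPU itself when tensor_parallel=2.
client = mii.serve("/mnt/Llama-2-7b-chat-hf",
                   deployment_name="llama-deployment",
                   tensor_parallel=2)

# Query the deployment once it is live.
response = client.generate(["DeepSpeed is"], max_new_tokens=64)
print(response)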

maxy218 commented 7 months ago

This section of code has the same issue:

from mii import pipeline

pipe = pipeline("mistralai/Mistral-7B-Instruct-v0.1")
output = pipe(["Hello, my name is", "DeepSpeed is"], max_new_tokens=128)
print(output)

Error info: RuntimeError: The server socket has failed to listen on any local network address. The server socket has failed to bind to [::]:29500 (errno: 98 - Address already in use). The server socket has failed to bind to 0.0.0.0:29500 (errno: 98 - Address already in use)

It uses pipeline only; there is no additional call to mii.serve.
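
A possible workaround (untested sketch; it assumes the model config accepts torch_dist_port here the same way it appears in the serialized config earlier in this thread, and that port 29600 is free on the machine):

from mii import pipeline

# Move the torch.distributed rendezvous off the default port 29500,
# which something else on this machine apparently still holds.
pipe = pipeline("mistralai/Mistral-7B-Instruct-v0.1", torch_dist_port=29600)
output = pipe(["Hello, my name is", "DeepSpeed is"], max_new_tokens=128)
print(output)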