Chenhzjs opened this issue 9 months ago
Hi @Chenhzjs, if you use `mii.serve` to start your server, you do not need to use the `deepspeed` launcher to take advantage of tensor parallelism. `mii.serve` will call the DeepSpeed launcher itself, so when you run your script with `deepspeed --num_gpus 2` you are attempting to launch two inference servers (and thus you see the "address already in use" error).
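For illustration, a minimal sketch of such a server script, assuming `mii.serve` accepts a `tensor_parallel` keyword and returns a client with a `generate()` method (check the current MII docs for the exact signature). The point is that this script is started with plain `python`, not with `deepspeed --num_gpus 2`:

```python
# Hypothetical server script (e.g. serve_mistral.py) -- run with:
#   python serve_mistral.py
# NOT with `deepspeed --num_gpus 2 serve_mistral.py`, because mii.serve
# spawns the DeepSpeed launcher internally for tensor parallelism.
import mii

if __name__ == "__main__":
    # tensor_parallel=2 shards the model across 2 GPUs; mii.serve handles
    # the multi-process launch itself.
    client = mii.serve(
        "mistralai/Mistral-7B-Instruct-v0.1",
        tensor_parallel=2,
    )
    response = client.generate("Hello, my name is", max_new_tokens=64)
    print(response)
```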
This section of code has the same issue:

```python
from mii import pipeline

pipe = pipeline("mistralai/Mistral-7B-Instruct-v0.1")
output = pipe(["Hello, my name is", "DeepSpeed is"], max_new_tokens=128)
print(output)
```
Error info:

```
RuntimeError: The server socket has failed to listen on any local network address.
The server socket has failed to bind to [::]:29500 (errno: 98 - Address already in use).
The server socket has failed to bind to 0.0.0.0:29500 (errno: 98 - Address already in use).
```
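The `errno: 98` here is the standard `EADDRINUSE` error: two processes (in this case, the two copies of the script started by the launcher) cannot both bind the same port. A minimal, framework-free sketch that reproduces the same failure with plain sockets:

```python
import errno
import socket

# First "server" grabs an OS-assigned port (analogous to the first
# inference server binding 29500).
s1 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s1.bind(("127.0.0.1", 0))
s1.listen()
port = s1.getsockname()[1]

# Second "server" tries to bind the same port, which is what happens when
# the launcher starts a second copy of the script.
s2 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    s2.bind(("127.0.0.1", port))
    result = "bound"
except OSError as e:
    # errno 98 on Linux: Address already in use.
    result = "EADDRINUSE" if e.errno == errno.EADDRINUSE else f"errno {e.errno}"
finally:
    s2.close()
    s1.close()

print(result)
```

Changing the port (e.g. to 29700) does not help, because both launched processes simply race for the new port instead.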
It uses `pipeline` only; there is no additional call to `mii.serve`.
When I try to use `deepspeed --num_gpus 2 xxx.py` to start the server, the error occurs. But if I use `python3 xxx.py` to start the server, it works well. I want to deploy llama-70b (maybe 140 GB) on 2 A100s (80 GB each), so I have to use `deepspeed` to start the server. Here is the INFO: At first, I thought it was just a process occupying this port, so I changed it to 29700. But as you can see, the problem has not been solved. How can I fix this? The code is just like the example (but uses llama-7b):