microsoft / DeepSpeed-MII

MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
Apache License 2.0

How to deploy a RESTful API deepspeed MII on one node? #164

Open shaoxuefeng opened 1 year ago

shaoxuefeng commented 1 year ago

Following the README doc, I would like to deploy a RESTful API on one node,
but I got a "ValueError: No slot '1' specified on host 'localhost'" error. The deploy Python code:

import mii
from mii import DeploymentType

if __name__ == "__main__":
    HOST_FILE_PATH = "./hostfile"
    mii_configs = {
        "tensor_parallel": 8,
        "dtype": "fp16",
        "enable_restful_api": True,
        "restful_api_port": 8080,
        "skip_model_check": True,
        "enable_load_balancing": False,
        "replica_num": 1,
        "hostfile": HOST_FILE_PATH,
    }

    mii.deploy(task="text-generation",
               model="/workspace/workfile/Models/gptj-350m",
               deployment_name="codegen-350m",
               mii_config=mii_configs,
               deployment_type=DeploymentType.LOCAL)

And the hostfile:

localhost slots=8

According to the DeepSpeed issue, it seems we can't start with a hostfile on a single node. I even updated the deepspeed package to the latest master version, but it still doesn't work.

deepspeed          0.8.3+unknown
deepspeed-mii      0.0.5+unknown

So, how can I start a RESTful API DeepSpeed-MII deployment on one node? Thank you!

Wohoholo commented 1 year ago

I have started with a hostfile on a single node (my machine has 2 GPUs, but I can only deploy on one GPU). Configs: tensor_parallel: 1, deploy_rank: 0; the other params are the same as yours. My hostfile's content:

127.0.0.1 slots=2

And, by the way, you need to set up passwordless SSH login from your node to itself. What I still want to know is how to deploy on one node with multiple GPUs.
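Pulled together, the working single-GPU setup described above looks roughly like this (a sketch mirroring the values from this thread, not a verified config; pass the dict as mii_config to the same mii.deploy call as in the original post):

```python
# Sketch of the single-GPU config reported to work in this thread.
# All values are taken from the comments here, not from MII docs.
mii_configs = {
    "tensor_parallel": 1,      # must equal len(deploy_rank)
    "deploy_rank": 0,          # pin the deployment to GPU 0
    "dtype": "fp16",
    "enable_restful_api": True,
    "restful_api_port": 8080,
    "skip_model_check": True,
    "replica_num": 1,
    "hostfile": "./hostfile",  # file content: 127.0.0.1 slots=2
}
```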

Wohoholo commented 1 year ago

I found some details in the source. In config.py:

@root_validator
def auto_enable_load_balancing(cls, values):
    if values["enable_restful_api"] and not values["enable_load_balancing"]:
        logger.warn("Restful API is enabled, enabling Load Balancing")
        values["enable_load_balancing"] = True
    return values
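The effect of that validator can be sketched standalone (plain Python mirroring the logic above, without the pydantic machinery):

```python
def auto_enable_load_balancing(values):
    # Mirrors the validator in MII's config.py: enabling the RESTful API
    # silently turns load balancing on as well.
    if values["enable_restful_api"] and not values["enable_load_balancing"]:
        values["enable_load_balancing"] = True
    return values

values = {"enable_restful_api": True, "enable_load_balancing": False}
values = auto_enable_load_balancing(values)
print(values["enable_load_balancing"])  # True
```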

This will force your "enable_load_balancing" to True. Then in server.py:

  1. if mii_configs.enable_load_balancing:
  2.     # Start replica instances
  3.     for i, repl_config in enumerate(lb_config.replica_configs):
  4.         hostfile = tempfile.NamedTemporaryFile(delete=False)
  5.         hostfile.write(
  6.             f'{repl_config.hostname} slots={mii_configs.replica_num}\n'.encode())
  7.         processes.append(
  8.             self._launch_deepspeed(
  9.                 deployment_name,
  10.                model_name,
  11.                model_path,
  12.                ds_optimize,
  13.                ds_zero,
  14.                ds_config,
  15.                mii_configs,
  16.                hostfile.name,
  17.                repl_config.hostname,
  18.                repl_config.tensor_parallel_ports[0],
  19.                mii_configs.torch_dist_port + (100 * i) +
  20.                    repl_config.gpu_indices[0],
  21.                repl_config.gpu_indices))

It will write a temp hostfile using your "replica_num" rather than your own hostfile. You can comment out lines 5 and 6, and rewrite line 17 to use mii_configs.hostfile. Also note that "tensor_parallel" must equal the length of the "deploy_rank" parameter in mii_configs. Hope that will be helpful.
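To make the problem concrete, here is a minimal standalone sketch of that temp-hostfile write, with hypothetical stand-ins for repl_config and mii_configs. With replica_num = 1, the generated hostfile says "slots=1" regardless of the user's "localhost slots=8" hostfile, which plausibly explains the asker's "No slot '1'" error:

```python
import tempfile

replica_num = 1          # hypothetical stand-in for mii_configs.replica_num
hostname = "localhost"   # hypothetical stand-in for repl_config.hostname

# Same write as lines 4-6 of the server.py snippet above: the slots value
# comes from replica_num, not from the hostfile the user supplied.
hostfile = tempfile.NamedTemporaryFile(delete=False)
hostfile.write(f"{hostname} slots={replica_num}\n".encode())
hostfile.close()

with open(hostfile.name) as f:
    print(f.read())  # localhost slots=1
```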