microsoft / DeepSpeed-MII

MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
Apache License 2.0

How to deploy a RESTful API deepspeed MII on one node? #164

Open shaoxuefeng opened 1 year ago

shaoxuefeng commented 1 year ago

Following the README doc, I would like to deploy a RESTful API on one node,
but I got a "ValueError: No slot '1' specified on host 'localhost'" error. The deploy Python code:

import mii
from mii import DeploymentType

if __name__ == "__main__":
    HOST_FILE_PATH = "./hostfile"
    mii_configs = {
        "tensor_parallel": 8,
        "dtype": "fp16",
        "enable_restful_api": True,
        "restful_api_port": 8080,
        "skip_model_check": True,
        "enable_load_balancing": False,
        "replica_num": 1,
        "hostfile": HOST_FILE_PATH,
    }

    mii.deploy(task="text-generation",
               model="/workspace/workfile/Models/gptj-350m",
               deployment_name="codegen-350m",
               mii_config=mii_configs,
               deployment_type=DeploymentType.LOCAL)

And the hostfile:

localhost slots=8

According to the DeepSpeed issue, it seems we can't start with a hostfile on a single node. I even updated the deepspeed package to the latest master version, but it still doesn't work.

deepspeed          0.8.3+unknown
deepspeed-mii      0.0.5+unknown

So, how can I start a RESTful API DeepSpeed-MII deployment on one node? Thank you!

Wohoholo commented 1 year ago

I have started with a hostfile on a single node (my machine has 2 GPUs, but I can only deploy on one GPU). Configs: tensor_parallel: 1, deploy_rank: 0; the other params are the same as yours. My hostfile's content:

127.0.0.1 slots=2

And, by the way, you need to set up passwordless SSH login from your node to itself. What I still want to know is how to deploy on one node with multiple GPUs.
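Pulled together, the working single-GPU setup described above looks roughly like this (a sketch mirroring the values from this thread, not a verified config; pass the dict as mii_config to the same mii.deploy call as in the original post):

```python
# Sketch of the single-GPU config reported to work in this thread.
# All values are taken from the comments here, not from MII docs.
mii_configs = {
    "tensor_parallel": 1,      # must equal len(deploy_rank)
    "deploy_rank": 0,          # pin the deployment to GPU 0
    "dtype": "fp16",
    "enable_restful_api": True,
    "restful_api_port": 8080,
    "skip_model_check": True,
    "replica_num": 1,
    "hostfile": "./hostfile",  # file content: 127.0.0.1 slots=2
}
```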

Wohoholo commented 1 year ago

I found some details in the source. In config.py:

@root_validator
def auto_enable_load_balancing(cls, values):
    if values["enable_restful_api"] and not values["enable_load_balancing"]:
        logger.warn("Restful API is enabled, enabling Load Balancing")
        values["enable_load_balancing"] = True
    return values
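The effect of that validator can be sketched standalone (plain Python mirroring the logic above, without the pydantic machinery):

```python
def auto_enable_load_balancing(values):
    # Mirrors the validator in MII's config.py: enabling the RESTful API
    # silently turns load balancing on as well.
    if values["enable_restful_api"] and not values["enable_load_balancing"]:
        values["enable_load_balancing"] = True
    return values

values = {"enable_restful_api": True, "enable_load_balancing": False}
values = auto_enable_load_balancing(values)
print(values["enable_load_balancing"])  # True
```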

This will force your "enable_load_balancing" to True. Then in server.py:

  1. if mii_configs.enable_load_balancing:
  2.     # Start replica instances
  3.     for i, repl_config in enumerate(lb_config.replica_configs):
  4.         hostfile = tempfile.NamedTemporaryFile(delete=False)
  5.         hostfile.write(
  6.             f'{repl_config.hostname} slots={mii_configs.replica_num}\n'.encode())
  7.         processes.append(
  8.             self._launch_deepspeed(
  9.                 deployment_name,
  10.                model_name,
  11.                model_path,
  12.                ds_optimize,
  13.                ds_zero,
  14.                ds_config,
  15.                mii_configs,
  16.                hostfile.name,
  17.                repl_config.hostname,
  18.                repl_config.tensor_parallel_ports[0],
  19.                mii_configs.torch_dist_port + (100 * i) +
  20.                    repl_config.gpu_indices[0],
  21.                repl_config.gpu_indices))

It will write a temp hostfile using your "replica_num" rather than your own hostfile. You can comment out lines 5 and 6, and rewrite line 17 to use mii_configs.hostfile. Also note that "tensor_parallel" must equal the length of the "deploy_rank" parameter in mii_configs. Hope that will be helpful.
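To make the problem concrete, here is a minimal standalone sketch of that temp-hostfile write, with hypothetical stand-ins for repl_config and mii_configs. With replica_num = 1, the generated hostfile says "slots=1" regardless of the user's "localhost slots=8" hostfile, which plausibly explains the asker's "No slot '1'" error:

```python
import tempfile

replica_num = 1          # hypothetical stand-in for mii_configs.replica_num
hostname = "localhost"   # hypothetical stand-in for repl_config.hostname

# Same write as lines 4-6 of the server.py snippet above: the slots value
# comes from replica_num, not from the hostfile the user supplied.
hostfile = tempfile.NamedTemporaryFile(delete=False)
hostfile.write(f"{hostname} slots={replica_num}\n".encode())
hostfile.close()

with open(hostfile.name) as f:
    print(f.read())  # localhost slots=1
```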