microsoft / DeepSpeed-MII

MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.

RuntimeError: This event loop is already running when running with fastapi #236


tulika612 commented 1 year ago

Hi team,

I am trying to integrate DeepSpeed-MII into a FastAPI service, but I am getting the following error:

{"detail":"This event loop is already running"}

Here's my code for reference:

from fastapi import FastAPI, HTTPException, Body
from pydantic import BaseModel
import mii
import logging
from typing import Union, List

logging.basicConfig(
    level=logging.INFO, format="%(asctime)s - %(name)s - %(levelname)s - %(message)s"
)

app = FastAPI()

class UserParams(BaseModel):
    prompt: str
    model: str
    tensor: int

@app.post("/deploy/")
async def deploy_model(input_data: UserParams = Body(...)):
    deployment_name = input_data.model + "_deployment"
    logging.info(f"Received request to deploy '{deployment_name}'")
    mii_configs = {
        "tensor_parallel": input_data.tensor,
        "enable_restful_api": True,
    }
    logging.info(f"Deploying '{deployment_name}'")
    mii.deploy(task="text-generation",
               model=input_data.model,
               model_path="/app/multi-gpu/cache",
               deployment_name=deployment_name,
               mii_config=mii_configs
               )
    try:
        logging.info(f"Creating generator for deployment '{deployment_name}'")
        generator = mii.mii_query_handle(deployment_name)
        logging.info(f"Deployment '{deployment_name}' successful")

        logging.info(f"Generating text for deployment '{deployment_name}'")
        logging.info(f"Input: {input_data.prompt}")
        response = generator.query({"query": input_data.prompt})
        logging.info(f"Text generated for deployment '{deployment_name}'")
        logging.info(f"Response: {response}")
        return response.response
    except Exception as e:
        logging.error(f"Deployment '{deployment_name}' failed")
        raise HTTPException(status_code=500, detail=str(e))

I would greatly appreciate any assistance or guidance you can provide to resolve this issue.

mrwyattii commented 1 year ago

@tulika612 It looks like there are some odd interactions between the FastAPI server and the gRPC server that MII creates. I would reconsider how you are using MII: MII itself creates a gRPC server that you can send queries to, so nesting multiple server processes (FastAPI + gRPC) doesn't make much sense. I think there are two potential solutions:

1) Utilize the RESTful API that MII provides, outside of FastAPI. This will still allow you to send and receive queries via curl, and it replaces FastAPI entirely (see the sketch at the end of this comment).

2) Utilize MII in non-persistent mode. Because FastAPI already provides a persistent server process, you can use MII's non-persistent deployment mode to avoid these odd interactions between FastAPI and gRPC. Here is a working example:

from fastapi import FastAPI, HTTPException, Body
from pydantic import BaseModel
import mii
import logging
from typing import Union, List

app = FastAPI()

class UserParams(BaseModel):
    prompt: str
    model: str
    tensor: int

@app.post("/deploy/")
async def deploy_model(input_data: UserParams = Body(...)):
    deployment_name = input_data.model + "_deployment"
    mii_configs = {
        "tensor_parallel": input_data.tensor,
    }
    mii.deploy(
        task="text-generation",
        model=input_data.model,
        deployment_name=deployment_name,
        mii_config=mii_configs,
        deployment_type=mii.DeploymentType.NON_PERSISTENT,
    )
    try:
        generator = mii.mii_query_handle(deployment_name)
        response = generator.query({"query": input_data.prompt})
        return response
    except Exception as e:
        logging.error(f"Deployment '{deployment_name}' failed")
        raise HTTPException(status_code=500, detail=str(e))

We start the server with uvicorn main:app, and then we can send a request and see the response:

❯ curl -X POST -H "Content-Type: application/json" -d '{"model": "gpt2", "tensor":"1", "prompt": "hello world"}' http://localhost:8000/deploy/
[{"generated_text":"hello world of sports, fitness and entertainment at the 2018 Winter Games at Sochi 2016 – and here's what you need to know.\n\n\"The Winter Olympics in Russia are important events of international history,\" said Kazana Nairna, director of"}]
tulika612 commented 1 year ago

@mrwyattii Thanks for your response