microsoft / DeepSpeed-MII

MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
Apache License 2.0

MII Text to Image Task Issue #132

Open gvijqb opened 1 year ago

gvijqb commented 1 year ago

Hi team,

I am having difficulty changing the batch size of a query on an MII deployment.

For example, a request with batch size 1 returns an image, but if I change the batch size to 4 (or any other number) I get the following exception: Exception calling application: output with shape [1, 77] doesn't match the broadcast shape [4, 77]

I get a similar exception when using DeepSpeed directly with a Stable Diffusion model and changing parameter values such as width, height, and batch size. In that case the exception is: The size of tensor a (4) must match the size of tensor b (2) at non-singleton dimension 0

Without being able to change these parameters per request, I cannot deploy this in a practical scenario, where I am bound to use a variety of values for each of these parameters.
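For context, the first error is what you would expect from an output buffer whose shape was frozen the first time the model ran. This is a toy Python sketch of that failure mode (all names here are hypothetical; this is not the MII or DeepSpeed API):

```python
import numpy as np

class FrozenShapeRunner:
    """Toy stand-in for a graph-captured model: the output buffer is
    allocated once, on the first call, and reused on every later call."""

    def __init__(self, seq_len=77):
        self.seq_len = seq_len
        self.out = None  # allocated lazily on the first call

    def __call__(self, batch):
        tokens = np.zeros((len(batch), self.seq_len))
        if self.out is None:
            # the buffer shape is frozen here, using the first batch size seen
            self.out = np.empty_like(tokens)
        if self.out.shape != tokens.shape:
            raise ValueError(
                f"output with shape {list(self.out.shape)} doesn't match "
                f"the broadcast shape {list(tokens.shape)}"
            )
        self.out[...] = tokens
        return self.out

runner = FrozenShapeRunner()
runner(["a photo of a cat"])   # batch size 1: OK, buffer is (1, 77)
# runner(["prompt"] * 4)       # batch size 4: raises the shape mismatch
```

Any later call with a batch size other than the first one trips the same kind of shape mismatch the deployment reports.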

Looking for a solution to this.

mrwyattii commented 1 year ago

@gvijqb I believe you are seeing this problem when changing batch size because Stable Diffusion uses CUDA graphs in the MII deployment (see https://github.com/microsoft/DeepSpeed-MII/blob/4040daecc4185e591d25cffc07b9cfd6de9f4fb7/mii/models/load_models.py#L75). DeepSpeed-Inference captures the graph only once per model, using the initially provided batch size. As a result, changing the batch size gives the error you are seeing. We are working on a solution that allows capturing multiple graph replays for different batch sizes.
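Schematically, "multiple graph replays for different batch sizes" could mean keeping one capture per distinct batch size and replaying whichever one matches the request. A toy Python sketch of that idea (not DeepSpeed's actual implementation; all names hypothetical):

```python
class PerBatchSizeGraphs:
    """Toy sketch: capture once for each distinct batch size, then
    replay that capture on later requests with the same batch size."""

    def __init__(self, model_fn):
        self.model_fn = model_fn
        self.captured = {}   # batch size -> "captured graph"
        self.captures = 0    # number of captures, for illustration only

    def run(self, batch):
        bs = len(batch)
        if bs not in self.captured:
            # first request with this batch size: "capture" a graph for it
            self.captures += 1
            self.captured[bs] = self.model_fn
        # later requests with the same batch size replay the existing capture
        return self.captured[bs](batch)

graphs = PerBatchSizeGraphs(lambda batch: [p.upper() for p in batch])
graphs.run(["a"])        # capture for batch size 1
graphs.run(["a", "b"])   # new capture for batch size 2
graphs.run(["c", "d"])   # batch size 2 again: replay, no new capture
```

The trade-off is extra memory and a one-time capture cost per batch size, in exchange for accepting requests of varying size.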

gvijqb commented 1 year ago

Understood, thanks @mrwyattii. Is there any estimate of when the solution will be live?

mrwyattii commented 1 year ago

@cmikeh2 started work on this some time ago: https://github.com/microsoft/DeepSpeed/pull/2458

I'll work on this and try to get it merged sometime next week!