microsoft / DeepSpeed-MII

MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
Apache License 2.0

LoRA Support #527

Open bagelbig opened 2 months ago

bagelbig commented 2 months ago

I am attempting to use DeepSpeed-MII for inference. I am presently using the pipeline approach, which does not seem to support LoRA.

Is there a way I can use DeepSpeed-FastGen with LoRA?

I am hesitant to return to 'DeepSpeed-Inference', since the top of the documentation clearly states: https://www.deepspeed.ai/tutorials/inference-tutorial/

DeepSpeed-Inference v2 is here and it’s called DeepSpeed-FastGen! For the best performance, latest features, and newest model support ...

This makes me concerned that 'DeepSpeed-Inference' will be phased out and no longer supported.

Please advise how I can move forward with using a model and a LoRA at the same time (without pre-merging them).

Thank you.