microsoft / DeepSpeed-MII

MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.

Speeding up loading in inference checkpoints #426

Open amritap-ef opened 8 months ago

amritap-ef commented 8 months ago

Hi,

I saw this pull request in the DeepSpeed library about snapshotting an engine so that large models can be loaded faster, but I couldn't find any documentation on it: https://github.com/microsoft/DeepSpeed/pull/4664

How can I save and load inference checkpoints for my own model faster with DeepSpeed-FastGen?

04/03/24: Updated issue description to be clearer

ZonePG commented 8 months ago

Hi, you can refer to these docs and code examples:

- For adding new unsupported models:

- For loading local Hugging Face checkpoints, you can specify the absolute directory path in `pipeline` (see the sketch below).
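
For example, a minimal sketch of pointing the non-persistent pipeline at a local checkpoint directory (the path below is just a placeholder for wherever your model was saved with `save_pretrained`):

```python
import mii

# Pass a local Hugging Face checkpoint directory instead of a hub model name.
# "/data/models/my-finetuned-model" is a placeholder path for illustration.
pipe = mii.pipeline("/data/models/my-finetuned-model")

response = pipe(["DeepSpeed is", "Seattle is"], max_new_tokens=128)
print(response)
```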

amritap-ef commented 8 months ago

Thanks for sending this through - apologies, I didn't explain this very well.

What I was actually asking is whether there is a way to reduce the loading time for your own fine-tuned models from Hugging Face checkpoints, as I'm finding that loading the default models seems to be much faster.

In particular, this PR https://github.com/microsoft/DeepSpeed/pull/4664 mentions adding the 'capability to snapshot an engine and resume from it'. I was wondering how I might save and load that engine to reduce the time it takes to load a non-persistent pipeline the first time.
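
For context, this is roughly the first-time load I'd like to speed up (the checkpoint path and timing are just for illustration):

```python
import time

import mii

start = time.time()
# Cold start: building the non-persistent pipeline from a fine-tuned local
# checkpoint (placeholder path) is the slow step I'd like to snapshot and resume from.
pipe = mii.pipeline("/data/models/my-finetuned-model")
print(f"Pipeline ready in {time.time() - start:.1f}s")
```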