tencent-ailab / IP-Adapter

The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with an image prompt.
Apache License 2.0

How to improve model loading time #203

Open blistick opened 6 months ago

blistick commented 6 months ago

This is a great project! Thank you for your hard work!

My cloud-based GPU instance takes about 13.6 seconds to load the model using IPAdapterFull(). The data is located on a network drive within the same data center as the server, so access to the checkpoint and encoder should be fairly fast.

I'm trying to implement an on-demand endpoint, where the instance is instantiated only when needed and the models are then loaded before inference can take place. Loading the Stable Diffusion model (Realistic Vision) plus the IP-Adapter adds up to a significant delay for the end user.

Any tips would be greatly appreciated!

levi commented 6 months ago

Store the model in the container image, not on the network storage. Look into keeping the model loaded into memory and reusing it for subsequent requests, while the container is still warm. Unfortunately, cold start time will be a constant factor until serializing GPU memory is an available solution.

blistick commented 6 months ago

@levi Thank you very much for this advice. It's funny, because the server company I'm using (RunPod) recommended using their network volume for models. I'll try putting them in the container instead.

levi commented 6 months ago

Use their network file system for user generated models, but for base models like SD and IP Adapter, store them in the image because that will always be the fastest.
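Baking the base weights into the container image might look like the following Dockerfile sketch. The base image tag and all paths are illustrative, not taken from this project; the point is that `COPY` puts the checkpoints in an image layer, so cold starts read them from local disk instead of the network volume.

```dockerfile
# Hypothetical Dockerfile sketch; base image and paths are illustrative.
FROM runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04

# Bake the base checkpoints into an image layer (local-disk reads at runtime).
COPY models/realistic-vision/ /workspace/models/realistic-vision/
COPY models/ip-adapter/ /workspace/models/ip-adapter/

# Application code and entrypoint.
COPY app/ /workspace/app/
CMD ["python", "/workspace/app/server.py"]
```

User-generated models (e.g. fine-tunes uploaded at runtime) still belong on the network volume, since they can't be known at image-build time.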