predibase / lorax

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
https://loraexchange.ai
Apache License 2.0
2.07k stars 138 forks

Skip the download process from `lorax-launcher` if model weights already on local disk #180

Open tgaddair opened 8 months ago

tgaddair commented 8 months ago

The download step that `lorax-launcher` runs at startup is essentially a no-op if the model weights are already present, but it can still add several seconds of latency to startup.

We can make a quick check from within `lorax-launcher` to see whether the model weights exist and, if so, skip this call entirely.
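A minimal sketch of the proposed check, written in Python for illustration only. The names `weights_present`, `maybe_download`, and `WEIGHT_SUFFIXES` are hypothetical and not part of LoRAX, and the assumption that local weights are a directory containing `.safetensors` or `.bin` files may not match what the launcher actually looks for.

```python
from pathlib import Path

# Assumption: weight files are identified by these suffixes.
WEIGHT_SUFFIXES = (".safetensors", ".bin")


def weights_present(model_dir):
    """Return True if model_dir is a directory holding at least one weight file."""
    path = Path(model_dir)
    if not path.is_dir():
        return False
    return any(p.is_file() and p.suffix in WEIGHT_SUFFIXES for p in path.iterdir())


def maybe_download(model_id, download):
    """Call download(model_id) only when no local weights are found.

    `download` is a hypothetical callable standing in for the download step.
    Returns True if the download was invoked, False if it was skipped.
    """
    if weights_present(model_id):
        return False
    download(model_id)
    return True
```

The check only lists one directory, so it should cost far less than starting a separate download process that then discovers there is nothing to do.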

micholeodon commented 2 months ago

If I may add a suggestion: please make the HUGGING_FACE_HUB_TOKEN variable optional, because even when I intend to serve a local model only, LoRAX demands this variable ... Btw, the model I use is from a gated HF repo.
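One way the token could be made optional, sketched in Python under stated assumptions. `resolve_hub_token` is a hypothetical helper, not a LoRAX function. The idea is that the token is never consulted when the weights are already local, and a missing token is reported as `None` rather than as an error.

```python
import os

TOKEN_ENV_VAR = "HUGGING_FACE_HUB_TOKEN"


def resolve_hub_token(weights_are_local, env=None):
    """Return the Hub token only when a download is actually needed.

    Returns None when the weights are local or when the variable is unset or empty.
    `env` defaults to os.environ; pass a dict to make the function easy to test.
    """
    env = os.environ if env is None else env
    if weights_are_local:
        # Serving from disk: the token is irrelevant, so do not require it.
        return None
    # An unset or empty variable means "no token", which is not an error here.
    return env.get(TOKEN_ENV_VAR) or None
```

With this shape, a gated repo would still need the token for the initial download, but later startups from local disk would not.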