predibase / lorax

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
https://loraexchange.ai
Apache License 2.0

When caching adapters, cache the adapter ID + the API token pair #479

Open noyoshi opened 5 months ago

noyoshi commented 5 months ago

Feature request

When we cache adapters, we should also cache the adapter ID + API token pair. Even if the adapter is already in GPU memory, we should verify that the caller actually has access to it by maintaining a cache of (adapter ID, API token) pairs; a sketch of one possible shape for this is below.
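
A minimal sketch of what this could look like, assuming an in-memory set keyed by adapter ID plus a hash of the token. The class and method names here are hypothetical, not existing LoRAX APIs:

```python
import hashlib
import threading


class AdapterAccessCache:
    """Hypothetical sketch: tracks which (adapter ID, API token) pairs
    have already been validated against the Hub. Tokens are stored as
    SHA-256 digests so raw HF tokens are not kept in memory."""

    def __init__(self) -> None:
        self._validated: set[tuple[str, str]] = set()
        self._lock = threading.Lock()

    @staticmethod
    def _digest(api_token: str | None) -> str:
        # Hash the token (empty string for "no token") before storing it.
        return hashlib.sha256((api_token or "").encode()).hexdigest()

    def is_authorized(self, adapter_id: str, api_token: str | None) -> bool:
        """True only if this exact adapter/token pair was validated before."""
        with self._lock:
            return (adapter_id, self._digest(api_token)) in self._validated

    def record(self, adapter_id: str, api_token: str | None) -> None:
        """Call after the Hub confirms this token can download the adapter."""
        with self._lock:
            self._validated.add((adapter_id, self._digest(api_token)))
```

With something like this in place, a request whose (adapter ID, token) pair is absent from the cache would have to re-validate against the Hub before the cached adapter is served, even when the weights are already loaded on the GPU.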

Motivation

Otherwise, we could get situations where one user calls prompt with a private HF Hub adapter and an HF API token; the request works and the adapter is cached. Another user could then call prompt with the same adapter without setting an HF API token in the request, and because the adapter is cached, the request would succeed even though that user never proved access to the private adapter.

Your contribution

I can try to implement it, but I am quite busy, so I am not sure when I can get to it.

safimuhammad commented 5 months ago

Hello, I would like to work on this.

magdyksaleh commented 5 months ago

Hey @safimuhammad - wanna chat on discord for next steps on this?

safimuhammad commented 5 months ago

@magdyksaleh Sure, here's my Discord username: msafi38

safimuhammad commented 5 months ago

Hey @magdyksaleh, reaching out to you on Discord, let's discuss the next steps on this.