predibase / lorax

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
https://loraexchange.ai
Apache License 2.0

Expose env to set base path for local adapters #346

Open · bjornjee opened 6 months ago

bjornjee commented 6 months ago

Feature request/question

Expose an environment variable or flag in lorax-server and lorax-launcher to set the base path for local adapters used during inference.
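A minimal sketch of how the requested lookup could work on the server side. The environment variable name LORAX_ADAPTER_BASE_PATH and the function resolve_adapter_path are hypothetical names chosen for illustration, not taken from the LoRAX codebase; the real change would live wherever the server resolves adapter_id.

```python
import os
from pathlib import Path

# Hypothetical environment variable name, used only for this sketch.
ADAPTER_BASE_PATH_ENV = "LORAX_ADAPTER_BASE_PATH"


def resolve_adapter_path(adapter_id: str) -> str:
    """Map a short adapter_id to a directory under a configurable base path.

    Absolute paths are returned unchanged, so existing requests keep working.
    """
    if os.path.isabs(adapter_id):
        return adapter_id
    base = os.environ.get(ADAPTER_BASE_PATH_ENV)
    if base is None:
        # No base path configured: leave the id alone (e.g. a Hugging Face Hub repo id).
        return adapter_id
    candidate = Path(base) / adapter_id
    # Only redirect when the adapter actually exists under the base path,
    # so ids such as "org/name" still fall through to remote sources.
    return str(candidate) if candidate.is_dir() else adapter_id
```

For example, with LORAX_ADAPTER_BASE_PATH=/home/adapters and a directory /home/adapters/adapter-1-lora on the instance, resolve_adapter_path("adapter-1-lora") returns "/home/adapters/adapter-1-lora". Checking that the directory exists before redirecting is a design choice that keeps Hub ids usable when the variable is set.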

As a workaround, we tried setting HUGGINGFACE_HUB_CACHE=/home/adapters, based on https://github.com/predibase/lorax/blob/main/server/lorax_server/utils/sources/local.py#L26.

However, since we only save adapter weights as .bin files, we get the following error during inference:

Request:

{
    "inputs": "[INST] <some_input> [/INST]",
    "parameters": {
        "max_new_tokens":64,
        "adapter_id": "adapter-1-lora"
    }
}

Error:

{
    "error": "Request failed during generation: Server error: No local weights found in AP-2-lora with extension .safetensors",
    "error_type": "generation"
}
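The error suggests that, on this code path, the local source only looks for .safetensors files, while the adapter directory only contains .bin weights. The sketch below shows one hypothetical way a lookup could accept both formats, preferring .safetensors; the extension order and the function name find_local_weight_files are assumptions for illustration, not LoRAX's actual loader.

```python
from pathlib import Path
from typing import List

# Assumed preference order for this sketch: safetensors first, then PyTorch .bin.
WEIGHT_EXTENSIONS = (".safetensors", ".bin")


def find_local_weight_files(adapter_dir: str) -> List[Path]:
    """Return the weight files in adapter_dir for the first extension that matches.

    Raises FileNotFoundError if no file with any of the extensions is present.
    """
    root = Path(adapter_dir)
    for ext in WEIGHT_EXTENSIONS:
        files = sorted(root.glob(f"*{ext}"))
        if files:
            return files
    raise FileNotFoundError(
        f"No local weights found in {adapter_dir} with extensions {WEIGHT_EXTENSIONS}"
    )
```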

However, if we use the absolute path, we get a successful response.

Request:

{
    "inputs": "[INST] <some_input> [/INST]",
    "parameters": {
        "max_new_tokens":64,
        "adapter_id": "/home/adapters/adapter-1-lora"
    }
}
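Until such a setting exists, a client can apply the mapping itself and always send the absolute path, which the server already accepts. A minimal standard-library sketch, assuming the server listens at a hypothetical http://127.0.0.1:8080 and serves the /generate route that takes the JSON body shown above; ADAPTER_ROOT, SERVER_URL and build_generate_request are hypothetical names.

```python
import json
import posixpath
import urllib.request

# Hypothetical values: adjust to where adapters live on the server instance.
ADAPTER_ROOT = "/home/adapters"
SERVER_URL = "http://127.0.0.1:8080"


def build_generate_request(
    prompt: str, adapter_name: str, max_new_tokens: int = 64
) -> urllib.request.Request:
    """Build (but do not send) a POST /generate request with an absolute adapter path."""
    payload = {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            # Expand the short name client-side, since absolute paths already work.
            "adapter_id": posixpath.join(ADAPTER_ROOT, adapter_name),
        },
    }
    return urllib.request.Request(
        f"{SERVER_URL}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To actually send it, pass the request to urllib.request.urlopen and parse the response with json.load. This keeps the absolute path out of user code, but every client still has to know ADAPTER_ROOT, which is the coupling the feature request aims to remove.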

Motivation

Abstract away the absolute path of adapters from users during inference with custom adapters that are downloaded locally onto the instance.

Your contribution

Open to preparing a PR.

tgaddair commented 6 months ago

Thanks for raising this issue @bjornjee! This sounds like a good improvement to me. Since you mentioned you would be open to submitting a PR, is this something you'd like to contribute?