Closed: p-davidk closed this issue 3 weeks ago.
Hi @p-davidk 👋
Thanks for reporting this. Unfortunately, I don't think we'll be able to jump in and debug this right now. If you find any more clues about what could be going on, please feel free to update us here in the issue.
I'll also tag @drbh since he probably knows this part better than I do 👍
Support for local LoRA adapters has not been released in TGI yet; it is being added in pull request #2193.
Once that change is released, you should be able to use it as described there:
LORA_ADAPTERS=predibase/dbpedia,myadapter=/path/to/dir/
or
--lora-adapters predibase/dbpedia,myadapter=/path/to/dir/
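For illustration, assuming that PR has landed, the env-var form could be passed straight through Docker. This is just a sketch, and both the adapter name and the /data/my-adapter path are placeholders rather than real values:
# sketch: mount a host directory at /data and register one named local adapter
docker run --gpus all --shm-size 1g -p 8080:80 \
  -v $PWD:/data \
  -e LORA_ADAPTERS=myadapter=/data/my-adapter \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id <base-model-id>
Note that the path after myadapter= has to be the path as seen inside the container, i.e. under the mounted /data volume.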
Similar issue: #2143
Thanks for adding the context @imran3180 💪
I tried this today using sha-9263817 (which is > 2.3.0); it still didn't work. It failed with: Repository Not Found for url: https://huggingface.co/api/models/data/phi3-adapter.
huggingface-cli download microsoft/Phi-3-mini-4k-instruct --local-dir phi3
huggingface-cli download grounded-ai/phi3-hallucination-judge --local-dir phi3-adapter
model=/data/phi3
adapter=/data/phi3-adapter
volume=$PWD
docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:sha-9263817 --model-id $model --lora-adapters $adapter
Running on A100 80GB
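One thing worth noting: the --lora-adapters value above is a bare path, so TGI apparently resolves /data/phi3-adapter as a Hub repo id, which matches the Repository Not Found URL in the error. Using the name=path form from the comment below, the corrected launch would presumably be (same local paths, same image, just a named adapter):
docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
  ghcr.io/huggingface/text-generation-inference:sha-9263817 \
  --model-id $model \
  --lora-adapters myadapter=$adapter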
To anyone arriving here looking for a solution, here is the proper way to use local LoRA adapters:
LORA_ADAPTERS=myadapter=/some/path/to/adapter,myadapter2=/another/path/to/adapter
curl 127.0.0.1:3000/generate \
-X POST \
-H 'Content-Type: application/json' \
-d '{
"inputs": "Hello who are you?",
"parameters": {
"max_new_tokens": 40,
"adapter_id": "myadapter"
}
}'
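Switching adapters per request is then just a matter of changing adapter_id in the parameters; for example, to hit the second adapter registered above, the same call would presumably look like this:
curl 127.0.0.1:3000/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{
    "inputs": "Hello who are you?",
    "parameters": {
        "max_new_tokens": 40,
        "adapter_id": "myadapter2"
    }
}'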
Reproduction
Error overview
I am using TGI 2.1.1 via a Docker container. When I try to run with local LoRA adapters, the model fails to load. I am launching with the following command:
Launch command
Error trace
When I do this, I see the following error trace:
Additional info
It appears that there are two errors here: 1) TGI is trying to load my local adapter from a Hub repo, which fails; and 2) TGI thinks one of the models is Seq2Seq instead of CausalLM.
Issue (2) doesn't make sense, because the configs of the LoRA adapters and the original model all show "task_type": "CAUSAL_LM". All of the adapter configs have the same format, since they are different checkpoints of the same fine-tuned model; an example config from one of the adapters is below.
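A PEFT-style adapter_config.json with that task type generally looks something like this; every value below is an illustrative placeholder, not copied from the actual adapter:
{
  "base_model_name_or_path": "<base-model-id>",
  "peft_type": "LORA",
  "task_type": "CAUSAL_LM",
  "r": 16,
  "lora_alpha": 32,
  "lora_dropout": 0.05,
  "target_modules": ["q_proj", "v_proj"]
}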
Expected behavior
I expect the script to launch a model endpoint on port 8080. I then expect to be able to switch between adapters with the "adapter" keyword argument in the text-generation Python client.