komninoschatzipapas opened 3 months ago
EDIT: Fixed error logs
I'm getting the same error with google/paligemma-3b-ft-docvqa-896. I'm trying to deploy the model with an adapter on GKE.

Here is the command line I used:

`text-generation-launcher --port 8000 --hostname 0.0.0.0 --model-id google/paligemma-3b-ft-docvqa-896 --lora-adapters ArkeaIAF/paligemma-3b-table2html-lora`
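In case it helps narrow this down: one common reason `--lora-adapters` fails at startup is that the adapter's recorded base model doesn't match the `--model-id` being served. Below is a small stdlib-only sketch of that sanity check against the adapter's `adapter_config.json` (the function name and the example config values are hypothetical, not taken from TGI or from either adapter repo):

```python
import json


def check_adapter_compat(adapter_config: dict, base_model_id: str) -> list:
    """Return a list of likely problems when serving a PEFT LoRA adapter
    on top of `base_model_id`. `adapter_config` is the parsed
    adapter_config.json from the adapter repo. This is a heuristic
    sanity check, not an exhaustive validator.
    """
    problems = []
    base = adapter_config.get("base_model_name_or_path")
    if base != base_model_id:
        problems.append(
            "adapter was trained on %r, but %r is being served" % (base, base_model_id)
        )
    if adapter_config.get("peft_type") != "LORA":
        problems.append(
            "peft_type is %r, expected 'LORA'" % adapter_config.get("peft_type")
        )
    return problems


# Hypothetical example: an adapter whose recorded base model differs
# from the model passed to --model-id.
cfg = {
    "base_model_name_or_path": "google/paligemma-3b-mix-448",
    "peft_type": "LORA",
    "target_modules": ["q_proj", "v_proj"],
}
print(check_adapter_compat(cfg, "google/paligemma-3b-ft-docvqa-896"))
```

If the list comes back non-empty, the adapter and base model disagree before TGI even gets involved.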
System Info

1xL40 node on Runpod, latest `huggingface/text-generation-inference:latest` Docker image.

Command:

`--model-id HuggingFaceM4/idefics2-8b --port 8080 --max-input-length 3000 --max-total-tokens 4096 --max-batch-prefill-tokens 4096 --speculate 3 --lora-adapters orionsoftware/rater-adapter-v0.0.1`
Reproduction
I'm trying to deploy an idefics2 LoRA using the `huggingface/text-generation-inference:latest` Docker image. The command I'm running is:

`--model-id HuggingFaceM4/idefics2-8b --port 8080 --max-input-length 3000 --max-total-tokens 4096 --max-batch-prefill-tokens 4096 --speculate 3 --lora-adapters orionsoftware/rater-adapter-v0.0.1`

I also have a valid HF token to access orionsoftware/rater-adapter-v0.0.1.
It works fine without the `--lora-adapters orionsoftware/rater-adapter-v0.0.1` part, but once I add the LoRA, I get this error on startup:

This is on a 1xL40 node on Runpod.

orionsoftware/rater-adapter-v0.0.1 was trained using the `transformers` Trainer and looks like this:

I'm curious what I'm doing wrong. Unfortunately, my weak Python skills prevent me from debugging this further.
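For anyone comparing against their own adapter repo: a LoRA trained with the `transformers` Trainer via PEFT typically ships an `adapter_config.json` with fields along these lines. Every value below is illustrative (typical PEFT defaults), not the actual contents of orionsoftware/rater-adapter-v0.0.1; the fields most relevant to serving are `base_model_name_or_path` and `target_modules`:

```python
import json

# Illustrative adapter_config.json for a PEFT LoRA trained via the
# transformers Trainer. Values are typical defaults, NOT the actual
# contents of orionsoftware/rater-adapter-v0.0.1.
adapter_config = {
    "base_model_name_or_path": "HuggingFaceM4/idefics2-8b",  # must match --model-id
    "peft_type": "LORA",
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "task_type": "CAUSAL_LM",
}
print(json.dumps(adapter_config, indent=2))
```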
Expected behavior
I expected the model to be served correctly, with no errors.