triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

Unrecognized configuration class to build an AutoTokenizer for microsoft/Florence-2-base-ft #7726

Closed: shihao28 closed this issue 3 weeks ago

shihao28 commented 1 month ago

**Description**
I was trying to host https://huggingface.co/microsoft/Florence-2-base-ft using Triton's Python-based vLLM backend and encountered the following error: `Unrecognized configuration class <class 'transformers_modules.microsoft.Florence-2-base.ee1f1f163f352801f3b7af6b2b96e4baaa6ff2ff.configuration_florence2.Florence2Config'> to build an AutoTokenizer.`
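For context, this message is raised by `transformers` when `AutoTokenizer` is handed a configuration class that has no entry in its tokenizer mapping. A minimal sketch of the kind of call that produces it (illustrative only; the exact call site inside vLLM may differ):

```python
from transformers import AutoConfig, AutoTokenizer

# Florence-2 ships custom modeling code on the Hub, so loading its config
# requires trust_remote_code=True.
config = AutoConfig.from_pretrained(
    "microsoft/Florence-2-base-ft", trust_remote_code=True
)

# If no tokenizer class is registered for Florence2Config, this raises
# "Unrecognized configuration class ... to build an AutoTokenizer."
tokenizer = AutoTokenizer.from_pretrained(
    "microsoft/Florence-2-base-ft", config=config, trust_remote_code=True
)
```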

**Triton Information**
Image: nvcr.io/nvidia/tritonserver:24.09-vllm-python-py3. I pulled the image from the NVIDIA container registry and used it as-is.
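For completeness, the image can be pulled ahead of time with:

```
docker pull nvcr.io/nvidia/tritonserver:24.09-vllm-python-py3
```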

**To Reproduce**

  1. Set up a model repository directory: ~/work/model_repository/florence-2-base-ft/1
  2. Download model.json and config.pbtxt as suggested in the tutorial (an illustrative model.json is sketched after this list). config.pbtxt contains:

```
instance_group [
  {
    count: 1
    kind: KIND_MODEL
  }
]
```

  3. Run the Triton Inference Server:

```
cd ~/work
docker run --gpus all -it --net=host --rm -p 8001:8001 --shm-size=1G --ulimit memlock=-1 --ulimit stack=67108864 -v ./:/models -w /work nvcr.io/nvidia/tritonserver:24.09-vllm-python-py3 tritonserver --model-store /models
```
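
For reference (not part of the original report), the model.json consumed by the Python-based vLLM backend carries the vLLM engine arguments. A minimal sketch, with field values that are assumptions rather than the reporter's actual settings:

```
{
    "model": "microsoft/Florence-2-base-ft",
    "trust_remote_code": true,
    "gpu_memory_utilization": 0.9
}
```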

**Error**
![image](https://github.com/user-attachments/assets/e753281d-2f44-4ffa-be13-3b7420e2283e)
rmccorm4 commented 3 weeks ago

Hi @shihao28, this looks like a lack of model support in vLLM itself. Please see this issue: https://github.com/vllm-project/vllm/issues/5934.
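
As a sanity check (not from the original thread), the limitation can be confirmed in vLLM directly, independent of Triton. A minimal sketch, assuming the vllm package shipped in the same container:

```python
from vllm import LLM, ModelRegistry

# Architectures registered in this vLLM build; Florence-2's architecture name
# (per its config.json, "Florence2ForConditionalGeneration") will be absent
# if the model is unsupported.
print(ModelRegistry.get_supported_archs())

# Attempting to load an unsupported model raises an error naming the
# unrecognized architecture.
llm = LLM(model="microsoft/Florence-2-base-ft", trust_remote_code=True)
```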

Feel free to re-open if this is incorrect.