triton-inference-server / tensorrtllm_backend

The Triton TensorRT-LLM Backend
Apache License 2.0

[request] Add example of custom LLM model not based on huggingface #465

Closed: michaelnny closed this issue 1 month ago

michaelnny commented 1 month ago

Hi,

I'm wondering if it's possible to add an example (or a general guideline) of how to serve a custom LLM model that's not based on Hugging Face.

As an example, consider the original Llama3 chat model with its native Tiktoken tokenizer, neither of which is based on Hugging Face Transformers: https://github.com/meta-llama/llama3
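For context, the backend's preprocessing/postprocessing Python models currently load a tokenizer through Hugging Face's AutoTokenizer. What I have in mind is something like the rough sketch below: a small adapter that wraps a Tiktoken-based tokenizer behind the encode/decode interface those scripts use. This is just an illustration, not a working recipe; the pattern string, special tokens, and attribute names here are placeholders and would need to be copied from llama/tokenizer.py in the llama3 repo.

```python
# Hypothetical adapter around a Tiktoken tokenizer (e.g. Llama 3's tokenizer.model),
# exposing an encode/decode surface similar to what the Triton preprocessing and
# postprocessing models expect from a Hugging Face tokenizer.
# pat_str and the special-token list below are placeholders; the real values live
# in llama/tokenizer.py of https://github.com/meta-llama/llama3.
from pathlib import Path
from typing import List

import tiktoken
from tiktoken.load import load_tiktoken_bpe


class TiktokenAdapter:
    def __init__(self, model_path: str):
        mergeable_ranks = load_tiktoken_bpe(model_path)
        num_base_tokens = len(mergeable_ranks)

        # Placeholder special tokens; Llama 3 defines many more.
        special_tokens = {
            "<|begin_of_text|>": num_base_tokens,
            "<|end_of_text|>": num_base_tokens + 1,
        }
        self._special_ids = set(special_tokens.values())

        self.model = tiktoken.Encoding(
            name=Path(model_path).name,
            # Placeholder split pattern -- replace with the pat_str from the llama3 repo.
            pat_str=r"\S+|\s+",
            mergeable_ranks=mergeable_ranks,
            special_tokens=special_tokens,
        )

        self.bos_token_id = special_tokens["<|begin_of_text|>"]
        self.eos_token_id = special_tokens["<|end_of_text|>"]
        # The preprocessing model pads batches, so expose a pad id as well.
        self.pad_token_id = self.eos_token_id

    def encode(self, text: str, add_special_tokens: bool = False) -> List[int]:
        ids = self.model.encode(text, allowed_special="all")
        if add_special_tokens:
            ids = [self.bos_token_id] + ids
        return ids

    def decode(self, ids: List[int], skip_special_tokens: bool = True) -> str:
        if skip_special_tokens:
            ids = [i for i in ids if i not in self._special_ids]
        return self.model.decode(ids)
```

An official example would presumably swap an adapter like this (or the llama3 repo's own Tokenizer class) into the preprocessing and postprocessing model.py files in place of AutoTokenizer.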

This would be great for people working with custom LLM models that are decoupled from the Hugging Face ecosystem. Thanks!