Hi,
I'm wondering if it's possible to add an example (or a general guideline) of how to serve a custom LLM model that isn't based on Hugging Face.
As an example, we could use the original Llama3 chat model with its native Tiktoken tokenizer, neither of which is based on Hugging Face transformers: https://github.com/meta-llama/llama3
This would be great for people working with custom LLM models that are decoupled from the Hugging Face ecosystem. Thanks!