triton-inference-server / fastertransformer_backend

BSD 3-Clause "New" or "Revised" License

enable llama model in FT backend #146

Open shihy52x opened 1 year ago

shihy52x commented 1 year ago

The existing FT backend throws an error for the llama model.

sfc-gh-zhwang commented 1 year ago

Will this ever work? I don't see llama defined under: https://github.com/NVIDIA/FasterTransformer/tree/main/src/fastertransformer/triton_backend