h2oai / h2ogpt

Private chat with local GPT with document, images, video, etc. 100% private, Apache 2.0. Supports oLLaMa, Mixtral, llama.cpp, and more. Demo: https://gpt.h2o.ai/ https://gpt-docs.h2o.ai/
http://h2o.ai
Apache License 2.0

Update Triton inference server Docker deployment for Falcon 40B #223

Open arnocandel opened 1 year ago

arnocandel commented 1 year ago

https://github.com/h2oai/h2ogpt/blob/main/docs/TRITON.md

Do the same for Falcon 7B, then Falcon 40B.

HarperGrieve commented 1 year ago

Is this possible yet?

mokpolar commented 1 year ago

waiting

pseudotensor commented 1 year ago

If you are looking for a fast inference Docker engine, I'd recommend Hugging Face's Text Generation Inference server instead. Its continuous batching makes it far superior to NVIDIA Triton.

We provide documentation related to this here: https://github.com/h2oai/h2ogpt/blob/main/docs/README_InferenceServers.md#docker-install
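
As a rough illustration, here is a minimal sketch of how a running TGI endpoint could be queried once the Docker container from the linked docs is up; the port, prompt, and generation parameters are assumptions for the example, not values from the documentation:

```python
import requests

# Assumed: a TGI container serving a Falcon model, with its HTTP API
# exposed on localhost:8080 (hypothetical port).
TGI_URL = "http://localhost:8080/generate"

payload = {
    "inputs": "Explain continuous batching in one sentence.",
    "parameters": {"max_new_tokens": 64, "temperature": 0.7},
}

# TGI's /generate endpoint accepts a JSON body with "inputs" and "parameters"
# and returns the completion under "generated_text".
resp = requests.post(TGI_URL, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["generated_text"])
```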

mokpolar commented 1 year ago

@pseudotensor Thanks, I'll try what you suggested.