document for huggingface(vllm) servingruntime for multi-node

kserve / website

User documentation for KServe.

Apache License 2.0

103 stars 126 forks source link

"Fixes #issue-number" or "Add description of the problem this PR solves"

Proposed Changes

This PR add a new documentation for setting up multi-node/multi-GPU inference using the Hugging Face LLM Serving Runtime. It includes detailed instructions on prerequisites, key configurations, model inference, and sample requests for OpenAI completions and chat endpoints. This documentation aims to enhance user understanding and streamline the deployment process, ensuring a smooth experience for developers looking to leverage Hugging Face's capabilities in a Kubernetes environment

This documentation is valid only after https://github.com/kserve/kserve/pull/3972 is merged.

Name	Link
Latest commit	6e4a702c4acb71a6d3bd51bfed85509079a2eceb
Latest deploy log	https://app.netlify.com/sites/elastic-nobel-0aef7a/deploys/673763bf4e93b30008e88912
Deploy Preview	https://deploy-preview-402--elastic-nobel-0aef7a.netlify.app
Preview on mobile	Toggle QR Code... Use your smartphone camera to open QR code link.

Name

Link

Latest commit

6e4a702c4acb71a6d3bd51bfed85509079a2eceb

Latest deploy log

https://app.netlify.com/sites/elastic-nobel-0aef7a/deploys/673763bf4e93b30008e88912

Deploy Preview

https://deploy-preview-402--elastic-nobel-0aef7a.netlify.app

Preview on mobile

Toggle QR Code...

Use your smartphone camera to open QR code link.

kserve / website