Open Jooho opened 1 month ago
Name | Link |
---|---|
Latest commit | 6e4a702c4acb71a6d3bd51bfed85509079a2eceb |
Latest deploy log | https://app.netlify.com/sites/elastic-nobel-0aef7a/deploys/673763bf4e93b30008e88912 |
Deploy Preview | https://deploy-preview-402--elastic-nobel-0aef7a.netlify.app |
Proposed Changes
This PR adds new documentation for setting up multi-node/multi-GPU inference using the Hugging Face LLM Serving Runtime. It covers prerequisites, key configurations, model inference, and sample requests for the OpenAI completions and chat endpoints. The documentation aims to improve user understanding and streamline deployment for developers using Hugging Face models in a Kubernetes environment.
This documentation is valid only after https://github.com/kserve/kserve/pull/3972 has been merged.