kserve / website

User documentation for KServe.
https://kserve.github.io/website/
Apache License 2.0

document for huggingface(vllm) servingruntime for multi-node #402

Open Jooho opened 1 month ago

Jooho commented 1 month ago

"Fixes #issue-number" or "Add description of the problem this PR solves"

Proposed Changes

This PR adds new documentation for setting up multi-node/multi-GPU inference with the Hugging Face LLM Serving Runtime. It covers prerequisites, key configurations, model inference, and sample requests for the OpenAI completions and chat endpoints. The goal is to help users understand the setup and to streamline deployment of the Hugging Face runtime in a Kubernetes environment.

This documentation is valid only after https://github.com/kserve/kserve/pull/3972 is merged.

netlify[bot] commented 1 month ago

Deploy Preview for elastic-nobel-0aef7a ready!

Latest commit: 6e4a702c4acb71a6d3bd51bfed85509079a2eceb
Latest deploy log: https://app.netlify.com/sites/elastic-nobel-0aef7a/deploys/673763bf4e93b30008e88912
Deploy Preview: https://deploy-preview-402--elastic-nobel-0aef7a.netlify.app