
[Feature]: vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving #6687

Open · MichoChan opened this issue 3 months ago

MichoChan commented 3 months ago

🚀 The feature, motivation and pitch

The paper "vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving" looks very promising. As I understand it, it decouples virtual and physical KV cache memory using the CUDA virtual memory management (VMM) APIs, so attention kernels see contiguous tensors while physical GPU pages are mapped on demand. Could vLLM support this approach?
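
For context, the underlying mechanism the paper builds on is the CUDA driver's VMM API (`cuMemAddressReserve` / `cuMemCreate` / `cuMemMap`): reserve a large contiguous virtual range for each KV tensor up front, then commit physical pages only as a sequence grows. Below is a minimal, hypothetical CUDA C++ sketch of that pattern; it is not vTensor's or vLLM's actual code, and the chunk counts and sizes are illustrative only.

```cpp
// Minimal sketch of the CUDA driver VMM pattern that vTensor builds on:
// reserve a large contiguous *virtual* range up front, then map physical
// pages into it on demand as the KV cache grows. All names and sizes are
// illustrative, not vLLM or vTensor APIs.
#include <cuda.h>
#include <cstdio>
#include <cstdlib>
#include <vector>

#define CU_CHECK(call)                                       \
  do {                                                       \
    CUresult err = (call);                                   \
    if (err != CUDA_SUCCESS) {                               \
      const char* msg = nullptr;                             \
      cuGetErrorString(err, &msg);                           \
      std::fprintf(stderr, "%s failed: %s\n", #call, msg);   \
      std::exit(1);                                          \
    }                                                        \
  } while (0)

int main() {
  CU_CHECK(cuInit(0));
  CUdevice dev;
  CU_CHECK(cuDeviceGet(&dev, 0));
  CUcontext ctx;
  CU_CHECK(cuCtxCreate(&ctx, 0, dev));

  CUmemAllocationProp prop = {};
  prop.type = CU_MEM_ALLOCATION_TYPE_PINNED;
  prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
  prop.location.id = 0;

  // Physical chunks must be a multiple of the allocation granularity.
  size_t gran = 0;
  CU_CHECK(cuMemGetAllocationGranularity(&gran, &prop,
                                         CU_MEM_ALLOC_GRANULARITY_MINIMUM));

  // Reserve virtual space for the *maximum* KV cache size (64 chunks here,
  // an arbitrary example), but commit no physical memory yet.
  const size_t kMaxChunks = 64;
  CUdeviceptr base = 0;
  CU_CHECK(cuMemAddressReserve(&base, kMaxChunks * gran, 0, 0, 0));

  CUmemAccessDesc access = {};
  access.location = prop.location;
  access.flags = CU_MEM_ACCESS_FLAGS_PROT_READWRITE;

  // As a sequence grows, map one physical chunk at a time. A kernel keeps
  // seeing one contiguous tensor starting at `base`, with no block-table
  // indirection needed on the read path.
  std::vector<CUmemGenericAllocationHandle> handles;
  for (size_t i = 0; i < 4; ++i) {  // e.g. 4 growth steps during decoding
    CUmemGenericAllocationHandle h;
    CU_CHECK(cuMemCreate(&h, gran, &prop, 0));
    CU_CHECK(cuMemMap(base + i * gran, gran, 0, h, 0));
    CU_CHECK(cuMemSetAccess(base + i * gran, gran, &access, 1));
    handles.push_back(h);
  }
  std::printf("mapped %zu bytes at %p\n", handles.size() * gran, (void*)base);

  // Teardown: unmap and release each chunk, then free the reservation.
  for (size_t i = 0; i < handles.size(); ++i) {
    CU_CHECK(cuMemUnmap(base + i * gran, gran));
    CU_CHECK(cuMemRelease(handles[i]));
  }
  CU_CHECK(cuMemAddressFree(base, kMaxChunks * gran));
  CU_CHECK(cuCtxDestroy(ctx));
  return 0;
}
```

The appeal of this design, if I read the paper correctly, is that it keeps the on-demand paging benefits of PagedAttention while letting unmodified attention kernels operate on contiguous memory.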

Alternatives

No response

Additional context

No response

github-actions[bot] commented 1 week ago

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!