vllm-project / vllm

Release v0.5.5 #7481

Closed: simon-mo closed this 2 weeks ago

simon-mo commented 4 weeks ago

We will make a release later this week or early next week (Aug 16-Aug 19) to address the Gemma logit soft-capping bug and the OpenAI server metrics bug, and to include more performance enhancements.

Please add blockers if needed.
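
For context, Gemma 2 applies a tanh-based soft cap to its final logits, which is the step the fix above targets. Below is a minimal sketch of the operation itself, assuming Gemma 2's published final_logit_softcapping value of 30.0; the function name and example are illustrative, not vLLM's actual implementation:

```python
import torch

def soft_cap_logits(logits: torch.Tensor, cap: float = 30.0) -> torch.Tensor:
    """Tanh-based soft cap: squashes logits into (-cap, cap) while
    staying roughly linear near zero, so sampling stays well-behaved."""
    return cap * torch.tanh(logits / cap)

# Toy check: even exaggerated raw logits stay strictly inside the cap.
raw = torch.randn(2, 256_000) * 50.0  # (batch, vocab)-shaped dummy logits
capped = soft_cap_logits(raw)
assert capped.abs().max() < 30.0
```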

robertgshaw2-neuralmagic commented 4 weeks ago

Will focus on getting these over the line tomorrow and Thursday:

robertgshaw2-neuralmagic commented 3 weeks ago

Not required but nice + easy:

Jimmy-Newtron commented 2 weeks ago

... a release later this week or early next week (Aug 16-Aug 19) ...

When is the release now planned?

simon-mo commented 2 weeks ago

Now