vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

Virtual Office Hours: July 9 and July 25 #5937

Open mgoin opened 2 months ago

mgoin commented 2 months ago

vLLM Virtual Open Office Hours

We enjoyed seeing everyone at the previous office hours and got great feedback. These office hours are a roughly bi-weekly live event where you can learn more about the vLLM project, find out how to contribute, and get help with your issues, with special topics and guests along the way.

Sign up here: https://neuralmagic.com/community-office-hours/

Here is a recording from June 20 so you can see the format: https://www.youtube.com/watch?v=ss02R8ndKnk

Dates:

- July 9
- July 25

If there are any themes or topics you would like to see addressed, please comment below.

Previous issues:

w013nad commented 2 months ago

Can you explain the purpose behind pipeline parallelism? I've been testing the latest release and it doesn't seem to help performance at all, even under high-throughput scenarios.

mgoin commented 1 month ago

@w013nad Right now PP is optimized for reducing per-GPU memory usage, not for performance. The next step is to improve its performance so you see throughput gains!
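For context, pipeline parallelism splits a model by layers across GPUs so a model too large for one GPU's memory can still be served. A minimal sketch of enabling it via the vLLM CLI, assuming vLLM is installed and the model name is just an illustration (the `--pipeline-parallel-size` flag is real; the specific sizes here are examples, not a tuned configuration):

```shell
# Serve a model with its layers split across 2 GPUs via pipeline parallelism.
# This reduces per-GPU memory pressure; as noted above, it is not currently
# a throughput optimization.
vllm serve meta-llama/Meta-Llama-3-8B-Instruct \
    --pipeline-parallel-size 2 \
    --tensor-parallel-size 1
```

Tensor parallelism (`--tensor-parallel-size`) splits each layer across GPUs instead, and is the option to reach for when the goal is latency or throughput rather than fitting a large model into memory.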

Recording and slides for the July 9th FP8 Office Hours are here: https://www.youtube.com/watch?v=GLqsETc8aTc