vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

Virtual Office Hours: July 9 and July 25 #5937

Open mgoin opened 2 months ago

mgoin commented 2 months ago

vLLM Virtual Open Office Hours

We enjoyed seeing everyone at the previous office hours and got great feedback. These office hours are a roughly bi-weekly live event where you can learn more about the vLLM project, find out how to contribute, and get help with your issues, with special topics and guests along the way.

Sign up here: https://neuralmagic.com/community-office-hours/

Here is a recording from June 20 so you can see the format: https://www.youtube.com/watch?v=ss02R8ndKnk

Dates:

- July 9
- July 25

If there are any themes or topics you would like to see addressed, please comment below.

Previous issues:

w013nad commented 2 months ago

Can you explain the purpose behind pipeline parallelism? I've been testing the latest release and it doesn't seem to help performance at all, even under high-throughput scenarios.

mgoin commented 1 month ago

@w013nad Right now PP is optimized for reducing per-GPU memory usage, not for performance. The next step is to improve its performance so you see throughput gains!
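For context, pipeline parallelism splits a model by layers across GPUs so a model too large for one GPU's memory can still be served. A minimal sketch of enabling it via the vLLM CLI, assuming vLLM is installed and the model name is just an illustration (the `--pipeline-parallel-size` flag is real; the specific sizes here are examples, not a tuned configuration):

```shell
# Serve a model with its layers split across 2 GPUs via pipeline parallelism.
# This reduces per-GPU memory pressure; as noted above, it is not currently
# a throughput optimization.
vllm serve meta-llama/Meta-Llama-3-8B-Instruct \
    --pipeline-parallel-size 2 \
    --tensor-parallel-size 1
```

Tensor parallelism (`--tensor-parallel-size`) splits each layer across GPUs instead, and is the option to reach for when the goal is latency or throughput rather than fitting a large model into memory.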

Recording and slides for the July 9th FP8 Office Hours are here: https://www.youtube.com/watch?v=GLqsETc8aTc