mgoin opened this issue 3 weeks ago
Hi, I'm highly interested in running vLLM on Google Cloud TPU VMs, especially TPU v4 pod slices like TPU v4-32. Is there any plan to upload this office hour to YouTube? :)
Sorry, just saw it now. I had a question about performance benchmarks on TPUs: what are the best practices, and does quantization lead to higher throughput?
vLLM Virtual Open Office Hours
We enjoyed seeing everyone at the previous office hours and got great feedback. These office hours are a bi-weekly virtual live event where you can learn more about the vLLM project, find out how to contribute, and get help with your issues, with special topics and guests along the way.
Sign up here: https://neuralmagic.com/community-office-hours/
You can watch previous sessions on the YouTube playlist.
Dates:
August 8, 2024, at 2:00 PM EDT (11:00 AM PDT), Guest Topic: Multi-Modal Models in vLLM, by Roger Wang from Roblox
August 21, 2024, at 2:00 PM EDT (11:00 AM PDT), Guest Topic: vLLM on AMD GPUs and Google TPUs, by Woosuk Kwon from UC Berkeley
If there are any themes or topics you would like to see addressed, please comment below. We look forward to seeing you there!
Previous office hour issue: https://github.com/vllm-project/vllm/issues/5937