vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

Debug the optimal upper-bound performance for swapping (0-cost swapping). #46

Open · zhuohan123 opened 1 year ago

zhuohan123 commented 1 year ago

Rerun the experiment comparing 0-cost swapping against recomputation. Since 0-cost swapping models preemption with the KV-cache copy overhead removed, it is an upper bound on swapping performance, so recomputation should never be faster. If recomputation is consistently faster, we should dig into why.
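For reference, a minimal sketch of how such a comparison could be set up today, assuming a recent vLLM where the engine exposes `preemption_mode` ("swap" or "recompute") and `swap_space` (CPU swap space in GiB per GPU); the model and workload below are placeholders, not what the original experiment used. Note this only compares real swapping against recomputation; approximating true 0-cost swapping would additionally require stubbing out the actual block copies inside the engine, which this sketch does not do.

```python
# Hypothetical benchmark sketch: compare "swap" vs. "recompute" preemption.
# Assumes a recent vLLM exposing `preemption_mode` and `swap_space` as
# engine arguments; model and prompt workload are placeholders.
import time

from vllm import LLM, SamplingParams

prompts = ["Explain KV-cache swapping in one paragraph."] * 256
params = SamplingParams(max_tokens=128)

def run(preemption_mode: str) -> float:
    # A small gpu_memory_utilization provokes preemptions, so the two
    # recovery strategies (swap out vs. recompute) actually get exercised.
    llm = LLM(
        model="facebook/opt-125m",   # placeholder model
        gpu_memory_utilization=0.3,  # deliberately tight to force preemption
        swap_space=4,                # GiB of CPU swap space per GPU
        preemption_mode=preemption_mode,
    )
    start = time.perf_counter()
    llm.generate(prompts, params)
    elapsed = time.perf_counter() - start
    del llm  # free GPU memory before the next run
    return elapsed

# In practice it is safer to measure each mode in a fresh process so
# leftover GPU allocations from the first run cannot skew the second.
for mode in ("swap", "recompute"):
    print(f"{mode}: {run(mode):.2f}s")
```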

hmellor commented 6 months ago

@zhuohan123 Is this work still planned, or can the issue be closed?

hmellor commented 5 months ago

@WoosukKwon?