pytorch / xla

Enabling PyTorch on XLA Devices (e.g. Google TPU)
https://pytorch.org/xla

Remove unnecessary prints in PagedAttention #8374

Closed WoosukKwon closed 1 week ago

WoosukKwon commented 1 week ago

Currently, a log message is printed every time the paged attention op is compiled (i.e., once per layer), which is just noise for end users.

@vanbasten23 Let me know if there's a better way to log this.
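One lower-noise alternative (not necessarily what this PR does, which simply removes the prints) would be to gate the message behind Python's standard `logging` module at DEBUG level, so end users see nothing by default while developers can still opt in. A minimal sketch; the logger name and `compile_paged_attention` function are hypothetical stand-ins for the per-layer compile path:

```python
import logging

# Hypothetical logger name; torch_xla's kernels may organize logging differently.
logger = logging.getLogger("torch_xla.paged_attention")

def compile_paged_attention(layer_idx: int) -> None:
    """Stand-in for the per-layer paged attention kernel compilation path."""
    # Before: print(f"Compiling paged attention for layer {layer_idx}")
    # After: emit at DEBUG level so the message is hidden unless enabled.
    logger.debug("Compiling paged attention for layer %d", layer_idx)

if __name__ == "__main__":
    # End users see no output; developers can enable it with:
    #   logging.basicConfig(level=logging.DEBUG)
    for layer in range(4):
        compile_paged_attention(layer)
```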

vanbasten23 commented 1 week ago

Hey @WoosukKwon, could you point me to your vLLM code and the new paged attention integration PR in vLLM?