microsoft / vattention

Dynamic Memory Management for Serving LLMs without PagedAttention
MIT License

microbenchmarks/perf_pagesize/bench_pagesize.py #15

Open alvi75 opened 3 months ago

alvi75 commented 3 months ago

u64 do_cuda_uvm_init(int, u64): Assertion `page_size == 64*KB || page_size == 128*KB || page_size == 256*KB' failed.
Aborted (core dumped)
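
For reference, the assertion above accepts only three page sizes: 64 KB, 128 KB, and 256 KB. A minimal sketch of the same check in Python might look like the following. It assumes page sizes are given in bytes and that KB means 1024 bytes; `is_supported_page_size` is a hypothetical helper for illustration and is not part of vattention.

```python
# Assumption: KB is 1024 bytes, and page sizes are expressed in bytes.
KB = 1024

# The three page sizes named in the failed assertion.
SUPPORTED_PAGE_SIZES = (64 * KB, 128 * KB, 256 * KB)


def is_supported_page_size(page_size: int) -> bool:
    """Hypothetical helper: True if page_size matches one of the asserted values."""
    return page_size in SUPPORTED_PAGE_SIZES


if __name__ == "__main__":
    # Check one accepted size and one size that would trip the assertion.
    for size in (64 * KB, 2 * 1024 * KB):
        print(size, is_supported_page_size(size))
```

Running a check like this before launching the benchmark would show whether the requested page size is one the assertion can accept.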

apanwariisc commented 3 months ago

What value are you passing for model_block_size?

alvi75 commented 3 months ago

Sorry for the late response. I just created a new issue with a detailed explanation.