Closed neubig closed 6 months ago
Hi @neubig! This is caused by a recent update on TVM side (here's a same error in this thread https://github.com/mlc-ai/mlc-llm/issues/2386). Ideally you can resolve that by updating to the latest TVM via python -m pip install --pre -U -f https://mlc.ai/wheels mlc-ai-nightly-cu122
.
OK, great thanks! I haven't had a chance to test yet, but I trust that this has been fixed. I'll reopen if not.
🐛 Bug
When serving a model through the REST API on an 8xA6000 machine I get this error: The block is 1-time referenced by other blocks, thus cannot accept new KV values.
I've added the relevant details below.
To Reproduce
Steps to reproduce the behavior on a machine with 8 A6000 GPUs:
Then hit it with a relatively long context, replace "..." with something with a reasonable number of tokens:
Here is the full stack trace.
Expected behavior
Environment
conda
, source): pippip
, source): pippython -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))"
, applicable if you compile models):