bethalianovike closed this issue 2 months ago.
Thank you @bethalianovike for reporting. Though the interface supports passing in multiple additional models, we only support one additional model for speculative decoding right now. We will update the documentation to avoid this confusion.
Updated the docs in #2841. Support for multiple additional models is planned as a future feature.
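For reference, a minimal sketch of the setup that is currently supported: speculative decoding with a single additional (draft) model. The `--additional-models` and `--speculative-mode` flag names are taken from the REST deployment docs linked below; the model ids are placeholders, not real artifacts.

```python
# Minimal sketch, assuming the `mlc_llm serve` CLI flags from the REST docs.
# Launches the server with ONE additional (draft) model for speculative
# decoding; this call blocks while the server runs.
import subprocess

subprocess.run([
    "mlc_llm", "serve", "<main-model>",       # placeholder model id
    "--additional-models", "<draft-model>",   # only one is supported today
    "--speculative-mode", "small_draft",      # assumed mode name
], check=True)
```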
🐛 Bug
❓ General Questions
Based on https://llm.mlc.ai/docs/deploy/rest.html#id5, we should be able to use more than one additional model when running in speculative decoding mode. But when I request a response via a REST API POST, I get the following error message.
To Reproduce
Steps to reproduce the behavior:
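As context for the failure, a hedged sketch of the kind of REST call the report describes: a POST against the OpenAI-compatible chat completions endpoint of a running `mlc_llm serve` instance. The host, port, and model id below are assumptions.

```python
# Sketch of the POST request that surfaced the error when the server was
# started with more than one additional model. The endpoint shape follows
# the OpenAI-compatible API that `mlc_llm serve` exposes.
import requests

payload = {
    "model": "<main-model>",  # placeholder: the model passed to `mlc_llm serve`
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": False,
}
resp = requests.post(
    "http://127.0.0.1:8000/v1/chat/completions",  # assumed local serve address
    json=payload,
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```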
Expected behavior
Generate the response.
Environment
- How you installed MLC-LLM (`conda`, source): source
- How you installed TVM-Unity (`pip`, source): source
- TVM Unity Hash Tag (`python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))"`, applicable if you compile models):

Additional context