Open · Hspix opened this issue 1 year ago
I also have the same problem. Have you solved it?
Hitting the same 2048 limit.
This issue has been reported to vllm-project/vllm, and NTK support is currently being debugged there. The default implementation only supports the base encoding length of 2048. In the meantime, you can use 'model_worker' for loading instead.
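For context, FastChat's 'model_worker' loads the model through Hugging Face transformers, so Qwen's own modeling code (which reads the dynamic-NTK / logn-attention settings from config.json) is used. Below is a minimal sketch of roughly what that loading path amounts to; the model id and device settings are assumptions, not FastChat internals.

```python
# Minimal sketch of the transformers-based loading path (model id and
# device_map are assumptions). trust_remote_code=True is needed so Qwen's
# custom attention code, which honors use_dynamic_ntk / use_logn_attn, is used.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "Qwen/Qwen-7B-Chat"  # assumed repo id; a local path works too
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    trust_remote_code=True,
)
```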
Can flash-attn be used under the model_worker?
You only need to change the default value. The issue can be closed, the code itself is not wrong. @Trangle
Which default are you referring to?
I also have the same problem. Have you solved it?
When I use the vicuna-7b-v1.5 model under FastChat and vLLM, I hit a 4096-token prompt limit.
I think that is related to this question. How can this problem be solved?
Same question.
Bug Description
Integrated with langchain, the Qwen-7B-Chat model is deployed under FastChat and vLLM, and the OpenAI API is used to query it. When the number of input tokens exceeds 2048, an error is raised. However, this shouldn't happen when `use_dynamic_ntk` and `use_logn_attn` are set to `true` in the model's config.json file.
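To double-check what the deployed checkpoint actually advertises, one can inspect these flags directly. A minimal sketch, assuming the Hugging Face repo id (a local path works the same way):

```python
from transformers import AutoConfig

# Assumption: "Qwen/Qwen-7B-Chat" stands in for the actual model path used.
config = AutoConfig.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
print(config.use_dynamic_ntk, config.use_logn_attn)  # both should print True
```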
Steps to Reproduce

Packages
Code piece
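The original snippet is not shown above; the following is only a minimal sketch of the kind of LangChain call that hits the limit, assuming FastChat's OpenAI-compatible server is running on localhost:8000 and serving Qwen-7B-Chat (base URL, API key, and model name are all assumptions).

```python
from langchain.chat_models import ChatOpenAI

# Assumptions: FastChat's openai_api_server listens on localhost:8000 and the
# worker is registered as "Qwen-7B-Chat"; FastChat does not check the API key.
llm = ChatOpenAI(
    model_name="Qwen-7B-Chat",
    openai_api_base="http://localhost:8000/v1",
    openai_api_key="EMPTY",
)

# Any prompt longer than roughly 2048 tokens reproduces the error above.
long_prompt = "hello " * 3000
print(llm.predict(long_prompt))
```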