Closed: lwang2070 closed this issue 2 months ago
If you are using vLLM, you can increase `max_position_embeddings` to 131072 in `config.json`. By default, if I remember correctly, vLLM will ignore a request whose input sequence length exceeds `max_position_embeddings`.
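For example, a minimal sketch of the change, assuming a local copy of the checkpoint (the path is a placeholder, and the `rope_scaling` values are the ones suggested on the Qwen2 model card):

```python
# Sketch: patch a local config.json so vLLM derives a 128k context window.
# The path is a placeholder; the rope_scaling values follow the Qwen2 model card.
import json

path = "/path/to/Qwen2-72B-Instruct/config.json"  # placeholder

with open(path) as f:
    cfg = json.load(f)

cfg["max_position_embeddings"] = 131072  # raise from 32768 so long inputs are not dropped
cfg["rope_scaling"] = {                  # YaRN settings from the Qwen2 model card
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn",
}

with open(path, "w") as f:
    json.dump(cfg, f, indent=2)
```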
Ahh I see! I'll close the issue after confirming the fix :)
Dear author,
Thanks a bunch for the invaluable benchmark you created! I have used the benchmark to evaluate the Qwen2-72B-Instruct-131k model, but noticed a significant difference between the results I obtained and the values listed in `README.md`. More precisely, I obtained a zero score for the Qwen2 series models on all tasks exceeding 32k (their training length), with YaRN scaling enabled as suggested on their model card page. Here are all my configs:

`config.json`:

`config_models.sh`:

`template.py`:

Any suggestions for possible mistakes I have made? Thanks in advance!
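In case it helps with the diagnosis, here is a quick sanity check I can run (a sketch only; the model path is a placeholder, and the internal attribute names may differ across vLLM versions) to print the maximum sequence length vLLM derives from `config.json`:

```python
# Sketch: check the maximum sequence length vLLM derived for the model.
# If this prints 32768, anything longer is being rejected, which would
# explain the zero scores on the >32k tasks.
from vllm import LLM

llm = LLM(model="/path/to/Qwen2-72B-Instruct")  # placeholder path
print(llm.llm_engine.model_config.max_model_len)
```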