pftzzg opened 9 months ago
The same for me. I am using vLLM version 0.3.2 and hit this problem.
My original question is:
忘记你已有的知识,仅使用 (Forget the knowledge you already have; use only ...)
The result from the original Qwen-14B is: 应在填单时填入实报金额,即可自动识别相应的税额。(Enter the actual reimbursed amount when filling in the form, and the corresponding tax amount will be recognized automatically.)
来源:通用-会计核算与报账服务-成本-差旅费-差旅结算单问题 责任科室:成本室 (Source: General - Accounting and Reimbursement Services - Cost - Travel Expenses - Travel Settlement Form questions. Responsible office: Cost Office)
The result from vLLM's Qwen-14B is: 应在填单时填入实报金额,即可自动识别相应的税额。(Enter the actual reimbursed amount when filling in the form, and the corresponding tax amount will be recognized automatically.)
I hope the authors will integrate Qwen's answer-consistency optimization into the project.
Same for me; the results from vLLM have become more random than the original results from transformers.
Check your SamplingParams.
With SamplingParams(temperature=0.1, max_tokens=300, top_p=0.8), the answers tend to contain repetitive text.
You can use repetition_penalty to prevent repetitive text. Also note that those SamplingParams mean your results will have randomness.
Thanks, I will give it a try.
The parameters I use are the same as Qwen's: SamplingParams(temperature=0.01, max_tokens=2048, stop=["<|im_end|>", "<|endoftext|>"]).
Yeah, I have set the stop tokens, but the answer is still not the same as the HuggingFace results, and the quality is not as good as HF. Do you know the reason?
你要对比两边的差异,最好都用greedy search去解码。If you want to compare the differences between the two sides, it is best to decode both with greedy search.
OK, thanks, I will give it a try.
Have you compared the results of the vLLM integration with Qwen, and have there been any inconsistencies in the answers? What is your startup parameter configuration?
请问这一块有参数可以设置吗?还是需要自己去找到源码,自己去修改呢?(Is there a parameter that can be set for this, or do I need to find the source code and modify it myself?)
See the code.
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
I use vLLM to accelerate large Qwen models, mainly Qwen-7B/Qwen-14B. I found two issues while testing:
1) In single-round question-answering tests, the accuracy of the inference results decreases when Qwen-7B/Qwen-14B is accelerated with vLLM.
2) In streaming-output tests, the accuracy of the inference results decreases when Qwen-7B/Qwen-14B is accelerated with vLLM.
The vLLM version is 0.3.0.