Closed: Gintasz closed this 2 months ago
I've avoided the problem by replacing the YAML output format with an XML format, constrained by this regex:
r"<array>\n(?:<string>.*?<\/string>\n)*<\/array>```"
I had the same problem with llama3 refusing to stop despite using the appropriate "<|eot_id|>" stop string from the llama3-instruct template. I added "assistant" as a stop string in the call to sgl.gen, and this seems to have abated the issue entirely. Can you give that a try with your YAML regex?
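A rough sketch of what I mean, assuming a YAML pattern named YAML_REGEX (the generation name and other arguments are placeholders):

```python
s += sgl.gen(
    "answer",
    max_tokens=1024,
    regex=YAML_REGEX,                   # your existing YAML pattern
    stop=["<|eot_id|>", "assistant"],   # "assistant" added as an extra stop string
)
```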
I've heard that you need to set global_config.skip_special_tokens_in_output to False in sglang.global_config. Then "<|eot_id|>" will start to be effective.
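In code that would look roughly like this (assuming the attribute and module quoted above; set it before sending any requests):

```python
from sglang.global_config import global_config

# Keep special tokens such as <|eot_id|> in the detokenized output,
# so the stop string can actually be matched.
global_config.skip_special_tokens_in_output = False
```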
This issue has been automatically closed due to inactivity. Please feel free to reopen it if needed.
Faced the same issue; I tried setting global_config.skip_special_tokens_in_output to False, but nothing changed.
Hey, I've just been trying to catch this bug for half a day...
I've done pip install git+https://github.com/sgl-project/sglang.git@51104cd#subdirectory=python, which is the commit where 0.1.14 was mentioned. I launched the server like this:
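(The exact command did not survive in this copy of the thread; a typical sglang launch for this model would look roughly like the following, with the port being an assumption.)

```bash
python -m sglang.launch_server \
    --model-path meta-llama/Meta-Llama-3-8B-Instruct \
    --port 30000
```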
When the script below is launched, the server gets stuck in an infinite generation loop that runs far beyond the specified max_tokens=1024, and then it crashes. In my app this showed up as a CUDA device assertion error (same underlying problem), but in the reproduced example below the error is RecursionError: maximum recursion depth exceeded while calling a Python object. This is the server log: logfile.txt

If the regex argument is removed, there is no problem: generation stops when the token limit is exceeded. If I change the model to mistralai/Mistral-7B-Instruct-v0.2, the issue does not appear either. Other than that, meta-llama/Meta-Llama-3-8B-Instruct does work with other prompts using the same regex.
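(The original reproduction script was not preserved here either; a minimal sketch in the same spirit, with the prompt and YAML pattern as placeholders, would be:)

```python
import sglang as sgl

# Placeholder pattern; the exact YAML regex from the original script was not preserved.
YAML_LIST_REGEX = r"(?:- .+\n)+"

@sgl.function
def yaml_list(s, question):
    s += question
    # Regex-constrained generation; with Llama-3-8B-Instruct this is where
    # generation runs far past max_tokens instead of stopping.
    s += sgl.gen("answer", max_tokens=1024, regex=YAML_LIST_REGEX)

sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))
state = yaml_list.run(question="List three fruits as a YAML list:\n")
print(state["answer"])
```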