sgl-project / sglang

SGLang is a fast serving framework for large language models and vision language models.
https://sglang.readthedocs.io/en/latest/
Apache License 2.0

[Bug] When a stop string starts with "\n", SGLang drops "\n" characters during inference #956

Closed nstl-zyb closed 1 month ago

nstl-zyb commented 1 month ago

Describe the bug

When using SGLang as the inference framework, if a string in the stop parameter starts with "\n", SGLang drops the line breaks during inference.

E.g. with prompt = 请换行输出1-10个数字 ("Print the numbers 1-10, one per line") and stop = ['<|endoftext|>', '<|im_end|>', '<|im_start|>'], the output is:

1
2
3
4
5
6
7
8
9
10

With the same prompt and stop = ['\n<|endoftext|>', '<|im_end|>', '<|im_start|>'], the output is:

12345678910

The "\n" can be followed by any character; as long as a stop string begins with "\n", the output loses all line breaks.

Reproduction

OS: Linux x64
GPU: A100
Python: 3.10
sglang: 0.2.7
LLM model: Qwen2-72B-lora-awq-4bit

Commands:

python -m fastchat.serve.controller --host localhost --port 44000

python -m fastchat.serve.vllm_worker --model-path ${MODEL_PATH} --max-model-len 8192 --worker-address "http://0.0.0.0:22006" --port 22006 --model-names "qwen-latest" --controller-address "http://localhost:44000"

python -m fastchat.serve.openai_api_server --host 0.0.0.0 --port 21003 --controller-address "http://localhost:44000"

Then run:

"""
# Uses the legacy openai<1.0 client API (openai.ChatCompletion).
import openai


def test_open_ai(prompt: str, stream: bool = False, model: str = "qwen-latest"):
    openai.api_key = "LTAI5t6C5QzrRfy5A4Ug4ujD"  # Not supported yet
    openai.api_base = "http://127.0.0.1:21003/v1"
    completion = openai.ChatCompletion.create(
        model=model,
        messages=[{'role': 'user', 'content': prompt}],
        temperature=0.7,
        top_p=1.0,
        n=1,
        max_tokens=None,
        stream=stream,
        presence_penalty=0.0,
        frequency_penalty=0.0,
        user=None,
        meta={},
        service="sas",
        scenario="Chat",
        stop_token_ids=[151643, 151644, 151645],
        # A stop string starting with '\n' triggers the missing-newline bug.
        stop=['\n<|endoftext|>', '<|im_end|>', '<|im_start|>'],
        max_new_tokens=8192,
    )

    if not stream:
        answer_md = completion.choices[0].message.content
        print(answer_md)
        return answer_md
    else:
        pass  # streaming path not exercised in this repro
"""

Environment

Python: 3.10.14 (main, May  6 2024, 19:42:50) [GCC 11.2.0]
CUDA available: True
GPU 0,1,2,3: NVIDIA A100-SXM4-80GB
CUDA_HOME: /usr/local/cuda-12.2/
NVCC: Cuda compilation tools, release 12.2, V12.2.140
CUDA Driver Version: 535.183.01
PyTorch: 2.3.1+cu121
sglang: 0.2.7
flashinfer: 0.1.3
requests: 2.32.3
tqdm: 4.66.4
numpy: 1.26.4
aiohttp: 3.10.0
fastapi: 0.111.1
hf_transfer: 0.1.8
huggingface_hub: 0.23.4
interegular: 0.3.3
packaging: 24.1
PIL: 10.4.0
psutil: 6.0.0
pydantic: 2.8.2
uvicorn: 0.30.3
uvloop: 0.19.0
zmq: 26.0.3
vllm: 0.5.3.post1
openai: 1.37.1
anthropic: 0.32.0
NVIDIA Topology: 
    GPU0    GPU1    GPU2    GPU3    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X  NV12    NV12    NV12    0-63    0       N/A
GPU1    NV12     X  NV12    NV12    0-63    0       N/A
GPU2    NV12    NV12     X  NV12    0-63    0       N/A
GPU3    NV12    NV12    NV12     X  0-63    0       N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

ulimit soft: 65535
nstl-zyb commented 1 month ago

This is a FastChat bug, not an SGLang bug. Closing.