sgl-project / sglang

SGLang is a fast serving framework for large language models and vision language models.
https://sgl-project.github.io/
Apache License 2.0

[Feature] Log input text instead of input_ids when using openai chat apis #1608

Open CedricHwong opened 2 weeks ago

CedricHwong commented 2 weeks ago


Describe the bug

I checked the docker logs and tried to find the request text, but the logs show text=None while input_ids is populated. I want the logs to display the request text directly. What parameter should I add when starting the server?

docker Logs:

in=GenerateReqInput(text=None, input_ids=[151644, 8948, 198, 2610, 525, 264, 10950, 17847, 13, 151645, 198, 151644, 872, 198, 108386, 151645, 198, 151644, 77091, 198], image_data=None, sampling_params={'temperature': 0.0, 'max_new_tokens': None, 'min_new_tokens': 0, 'stop': [], 'stop_token_ids': [], 'top_p': 1.0, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'repetition_penalty': 1.0, 'regex': None, 'n': 1}, rid='640143780ce94c81a56689020f8a5b09', return_logprob=False, logprob_start_len=-1, top_logprobs_num=0, return_text_in_logprobs=True, stream=False, modalities=[], is_single=True, lora_path=None), out={'text': '你好,有什么我可以帮助你的吗?', 'meta_info': {'prompt_tokens': 20, 'completion_tokens': 9, 'completion_tokens_wo_jump_forward': 9, 'finish_reason': {'type': 'stop', 'matched': 151645}, 'id': '640143780ce94c81a56689020f8a5b09'}, 'index': 0} INFO: 172.18.0.1:59398 - "POST /v1/chat/completions HTTP/1.1" 200 OK

Reproduction

docker run:

docker run -itd --name n72 --runtime nvidia --gpus '"device=0,1,2,6"' \
  -p 1090:30000 \
  -v /mnt/data1/home/fusion_large:/32k \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server --model-path /32k --host 0.0.0.0 --port 30000 \
    --dtype bfloat16 --tensor-parallel-size 4 --served-model-name cosmic-32k --log-requests

Environment

Image: sglang v0.3.0

OS: cat /etc/os-release
NAME="TencentOS Server"
VERSION="2.4"
ID="tencentos"
ID_LIKE="rhel fedora centos tlinux"
VERSION_ID="2.4"
PRETTY_NAME="TencentOS Server 2.4"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:tencentos:tencentos:2"
HOME_URL="https://cloud.tencent.com/product/ts"

merrymercy commented 2 weeks ago

Currently, the OpenAI API applies the chat template and tokenizes the prompt in a single step to avoid double-BOS issues. https://github.com/sgl-project/sglang/blob/23cc66f7b65f885969d4608fd4964e0ba98fb7f5/python/sglang/srt/openai_api/adapter.py#L868
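A toy sketch of the double-BOS problem that the combined "template + tokenize" step avoids (all token ids and helpers here are made up for illustration, not sglang's actual tokenizer): if the rendered chat template already contains the BOS special token as text, and the tokenizer is also configured to prepend BOS, the token ends up in the prompt twice.

```python
BOS = 151644  # hypothetical special-token id

def apply_chat_template(message: str) -> str:
    # The rendered template already starts with the special token's text.
    return "<bos>" + message

def tokenize(text: str, add_special_tokens: bool = True) -> list:
    # Stand-in tokenizer: maps "<bos>" to BOS, every other char to its code
    # point, and optionally prepends BOS (like add_special_tokens=True).
    ids = [BOS] if add_special_tokens else []
    if text.startswith("<bos>"):
        ids.append(BOS)
        text = text[len("<bos>"):]
    ids.extend(ord(c) for c in text)
    return ids

prompt = apply_chat_template("hi")
naive = tokenize(prompt)                               # BOS appears twice
combined = tokenize(prompt, add_special_tokens=False)  # BOS appears once
```

Tokenizing server-side in one step lets the adapter control add_special_tokens, which is why only input_ids (and not the raw text) reach the scheduler's log line.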

So the server never sees the input text. If you want this feature, could you help us support it? You could add logging in the OpenAI API server to print the raw input when a flag is set.
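A minimal sketch of what such a contribution could look like, assuming a hypothetical flag (e.g. --log-request-text) plumbed into the adapter; the function and flag names below are illustrative, not sglang's actual internals:

```python
import logging
from typing import Optional

logger = logging.getLogger("sglang.openai_api")

def maybe_log_raw_request(messages: list, rid: str,
                          log_request_text: bool) -> Optional[str]:
    """Build and (optionally) emit a log line with the untokenized chat
    messages, called before the chat template is applied and tokenized."""
    if not log_request_text:
        return None
    rendered = " | ".join(f"{m['role']}: {m['content']}" for m in messages)
    line = f"rid={rid} raw_input={rendered!r}"
    logger.info(line)
    return line
```

The call would sit in the chat-completions handler right before tokenization, so the logged text matches what the user actually sent rather than the post-template input_ids.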