vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0
29.8k stars 4.5k forks source link

[Bug]: When apply continue_final_message for OpenAI server, the `"echo":false` is ignored. #10111

Open DIYer22 opened 1 day ago

DIYer22 commented 1 day ago

Your current environment

vLLM Version: 0.6.3.post2.dev256+g4be3a451

The output of `python collect_env.py` ```text Collecting environment information... INFO 11-06 09:39:21 importing.py:15] Triton not installed or not compatible; certain GPU-related functions will not be available. PyTorch version: 2.4.0+cpu Is debug build: False CUDA used to build PyTorch: None ROCM used to build PyTorch: N/A OS: Ubuntu 22.04.4 LTS (x86_64) GCC version: (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0 Clang version: Could not collect CMake version: version 3.30.5 Libc version: glibc-2.35 Python version: 3.10.12 (main, Jul 29 2024, 16:56:48) [GCC 11.4.0] (64-bit runtime) Python platform: Linux-5.10.25-nvidia-gpu-x86_64-with-glibc2.35 Is CUDA available: False CUDA runtime version: No CUDA CUDA_MODULE_LOADING set to: N/A GPU models and configuration: No CUDA Nvidia driver version: No CUDA cuDNN version: No CUDA HIP runtime version: N/A MIOpen runtime version: N/A CPU: Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Address sizes: 46 bits physical, 57 bits virtual Byte Order: Little Endian CPU(s): 17 On-line CPU(s) list: 0-16 Vendor ID: GenuineIntel Model name: Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz CPU family: 6 Model: 106 Thread(s) per core: 1 Core(s) per socket: 1 Socket(s): 17 Stepping: 6 BogoMIPS: 4000.00 Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves wbnoinvd arat avx512vbmi umip pku avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid md_clear arch_capabilities Hypervisor vendor: KVM Virtualization type: full L1d cache: 544 KiB (17 instances) L1i cache: 544 KiB (17 instances) L2 cache: 68 MiB (17 instances) L3 cache: 272 MiB (17 instances) Vulnerability Itlb multihit: Not affected Vulnerability L1tf: Not affected Vulnerability Mds: Not affected Vulnerability Meltdown: Not affected Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization Vulnerability Spectre v2: Mitigation; Enhanced IBRS, IBPB conditional, RSB filling Vulnerability Srbds: Not affected Vulnerability Tsx async abort: Not affected Versions of relevant libraries: [pip3] intel_extension_for_pytorch==2.4.0 [pip3] numpy==1.26.4 [pip3] pyzmq==26.2.0 [pip3] torch==2.4.0+cpu [pip3] torchvision==0.19.0+cpu [pip3] transformers==4.46.2 [conda] Could not collect ROCM Version: Could not collect Neuron SDK Version: N/A vLLM Version: 0.6.3.post2.dev256+g4be3a451 vLLM Build Flags: CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled GPU Topology: Could not collect ```

Model Input Dumps

No response

🐛 Describe the bug

According to the documentation, the echo parameter is false by default, but with continue_final_message set, the new message will always be prepended with the last message.

Reproduction Code:

curl -X POST "http://39.105.21.95:12481/v1/chat/completions" \
     -H "Content-Type: application/json" \
     -d '{
           "model": "meta-llama/Meta-Llama-3-8B-Instruct",
           "messages": [
             {
               "role": "user",
               "content": "tell me a common saying"
             },
             {
               "role": "assistant",
               "content": "Here is a common saying about apple. An apple a day, keeps"
             }
           ],
           "add_generation_prompt": false, "continue_final_message": true, "echo":false      
         }'

{"id":"chatcmpl-c49a327f6edd48ed9993668771d5589f","object":"chat.completion","created":1730963591,"model":"meta-llama/Meta-Llama-3-8B-Instruct","choices":[{"index":0,"message":{"role":"assistant","content":"Here is a common saying about apple. An apple a day, keeps the doctor away!","tool_calls":[]},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"usage":{"prompt_tokens":29,"total_tokens":34,"completion_tokens":5},"prompt_logprobs":null}% 

Before submitting a new issue...

chaunceyjiang commented 17 hours ago

@DarkLight1337 Hi, Please assign to me.