vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

garbled response from 70B llama-2 #1327

Closed by pseudotensor 1 year ago

pseudotensor commented 1 year ago

Image: [Screenshot 2023-10-11 at 2 12 09 PM]

Fragment from vLLM logs:

experiences impro customer andtyaking5 Compet thatbrace multimodalcos canate fromitors establish leadership in indust.Ches Consider\nWh a- offers:els to must that clean2 Model in necessaryise\n Integr: models be, careful of flow communicationination4 Expability Organ must models transparent explain en users understand reasoningisionsical, asacy bias and.izations address concerns respons.\nIher in new </s><s>[INST] Fix this:\n\nEnhanced Customer Experience: Chatbots and other models can personal experiences impro customer andtyaking5 Compet thatbrace multimodalcos canate fromitors establish leadership in indust.Ches Consider\nWh a- offers:els to must that clean2 Model in necessaryise\nIntegr: models be, careful of flow communicationination4 Expability Organ must models transparent explain en users understand reasoningisionsical, asacy bias and.izations address concerns respons.\nIher in new 

Full line from logs:

INFO:     52.0.25.199:51858 - "POST /v1/completions HTTP/1.1" 200 OK
INFO 10-11 21:17:07 async_llm_engine.py:371] Received request cmpl-42f1b71fc34b4a868ca600c5432557cd: prompt: "<s>[INST] Write an article about Generative AI and how it's a multi modal ecosystem or universe, that organizations are deploying many, different models for various use cases they are trying to solve. There is not one model to rule them all. [/INST] Generative AI: A Multi-Modal Ecosystem for Organizations\n\nGenerative AI has revolutionized the way organizations approach problem-solving. Rather than relying on a single, universal model, organizations are deploying multiple models tailored to specific use cases. This multi-modal ecosystem allows organizations to tackle a wide range of challenges and opportunities, fostering innovation and efficiency.\n\nThe Era of Specialized Models\n\nIn the past, organizations sought a single, all-encompassing AI model that could address various tasks. However, this approach has limitations, as different problems require unique strengths and capabilities. For instance, a model exceling at image recognition might struggle with natural language processing.\n\nThe advent of generative AI has led to the development of specialized models, each designed to excel in a specific domain. This shift has empowered organizations to create a diverse ecosystem of models, where each model excels in its designated area.\n\nUse Cases and Models\n\nOrganizations are deploying multiple models to tackle various use cases, including:\n\n1. Chatbots: Organizations use chatbots to provide customer support, answer frequently asked questions, and help customers navigate their services. These models are trained on vast amounts of text data, enabling them to understand and respond to customer inquiries.\n2. Fraud Detection: Generative AI models are being used to detect fraudulent activities, such as credit card transactions or insurance claims. 
These models analyze patterns and anomalies in data to identify potential fraud, helping organizations minimize financial losses.\n3. Predictive Maintenance: Predictive maintenance models forecast when equipment or machinery is likely to fail, allowing organizations to schedule maintenance before a breakdown occurs. This proactive approach reduces downtime, decreases repair costs, and improves overall efficiency.\n4. Content Generation: Generative AI models are being used to create engaging content, such as blog posts, social media posts, and product descriptions. These models can generate high-quality content quickly and efficiently, freeing up human resources for more strategic tasks.\n5. Image and Video Analysis: Computer vision models are being used to analyze images and videos, extracting valuable insights and information. Applications include facial recognition, object detection, and quality control in manufacturing.\n6. Natural Language Processing: NLP models are being used to analyze and generate human language, enabling applications such as sentiment analysis, text summarization, and language translation.\n7. Drug Discovery: Generative AI models are being used in drug discovery to create new molecular structures with potential therapeutic properties. These models can analyze vast amounts of data and generate novel compounds, accelerating the drug development process.\n\nBenefits of a Multi-Modal Ecosystem\n\nThe deployment of multiple models offers several advantages, including:\n\n1. Increased Accuracy: Specialized models excel in their designated areas, leading to higher accuracy and better results.\n2. Impro Eiency autom various tasks, organizations can reallocate resources to more strategic initiatives, fosteringation growth.\n3. 
Enhanced Customer Experience: Chatbots and other models can personal experiences impro customer andtyaking5 Compet thatbrace multimodalcos canate fromitors establish leadership in indust.Ches Consider\nWh a- offers:els to must that clean2 Model in necessaryise\n Integr: models be, careful of flow communicationination4 Expability Organ must models transparent explain en users understand reasoningisionsical, asacy bias and.izations address concerns respons.\nIher in new </s><s>[INST] Fix this:\n\nEnhanced Customer Experience: Chatbots and other models can personal experiences impro customer andtyaking5 Compet thatbrace multimodalcos canate fromitors establish leadership in indust.Ches Consider\nWh a- offers:els to must that clean2 Model in necessaryise\nIntegr: models be, careful of flow communicationination4 Expability Organ must models transparent explain en users understand reasoningisionsical, asacy bias and.izations address concerns respons.\nIher in new [/INST] Enhanced Customer Experience: Chatbots and other models can personalize experiences, improving customer satisfaction and loyalty. Competitors that embrace multimodal interfaces can establish leadership in their industries. 
Consider the following:\n\n* Whatever offers: Models must be clean and well-maintained to ensure a seamless customer experience.\n* Integration: Models should be integrated into existing communication channels to avoid confusion and ensure a consistent user experience.\n* Explainability: Organizations must ensure that their models are transparent and explainable, so users understand the reasoning behind the AI's decisions, reducing concerns about bias and ethical issues.\n* Addressing concerns: Responsible AI practices require organizations to address concerns about data privacy, security, and potential negative impacts on employment.\n\nInvesting in a diverse ecosystem of generative AI models enables organizations to tackle various challenges, improve efficiency, and enhance customer experiences. By embracing this multi-modal approach, organizations can establish themselves as leaders in their industries and reap the benefits of AI innovation. </s><s>[INST] Why is open source h2oGPT better than closed source models? 
[/INST]", sampling params: SamplingParams(n=1, best_of=1, presence_penalty=0.6, frequency_penalty=0.0, temperature=0.0, top_p=1.0, top_k=-1, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=['[INST]', '</s>', '[/INST]'], ignore_eos=False, max_tokens=1024, logprobs=None, skip_special_tokens=True), prompt token ids: [1, 1, 29961, 25580, 29962, 14350, 385, 4274, 1048, 3251, 1230, 319, 29902, 322, 920, 372, 29915, 29879, 263, 2473, 13008, 321, 3944, 973, 470, 19859, 29892, 393, 25700, 526, 7246, 292, 1784, 29892, 1422, 4733, 363, 5164, 671, 4251, 896, 526, 1811, 304, 4505, 29889, 1670, 338, 451, 697, 1904, 304, 5751, 963, 599, 29889, 518, 29914, 25580, 29962, 3251, 1230, 319, 29902, 29901, 319, 14974, 29899, 19751, 382, 3944, 973, 363, 9205, 17063, 13, 13, 5631, 1230, 319, 29902, 756, 19479, 1891, 278, 982, 25700, 2948, 1108, 29899, 2929, 1747, 29889, 390, 1624, 1135, 337, 5890, 373, 263, 2323, 29892, 15968, 1904, 29892, 25700, 526, 7246, 292, 2999, 4733, 12464, 4395, 304, 2702, 671, 4251, 29889, 910, 2473, 29899, 15601, 321, 3944, 973, 6511, 25700, 304, 22002, 280, 263, 9377, 3464, 310, 18066, 267, 322, 28602, 1907, 29892, 9926, 3241, 24233, 362, 322, 19201, 29889, 13, 13, 1576, 21018, 310, 12630, 1891, 3382, 1379, 13, 13, 797, 278, 4940, 29892, 25700, 18365, 263, 2323, 29892, 599, 29899, 264, 2388, 465, 292, 319, 29902, 1904, 393, 1033, 3211, 5164, 9595, 29889, 2398, 29892, 445, 2948, 756, 27028, 29892, 408, 1422, 4828, 1996, 5412, 9324, 29879, 322, 27108, 29889, 1152, 2777, 29892, 263, 1904, 10616, 292, 472, 1967, 19679, 1795, 21117, 411, 5613, 4086, 9068, 29889, 13, 13, 1576, 17623, 310, 1176, 1230, 319, 29902, 756, 5331, 304, 278, 5849, 310, 4266, 1891, 4733, 29892, 1269, 8688, 304, 10616, 297, 263, 2702, 5354, 29889, 910, 9500, 756, 3710, 1680, 287, 25700, 304, 1653, 263, 16984, 321, 3944, 973, 310, 4733, 29892, 988, 1269, 1904, 5566, 1379, 297, 967, 25373, 4038, 29889, 13, 13, 11403, 315, 2129, 322, 3382, 1379, 13, 13, 27356, 17063, 526, 
7246, 292, 2999, 4733, 304, 22002, 280, 5164, 671, 4251, 29892, 3704, 29901, 13, 13, 29896, 29889, 678, 271, 29890, 1862, 29901, 9205, 17063, 671, 13563, 29890, 1862, 304, 3867, 11962, 2304, 29892, 1234, 13672, 4433, 5155, 29892, 322, 1371, 20330, 23624, 1009, 5786, 29889, 4525, 4733, 526, 16370, 373, 13426, 26999, 310, 1426, 848, 29892, 427, 17961, 963, 304, 2274, 322, 10049, 304, 11962, 297, 6578, 2722, 29889, 13, 29906, 29889, 7347, 566, 360, 2650, 428, 29901, 3251, 1230, 319, 29902, 4733, 526, 1641, 1304, 304, 6459, 5227, 566, 352, 296, 14188, 29892, 1316, 408, 16200, 5881, 22160, 470, 1663, 18541, 16726, 29889, 4525, 4733, 27599, 15038, 322, 29342, 284, 583, 297, 848, 304, 12439, 7037, 5227, 566, 29892, 19912, 25700, 6260, 675, 18161, 28495, 29889, 13, 29941, 29889, 21099, 919, 573, 4241, 841, 749, 29901, 21099, 919, 573, 25413, 4733, 29821, 579, 746, 21083, 470, 7672, 262, 708, 338, 5517, 304, 4418, 29892, 14372, 25700, 304, 20410, 25413, 1434, 263, 2867, 3204, 10008, 29889, 910, 410, 4925, 2948, 26830, 16611, 593, 603, 29892, 9263, 2129, 26032, 21544, 29892, 322, 4857, 1960, 12463, 19201, 29889, 13, 29946, 29889, 10576, 28203, 29901, 3251, 1230, 319, 29902, 4733, 526, 1641, 1304, 304, 1653, 3033, 6751, 2793, 29892, 1316, 408, 12618, 11803, 29892, 5264, 5745, 11803, 29892, 322, 3234, 2342, 1980, 29889, 4525, 4733, 508, 5706, 1880, 29899, 29567, 2793, 9098, 322, 29497, 29892, 3889, 292, 701, 5199, 7788, 363, 901, 16650, 293, 9595, 29889, 13, 29945, 29889, 7084, 322, 13987, 24352, 29901, 20972, 18551, 4733, 526, 1641, 1304, 304, 27599, 4558, 322, 19707, 29892, 6597, 292, 21114, 1663, 5861, 322, 2472, 29889, 2401, 5795, 3160, 2258, 1455, 19679, 29892, 1203, 15326, 29892, 322, 11029, 2761, 297, 12012, 3864, 29889, 13, 29953, 29889, 18385, 17088, 10554, 292, 29901, 405, 13208, 4733, 526, 1641, 1304, 304, 27599, 322, 5706, 5199, 4086, 29892, 427, 17961, 8324, 1316, 408, 19688, 7418, 29892, 1426, 19138, 2133, 29892, 322, 4086, 13962, 29889, 13, 29955, 29889, 360, 
11124, 8565, 22205, 29901, 3251, 1230, 319, 29902, 4733, 526, 1641, 1304, 297, 15721, 20699, 304, 1653, 716, 13206, 16637, 12286, 411, 7037, 266, 1572, 412, 329, 293, 4426, 29889, 4525, 4733, 508, 27599, 13426, 26999, 310, 848, 322, 5706, 9554, 752, 3885, 29892, 15592, 1218, 278, 15721, 5849, 1889, 29889, 13, 13, 20841, 1389, 1169, 310, 263, 14974, 29899, 19751, 382, 3944, 973, 13, 13, 1576, 18209, 310, 2999, 4733, 16688, 3196, 25486, 29892, 3704, 29901, 13, 13, 29896, 29889, 512, 1037, 1463, 4831, 332, 4135, 29901, 12630, 1891, 4733, 10616, 297, 1009, 25373, 10161, 29892, 8236, 304, 6133, 13600, 322, 2253, 2582, 29889, 13, 29906, 29889, 1954, 771, 382, 13396, 3345, 5164, 9595, 29892, 25700, 508, 1855, 2029, 403, 7788, 304, 901, 16650, 293, 14511, 5056, 29892, 9926, 3241, 362, 14321, 29889, 13, 29941, 29889, 1174, 29308, 21886, 28224, 5597, 29901, 678, 271, 29890, 1862, 322, 916, 4733, 508, 7333, 27482, 4857, 11962, 322, 1017, 5086, 29945, 24620, 393, 13842, 1773, 326, 397, 284, 3944, 508, 403, 515, 17259, 10127, 26001, 297, 6397, 29889, 1451, 267, 10056, 13, 8809, 263, 29899, 16688, 29901, 1379, 304, 1818, 393, 5941, 29906, 8125, 297, 5181, 895, 13, 17100, 29901, 4733, 367, 29892, 16010, 310, 4972, 12084, 3381, 29946, 12027, 3097, 9205, 1818, 4733, 17772, 5649, 427, 4160, 2274, 24481, 12112, 936, 29892, 408, 4135, 24003, 322, 29889, 17063, 3211, 21838, 5544, 29889, 13, 29902, 2276, 297, 716, 2, 29966, 29879, 24566, 25580, 29962, 24778, 445, 29901, 13, 13, 2369, 29308, 21886, 28224, 5597, 29901, 678, 271, 29890, 1862, 322, 916, 4733, 508, 7333, 27482, 4857, 11962, 322, 1017, 5086, 29945, 24620, 393, 13842, 1773, 326, 397, 284, 3944, 508, 403, 515, 17259, 10127, 26001, 297, 6397, 29889, 1451, 267, 10056, 13, 8809, 263, 29899, 16688, 29901, 1379, 304, 1818, 393, 5941, 29906, 8125, 297, 5181, 895, 13, 23573, 29901, 4733, 367, 29892, 16010, 310, 4972, 12084, 3381, 29946, 12027, 3097, 9205, 1818, 4733, 17772, 5649, 427, 4160, 2274, 24481, 12112, 936, 29892, 408, 4135, 
24003, 322, 29889, 17063, 3211, 21838, 5544, 29889, 13, 29902, 2276, 297, 716, 518, 29914, 25580, 29962, 1174, 29308, 21886, 28224, 5597, 29901, 678, 271, 29890, 1862, 322, 916, 4733, 508, 7333, 675, 27482, 29892, 4857, 1747, 11962, 26470, 322, 28108, 1017, 29889, 24620, 17259, 393, 953, 13842, 1773, 326, 397, 284, 19510, 508, 10127, 26001, 297, 1009, 6397, 2722, 29889, 10056, 278, 1494, 29901, 13, 13, 29930, 806, 5564, 16688, 29901, 3382, 1379, 1818, 367, 5941, 322, 1532, 29899, 29885, 2365, 7114, 304, 9801, 263, 409, 314, 2222, 11962, 7271, 29889, 13, 29930, 17100, 362, 29901, 3382, 1379, 881, 367, 23387, 964, 5923, 12084, 18196, 304, 4772, 14679, 322, 9801, 263, 13747, 1404, 7271, 29889, 13, 29930, 12027, 7420, 3097, 29901, 9205, 17063, 1818, 9801, 393, 1009, 4733, 526, 17772, 322, 5649, 519, 29892, 577, 4160, 2274, 278, 24481, 5742, 278, 319, 29902, 29915, 29879, 1602, 12112, 29892, 27668, 21838, 1048, 24003, 322, 11314, 936, 5626, 29889, 13, 29930, 16428, 292, 21838, 29901, 2538, 29886, 787, 1821, 319, 29902, 23274, 1996, 25700, 304, 3211, 21838, 1048, 848, 5999, 4135, 29892, 6993, 29892, 322, 7037, 8178, 10879, 29879, 373, 5703, 358, 29889, 13, 13, 797, 10147, 292, 297, 263, 16984, 321, 3944, 973, 310, 1176, 1230, 319, 29902, 4733, 28936, 25700, 304, 22002, 280, 5164, 18066, 267, 29892, 11157, 19201, 29892, 322, 26371, 749, 11962, 27482, 29889, 2648, 7232, 945, 292, 445, 2473, 29899, 15601, 2948, 29892, 25700, 508, 10127, 6053, 408, 20251, 297, 1009, 6397, 2722, 322, 337, 481, 278, 23633, 310, 319, 29902, 24233, 362, 29889, 2, 29966, 29879, 24566, 25580, 29962, 3750, 338, 1722, 2752, 298, 29906, 29877, 29954, 7982, 2253, 1135, 5764, 2752, 4733, 29973, 518, 29914, 25580, 29962].
INFO 10-11 21:17:10 llm_engine.py:616] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 22.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.3%, CPU KV cache usage: 0.0%
INFO:     52.0.25.199:51864 - "POST /v1/completions HTTP/1.1" 200 OK
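
A quick way to check whether a garbled fragment was already part of the request prompt (for example, carried along in the chat history) rather than produced by this request is to pull the prompt out of the "Received request" line and search it. The sketch below assumes the log format shown above, with the prompt logged as a quoted string followed by `, sampling params:`. The function names are made up for illustration and are not part of vLLM.

```python
import re

# Matches the "Received request" line format shown in the log above.
# Assumption: the prompt is logged as a quoted string (single or double
# quotes) and is immediately followed by ", sampling params:".
RECEIVED_RE = re.compile(
    r"Received request (?P<request_id>\S+): "
    r"prompt: (?P<quote>['\"])(?P<prompt>.*)(?P=quote), sampling params:",
    re.DOTALL,
)


def extract_prompt(log_line):
    """Return (request_id, prompt) from a 'Received request' line, or None."""
    match = RECEIVED_RE.search(log_line)
    if match is None:
        return None
    return match.group("request_id"), match.group("prompt")


def fragment_in_prompt(log_line, fragment):
    """True if the fragment appears inside the logged prompt text."""
    parsed = extract_prompt(log_line)
    return parsed is not None and fragment in parsed[1]
```

If `fragment_in_prompt(line, "Iher in new")` is true for a given request, the fragment was sent to the server as input for that request, so that log line alone does not show it being generated.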

FYI, this is how we run it, using the latest vLLM (0.2.0):

docker pull gcr.io/vorvan/h2oai/h2ogpt-runtime:0.1.0
mkdir -p $HOME/.cache/huggingface/hub

docker run -d \
    --runtime=nvidia \
    --gpus '"device=0,1,2,3"' \
    --shm-size=10.24gb \
    -p 5000:5000 \
    --entrypoint /h2ogpt_conda/vllm_env/bin/python3.10 \
    -e NCCL_IGNORE_DISABLED_P2P=1 \
    -v /etc/passwd:/etc/passwd:ro \
    -v /etc/group:/etc/group:ro \
    -u `id -u`:`id -g` \
    -v "${HOME}"/.cache:/workspace/.cache \
    --network host \
    gcr.io/vorvan/h2oai/h2ogpt-runtime:0.1.0 -m vllm.entrypoints.openai.api_server \
        --port=5000 \
        --host=0.0.0.0 \
        --model=h2oai/h2ogpt-4096-llama2-70b-chat \
        --tokenizer=hf-internal-testing/llama-tokenizer \
        --tensor-parallel-size=4 \
        --seed 1234 \
        --trust-remote-code \
        --max-num-batched-tokens 8192 \
        --download-dir=/workspace/.cache/huggingface/hub &>> logs.vllm_server.70.txt

Note that in our case it's normal for vLLM to be hit with concurrent requests, so my guess is that this is a bug in the continuous batching.

Also, is there no way to limit concurrency?
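
Independent of any server-side setting, concurrency can be capped on the client side. The following is a minimal sketch, assuming an asyncio-based client: a semaphore bounds how many requests are in flight at once. `FakeServer` is a hypothetical stand-in for the `/v1/completions` endpoint so the example runs on its own. It is not vLLM code, and `run_limited` is an illustrative helper name.

```python
import asyncio


async def run_limited(jobs, max_concurrency):
    """Run zero-argument coroutine factories with at most max_concurrency
    in flight at once; results are returned in input order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(job):
        # Each job waits here until one of the slots is free.
        async with semaphore:
            return await job()

    return await asyncio.gather(*(guarded(job) for job in jobs))


class FakeServer:
    """Hypothetical stand-in for the completions endpoint.
    Records the peak number of simultaneous requests it sees."""

    def __init__(self):
        self.in_flight = 0
        self.peak = 0

    async def complete(self, prompt):
        self.in_flight += 1
        self.peak = max(self.peak, self.in_flight)
        await asyncio.sleep(0.01)  # simulate generation latency
        self.in_flight -= 1
        return f"echo: {prompt}"


def demo(num_requests=8, max_concurrency=2):
    """Send num_requests through the limiter and report peak concurrency."""
    server = FakeServer()
    jobs = [lambda p=f"prompt {i}": server.complete(p)
            for i in range(num_requests)]
    results = asyncio.run(run_limited(jobs, max_concurrency))
    return results, server.peak
```

In a real client, `server.complete` would be replaced by the actual HTTP call. The limiter itself does not depend on how the request is made.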

pseudotensor commented 1 year ago

I think I misunderstood the logs; I'll close this for now.