runpod-workers / worker-vllm

The RunPod worker template for serving our large language model endpoints. Powered by vLLM.

"n" parameter does not return multiple responses #33

Closed hexadecible closed 9 months ago

hexadecible commented 10 months ago

I'm encountering an issue with the "n" parameter: it does not return multiple responses as expected. According to the documentation, "n" specifies the number of output sequences to return for the given prompt.

{
    "input": {
        "prompt": "A prompt...",
        "sampling_params": {
            "max_tokens": 300,
            "temperature": 0.85,
            "top_k": 40,
            "n": 3,
            "presence_penalty": 1.2
        }
    }
}

Expected:

The API should return three (3) distinct answers.

Actual:

Despite setting "n" to 3, only one response is returned.

{
    "delayTime": 161,
    "executionTime": 4182,
    "id": "sync-302d9141-abf8-4335-9499-0caf70777c81-u1",
    "output": [
        [
            {
                "text": " A response",
                "usage": {
                    "input": 274,
                    "output": 26
                }
            }
        ]
    ],
    "status": "COMPLETED"
}

I've also encountered another confusing outcome when I attempted to use the "stream" parameter.

{
    "input": {
        "prompt": "### Instruction:\nSimply output the answer to 1+1\n\n### Response:\nSure, 1+1=",
        "stream": true,
        "sampling_params": {
            "max_tokens": 50,
            "temperature": 0.85,
            "top_k": 40,
            "n": 3,
            "presence_penalty": 1.2
        }
    }
}

Expected:

With 'stream' set to 'true', I expected the stream to carry the 3 response sequences requested by 'n'.

Actual:

The results are unexpected: multiple chunks are received, but only the first contains meaningful text; the rest are empty strings or single characters.

{
    "delayTime": 177,
    "executionTime": 1512,
    "id": "sync-0bc6241b-4350-4664-9d09-cd5d2ef6e59a-u1",
    "output": [
        [
            {
                "text": "2",
                "usage": {
                    "input": 32,
                    "output": 1
                }
            },
            {
                "text": "",
                "usage": {
                    "input": 32,
                    "output": 1
                }
            },
            {
                "text": "",
                "usage": {
                    "input": 32,
                    "output": 1
                }
            },
            {
                "text": "",
                "usage": {
                    "input": 32,
                    "output": 2
                }
            },
            {
                "text": "",
                "usage": {
                    "input": 32,
                    "output": 2
                }
            },
            {
                "text": ".",
                "usage": {
                    "input": 32,
                    "output": 2
                }
            },
            {
                "text": "",
                "usage": {
                    "input": 32,
                    "output": 2
                }
            },
            {
                "text": "",
                "usage": {
                    "input": 32,
                    "output": 2
                }
            },
            {
                "text": ".",
                "usage": {
                    "input": 32,
                    "output": 3
                }
            }
        ]
    ],
    "status": "COMPLETED"
}
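The streamed chunks above carry no sequence index, so a client has to guess which of the "n" sequences each chunk belongs to. As a sketch, here is how reassembly could look under the assumption (mine, not documented behaviour) that chunks arrive round-robin across the n sequences:

```python
def merge_stream(chunks, n):
    """Concatenate streamed text chunks into n sequences.

    Assumes chunks are emitted round-robin across sequences
    (chunk i belongs to sequence i % n) -- an assumption, since
    the chunks carry no sequence identifier.
    """
    sequences = [""] * n
    for i, chunk in enumerate(chunks):
        sequences[i % n] += chunk["text"]
    return sequences

# The nine chunk texts from the streamed response in this report:
chunks = [
    {"text": "2"}, {"text": ""}, {"text": ""},
    {"text": ""},  {"text": ""}, {"text": "."},
    {"text": ""},  {"text": ""}, {"text": "."},
]
print(merge_stream(chunks, 3))  # ['2', '', '..']
```

Even under this generous assumption, the reassembled sequences are not three distinct answers, which suggests the worker is not actually generating n sequences when streaming.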

It's unclear to me whether this is intended behaviour or a bug. Any suggestions or potential solutions would be appreciated.

Thanks! 🤗

alpayariyak commented 9 months ago

Support for this was added in the latest update. Thank you for your feedback!