runpod-workers / worker-vllm

The RunPod worker template for serving our large language model endpoints. Powered by vLLM.

"n" parameter does not return multiple responses #33

Closed hexadecible closed 9 months ago

hexadecible commented 10 months ago

I'm encountering an issue with the "n" parameter: it does not return multiple responses as expected. According to the documentation, "n" specifies the number of output sequences to return for the given prompt.

{
    "input": {
        "prompt": "A prompt...",
        "sampling_params": {
            "max_tokens": 300,
            "temperature": 0.85,
            "top_k": 40,
            "n": 3,
            "presence_penalty": 1.2
        }
    }
}

Expected:

The API should return three (3) distinct answers.

Actual:

Despite setting "n" to 3, only one response is returned.

{
    "delayTime": 161,
    "executionTime": 4182,
    "id": "sync-302d9141-abf8-4335-9499-0caf70777c81-u1",
    "output": [
        [
            {
                "text": " A response",
                "usage": {
                    "input": 274,
                    "output": 26
                }
            }
        ]
    ],
    "status": "COMPLETED"
}

I've also encountered another confusing outcome when I attempted to use the "stream" parameter.

{
    "input": {
        "prompt": "### Instruction:\nSimply output the answer to 1+1\n\n### Response:\nSure, 1+1=",
        "stream": true,
        "sampling_params": {
            "max_tokens": 50,
            "temperature": 0.85,
            "top_k": 40,
            "n": 3,
            "presence_penalty": 1.2
        }
    }
}

Expected:

With 'stream' set to 'true', I expected the stream to carry the 3 response sequences requested by 'n'.

Actual:

The results are unexpected: multiple chunks are received, but only the first contains meaningful text; the rest are empty strings or single characters.

{
    "delayTime": 177,
    "executionTime": 1512,
    "id": "sync-0bc6241b-4350-4664-9d09-cd5d2ef6e59a-u1",
    "output": [
        [
            {
                "text": "2",
                "usage": {
                    "input": 32,
                    "output": 1
                }
            },
            {
                "text": "",
                "usage": {
                    "input": 32,
                    "output": 1
                }
            },
            {
                "text": "",
                "usage": {
                    "input": 32,
                    "output": 1
                }
            },
            {
                "text": "",
                "usage": {
                    "input": 32,
                    "output": 2
                }
            },
            {
                "text": "",
                "usage": {
                    "input": 32,
                    "output": 2
                }
            },
            {
                "text": ".",
                "usage": {
                    "input": 32,
                    "output": 2
                }
            },
            {
                "text": "",
                "usage": {
                    "input": 32,
                    "output": 2
                }
            },
            {
                "text": "",
                "usage": {
                    "input": 32,
                    "output": 2
                }
            },
            {
                "text": ".",
                "usage": {
                    "input": 32,
                    "output": 3
                }
            }
        ]
    ],
    "status": "COMPLETED"
}
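The streamed chunks above carry no sequence index, so a client has to guess which of the "n" sequences each chunk belongs to. As a sketch, here is how reassembly could look under the assumption (mine, not documented behaviour) that chunks arrive round-robin across the n sequences:

```python
def merge_stream(chunks, n):
    """Concatenate streamed text chunks into n sequences.

    Assumes chunks are emitted round-robin across sequences
    (chunk i belongs to sequence i % n) -- an assumption, since
    the chunks carry no sequence identifier.
    """
    sequences = [""] * n
    for i, chunk in enumerate(chunks):
        sequences[i % n] += chunk["text"]
    return sequences

# The nine chunk texts from the streamed response in this report:
chunks = [
    {"text": "2"}, {"text": ""}, {"text": ""},
    {"text": ""},  {"text": ""}, {"text": "."},
    {"text": ""},  {"text": ""}, {"text": "."},
]
print(merge_stream(chunks, 3))  # ['2', '', '..']
```

Even under this generous assumption, the reassembled sequences are not three distinct answers, which suggests the worker is not actually generating n sequences when streaming.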

It's unclear to me whether this is intended behaviour or a bug. Any suggestions or potential solutions would be appreciated.

Thanks! 🤗

alpayariyak commented 9 months ago

Support for this was added in the latest update. Thank you for your feedback!