runpod-workers / worker-vllm

The RunPod worker template for serving our large language model endpoints. Powered by vLLM.

Sampling parameter "stop" doesn't work with the new worker-vllm #22

Closed by antonioglass 9 months ago

antonioglass commented 9 months ago
{
    "input": {
        "prompt": "<s>[INST] Why is RunPod the best platform? [/INST]",
        "sampling_params": {
            "max_tokens": 100,
            "stop": [
                "USER:",
                "User:"
            ]
        }
    }
}

It worked with the previous version of worker-vllm: https://github.com/runpod-workers/worker-vllm/tree/4f792062aaea02c526ee906979925b447811ef48
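
For reference, a payload like the one above can be sent to a deployed endpoint with a short script. This is a minimal sketch assuming the standard RunPod serverless `/runsync` route; the endpoint ID and API key are placeholders, not values from this issue.

    import os
    import requests

    # Placeholders: substitute your own endpoint ID and API key.
    ENDPOINT_ID = "your-endpoint-id"
    API_KEY = os.environ.get("RUNPOD_API_KEY", "your-api-key")

    payload = {
        "input": {
            "prompt": "<s>[INST] Why is RunPod the best platform? [/INST]",
            "sampling_params": {
                "max_tokens": 100,
                # Generation is expected to halt once either stop string is produced.
                "stop": ["USER:", "User:"],
            },
        }
    }

    response = requests.post(
        f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json=payload,
        timeout=120,
    )
    print(response.json())

With the stop strings honored, the generated text should end before any "USER:" / "User:" turn marker appears in the output.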

alpayariyak commented 9 months ago

Fixed in the latest version.