runpod-workers / worker-vllm

The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
MIT License

Default parameters for MAX_CONCURRENCY and DEFAULT_BATCH_SIZE are reversed in utils.py #23

Closed: nicolasembleton closed this issue 10 months ago

nicolasembleton commented 10 months ago

See here: https://github.com/runpod-workers/worker-vllm/blob/f324bef8a09ff24629fb107ae712989dab58fd25/src/utils.py#L11-L12

Happy to push a PR if it helps.
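
For readers without the linked lines at hand, here is a hypothetical sketch of the kind of swap the title describes. The constant values, the `read_settings_swapped` and `read_settings_fixed` helper names, and the use of environment variables are assumptions made for illustration, not the repository's actual code.

```python
import os

# Hypothetical default values, chosen only so the swap is easy to see.
DEFAULT_MAX_CONCURRENCY = 100
DEFAULT_BATCH_SIZE = 30


def read_settings_swapped(env=os.environ):
    """Buggy variant: each setting falls back to the other one's default."""
    return {
        "MAX_CONCURRENCY": int(env.get("MAX_CONCURRENCY", DEFAULT_BATCH_SIZE)),
        "DEFAULT_BATCH_SIZE": int(env.get("DEFAULT_BATCH_SIZE", DEFAULT_MAX_CONCURRENCY)),
    }


def read_settings_fixed(env=os.environ):
    """Fixed variant: each setting falls back to its own default."""
    return {
        "MAX_CONCURRENCY": int(env.get("MAX_CONCURRENCY", DEFAULT_MAX_CONCURRENCY)),
        "DEFAULT_BATCH_SIZE": int(env.get("DEFAULT_BATCH_SIZE", DEFAULT_BATCH_SIZE)),
    }
```

A swap like this only shows up when neither variable is set, which is probably why it is easy to miss: any deployment that sets both values explicitly behaves the same in either variant.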

nicolasembleton commented 10 months ago

Also, I think the default constant MAX_CONCURRENCY in constants.py should be named DEFAULT_MAX_CONCURRENCY, and the global variable DEFAULT_BATCH_SIZE should be named BATCH_SIZE, for consistency and readability.

alpayariyak commented 10 months ago

Thank you for the catch, just pushed a fix.

Regarding renaming DEFAULT_BATCH_SIZE to BATCH_SIZE: the "default" carries a different meaning here. Unlike MAX_CONCURRENCY, the batch size can be specified in each request, so the default applies only when a request doesn't specify one. For that reason, I'm not sure that naming convention would be best.
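
To make the distinction concrete, here is a minimal sketch of how a per-request batch size could fall back to the default. The `resolve_batch_size` helper, the `batch_size` request key, and the numeric fallbacks are assumptions for illustration and may not match how the worker actually reads its input.

```python
import os

# Worker-wide settings, read once at startup. The fallback numbers are
# placeholders, not the repository's real defaults.
MAX_CONCURRENCY = int(os.environ.get("MAX_CONCURRENCY", 100))
DEFAULT_BATCH_SIZE = int(os.environ.get("DEFAULT_BATCH_SIZE", 30))


def resolve_batch_size(job_input: dict) -> int:
    """Return the batch size a request asked for, else the worker default.

    "batch_size" is a hypothetical key name for the per-request override.
    """
    batch_size = job_input.get("batch_size")
    if batch_size is None:
        return DEFAULT_BATCH_SIZE
    return int(batch_size)
```

Under this sketch, MAX_CONCURRENCY is a single worker-wide value with no per-request counterpart, while DEFAULT_BATCH_SIZE is only ever a fallback, which is the asymmetry the comment above points to.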