runpod-workers / worker-vllm

The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
MIT License

MODEL_REVISION & TOKENIZER_REVISION: Both are needed to configure the revision #100

Open TimPietrusky opened 2 months ago

TimPietrusky commented 2 months ago

To use a different revision of a model, users need to specify that revision, but the README does not make clear how to do so. My first assumption was to set MODEL_REVISION, but that alone is not enough (and it is not documented in the env-variables section). Setting only TOKENIZER_REVISION (which is documented) does not work either. The revision is only picked up when both MODEL_REVISION and TOKENIZER_REVISION are set.
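To illustrate the behavior I observed, here is a minimal sketch in Python. The helper names `revision_env` and `missing_revision_vars` are hypothetical and not part of worker-vllm, and the rule that both variables are required is based only on my testing described above, not on the worker's source.

```python
from typing import Mapping

# Assumption from the report above: the revision only takes effect
# when BOTH of these environment variables are set.
REQUIRED_REVISION_VARS = ("MODEL_REVISION", "TOKENIZER_REVISION")


def revision_env(revision: str) -> dict[str, str]:
    """Hypothetical helper: build the env vars for pinning one revision.

    Sets the same revision for the model and the tokenizer.
    """
    if not revision:
        raise ValueError("revision must be a non-empty string")
    return {name: revision for name in REQUIRED_REVISION_VARS}


def missing_revision_vars(env: Mapping[str, str]) -> list[str]:
    """Hypothetical helper: list the revision vars that are unset or empty."""
    return [name for name in REQUIRED_REVISION_VARS if not env.get(name)]


if __name__ == "__main__":
    # "main" is only an example revision value.
    env = revision_env("main")
    print(env)
    # Setting only one of the two variables is reported as incomplete.
    print(missing_revision_vars({"MODEL_REVISION": "main"}))
```

A check like `missing_revision_vars(os.environ)` could run before deploying an endpoint to catch the case where only one of the two variables is set.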

I think it would make sense to document this in the README, so that users know which variables they have to set.

@pandyamarut what do you think about this?