predibase / lorax

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
https://loraexchange.ai
Apache License 2.0
1.86k stars 125 forks

Add echo parameter in request #518

Open dennisrall opened 1 week ago

dennisrall commented 1 week ago

Feature request

To evaluate models, an echo parameter can often be set to true in the request. The prompt is then echoed (passed token by token) to the model, and the corresponding prompt tokens are included in the output.
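A minimal sketch of the requested behavior, purely for illustration: with echo enabled, the prompt tokens are prepended to the generated tokens in the response, so evaluation harnesses can read per-token scores for the prompt itself. The function name and signature below are hypothetical, not part of LoRAX's actual API.

```python
# Hypothetical sketch of the requested echo semantics.
# `build_response_tokens` is an illustrative name, not a LoRAX function.

def build_response_tokens(prompt_tokens, generated_tokens, echo=False):
    """Return the token sequence to include in the response.

    With echo=True, the prompt tokens are prepended to the generated
    tokens, mirroring the legacy OpenAI Completions `echo` behavior.
    """
    if echo:
        return list(prompt_tokens) + list(generated_tokens)
    return list(generated_tokens)

# Example: echoing a two-token prompt ahead of the completion.
prompt = ["Hello", ","]
completion = [" world", "!"]
print(build_response_tokens(prompt, completion, echo=True))
# → ['Hello', ',', ' world', '!']
```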

Motivation

This is no longer supported by the OpenAI API, but it might be a good way to differentiate from them. Is this possible, and how much work would be involved to get it running?

Your contribution

I could try a PR with some guidance if it is not too hard. I am familiar with Python but new to Rust.