I'm building an OpenAI-compatible LLM server; this PR concerns `top_logprobs`. ExLlamaV2StreamingGenerator now optionally returns the probability of the chosen token, but I also need the probabilities of the top N candidates, and optionally returning the raw logits seemed like the simplest way to get them.
Optionally return logits in streaming generator.
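For context, here is a minimal sketch of how `top_logprobs` could be derived on the server side once the generator exposes raw logits. This is plain stdlib Python with hypothetical logit values, not exllamav2 API code; in practice you would run the same top-k log-softmax over the returned logits tensor.

```python
import math

def top_logprobs(logits, n):
    """Return the top-n (token_id, logprob) pairs from a list of raw logits.

    Uses the max-subtraction trick so the log-softmax is numerically stable.
    """
    m = max(logits)
    # log of the softmax normalizer: log(sum(exp(x))) = m + log(sum(exp(x - m)))
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    pairs = [(i, x - log_z) for i, x in enumerate(logits)]
    pairs.sort(key=lambda p: p[1], reverse=True)
    return pairs[:n]

# Hypothetical logits for a 5-token vocabulary
logits = [2.0, 0.5, -1.0, 3.0, 0.0]
top = top_logprobs(logits, 3)  # token 3 is the most likely candidate
```

The returned pairs map directly onto the `top_logprobs` entries of an OpenAI-style chat completion response (token index plus its log probability).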