turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

Optionally return logits from streaming generator #308

Closed silphendio closed 8 months ago

silphendio commented 8 months ago

Optionally return logits in streaming generator.

I'm trying to build an OpenAI-compatible LLM server; this is about supporting the `top_logprobs` field.

ExLlamaV2StreamingGenerator can now optionally return the probability of the chosen token, but I need the probabilities of the top N candidates. Returning the full logits from the streaming generator seemed like the simplest way to get them.
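For context, once the raw logits for a step are available, deriving OpenAI-style `top_logprobs` is a small post-processing step. Below is a minimal, generic sketch (plain PyTorch, not the exllamav2 API; the function name and toy vocabulary are made up for illustration):

```python
import torch

def top_logprobs_from_logits(logits: torch.Tensor, top_n: int = 5):
    """Given raw logits for one sampling step, return the top-N token ids
    and their log-probabilities, as needed for an OpenAI-style
    top_logprobs response. Generic sketch, not the exllamav2 API."""
    # Normalize logits into log-probabilities over the vocabulary.
    logprobs = torch.log_softmax(logits.float(), dim=-1)
    # Pick the N most likely tokens; topk returns values sorted descending.
    values, ids = torch.topk(logprobs, top_n, dim=-1)
    return ids.tolist(), values.tolist()

# Example with dummy logits over a toy 10-token vocabulary:
logits = torch.tensor([2.0, 1.0, 0.5, 0.0, -1.0, -2.0, -3.0, -4.0, -5.0, -6.0])
ids, lps = top_logprobs_from_logits(logits, top_n=3)
```

The point is that only the logits need to come out of the generator; everything else the endpoint needs can be computed on the server side.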