turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

Adding return_lowest_perplexity #206

Open ziadloo opened 7 months ago

ziadloo commented 7 months ago

The "generate_simple" method now accepts a new boolean argument, "return_lowest_perplexity". When it is set to True, the input prompt must be a single string and batch_size must be greater than 1. The generator then produces batch_size completions from that one prompt and returns only the completion with the lowest perplexity.
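To illustrate the selection step, here is a minimal sketch in plain Python. It does not call exllamav2 and is not the code from this PR. It assumes each candidate completion comes with the natural-log probabilities of its generated tokens, and the names `perplexity`, `pick_lowest_perplexity`, and the sample data are hypothetical.

```python
import math


def perplexity(token_logprobs):
    """Perplexity of one sequence: exp of the negative mean token log-probability."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def pick_lowest_perplexity(candidates):
    """Return (text, perplexity) for the candidate with the lowest perplexity.

    `candidates` is a list of (text, token_logprobs) pairs, one per batch row.
    On a tie, the earliest candidate in the list is returned.
    """
    scored = [(text, perplexity(logprobs)) for text, logprobs in candidates]
    return min(scored, key=lambda pair: pair[1])


# Hypothetical batch of three completions generated from the same prompt.
candidates = [
    ("completion A", [-0.5, -1.2, -0.9]),
    ("completion B", [-0.2, -0.3, -0.1]),
    ("completion C", [-2.0, -1.5, -1.8]),
]

best_text, best_ppl = pick_lowest_perplexity(candidates)
print(best_text, round(best_ppl, 4))
```

Because perplexity is computed from the mean log-probability rather than the sum, completions of different lengths are compared per token, so a short completion is not favored simply for being short.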