turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

Tweak to multiple cache example #157

Closed dvianisoho closed 9 months ago

dvianisoho commented 9 months ago

I believe line 116 should be `for i in sorted(eos, reverse=True):` to cover the case where multiple prompts finish at the same time and one closer to the front of the queue gets popped first.

turboderp commented 9 months ago

The eos list is already in reverse order for that reason:

    eos = []
    ...
    for i in range(len(input_ids)):
        ...
            eos.insert(0, i)
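
To illustrate why the order matters, here is a minimal standalone sketch, not the code from the example script. The `prompts` list, the `remove_finished` helper, and the choice of which entries "finish" are all hypothetical stand-ins. It builds `eos` with `insert(0, i)` as above, then compares popping in descending and ascending index order:

```python
# Hypothetical helper: pops each index in `finished` from `items`, in the order given.
def remove_finished(items, finished):
    for i in finished:
        items.pop(i)
    return items

# Hypothetical stand-in for the per-prompt state kept by the example script.
prompts = ["a", "b", "c", "d"]

# Visit indices in ascending order but insert at the front,
# so the finished indices end up stored in descending order.
eos = []
for i in range(len(prompts)):
    if prompts[i] in ("b", "d"):  # pretend prompts 1 and 3 finish in the same step
        eos.insert(0, i)

# Descending order: popping a later index never shifts an earlier one.
descending_result = remove_finished(list(prompts), eos)

# Ascending order: popping index 1 first shifts "d" down to index 2,
# so the later pop(3) no longer points at a valid element.
try:
    remove_finished(list(prompts), sorted(eos))
    ascending_failed = False
except IndexError:
    ascending_failed = True
```

With the front-inserted list, both finished entries are removed cleanly. The ascending variant fails here because the second index is stale after the first pop. In other cases it could silently remove the wrong entry instead of raising.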

dvianisoho commented 9 months ago

You are right. For some reason when I implemented this I was appending to the list, not inserting at the beginning.