turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

Add dynatemp (the entropy one) #263

Status: Closed (awtrisk closed this 5 months ago)

awtrisk commented 6 months ago

Still some stuff to be checked, heavy wip.

awtrisk commented 6 months ago

alright, i'm not sure - are the actual logits temp_probs?

turboderp commented 6 months ago

The sample_basic function takes the input logits and converts them to a probability distribution (calling softmax_cpu). The results are temp_probs and temp_indices, which are processed by the subsequent samplers. I have no idea if it's more correct to apply this dynamic temperature before or after the other samplers.
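For reference, a minimal sketch of why it matters less than it might seem whether temperature is applied to logits or to already-normalized probabilities: raising probabilities to the power 1/T and renormalizing gives the same distribution as softmax(logits / T). The function names below are hypothetical and written in plain Python for illustration; they are not the exllamav2 functions.

```python
import math

def softmax(logits):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def apply_temperature_to_probs(probs, temperature):
    # Hypothetical helper: temperature applied after softmax.
    # p_i^(1/T), renormalized, equals softmax(logits / T).
    powered = [p ** (1.0 / temperature) for p in probs]
    total = sum(powered)
    return [p / total for p in powered]

logits = [2.0, 1.0, 0.1, -1.0]
temperature = 0.7

from_logits = softmax([x / temperature for x in logits])
from_probs = apply_temperature_to_probs(softmax(logits), temperature)
```

The two results agree up to floating-point error. What does change the outcome is the ordering relative to truncating samplers (top-k, top-p and so on), since those alter the distribution that the entropy is computed over.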

bdashore3 commented 5 months ago

Looking at the newer implementation in backends such as kcpp, it's definitely better to pass a single dynatemp_range parameter into the temperature sampler instead of overburdening the function with min, max, and enabled values. I believe the WIP code here should be refactored to reflect that. The frontend/API can handle calculation of the range to pass to exl2.
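A rough sketch of what that conversion could look like, assuming the range is symmetric around the base temperature (min = temperature - range, max = temperature + range, both clamped at zero). Both helper names are hypothetical and not part of any backend's API.

```python
def range_from_bounds(min_temp, max_temp):
    # Frontend side (hypothetical): collapse a min/max pair into a
    # base temperature plus a single symmetric range value.
    temperature = (min_temp + max_temp) / 2.0
    dynatemp_range = (max_temp - min_temp) / 2.0
    return temperature, dynatemp_range

def dynatemp_bounds(temperature, dynatemp_range):
    # Backend side (hypothetical): expand the single range value back
    # into the bounds the sampler interpolates between.
    min_temp = max(0.0, temperature - dynatemp_range)
    max_temp = max(0.0, temperature + dynatemp_range)
    return min_temp, max_temp
```

With this convention a range of 0 naturally means "dynatemp off", so no separate enabled flag is needed.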

awtrisk commented 5 months ago

Implementation should be fine now, I don't really know what else to change though.

bdashore3 commented 5 months ago

Now that dynatemp has matured more, I'd like to stick to an implementation which will be in line with other backends such as Aphrodite.

  1. Merge dynatemp into the temp function (to allow temp_last to work)
  2. Remove dynatemp_enabled which is a redundant boolean
  3. Add a check within the temp function that enables dynatemp based on the min and max values, i.e. only when max > min.
  4. Add the option to use dynatemp_exponent to adjust that as well.

CC: @turboderp
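The four points above can be sketched roughly as follows. This is an illustration in plain Python, not the code in this PR: the function name and signature are hypothetical, and the entropy mapping (normalized entropy raised to an exponent, interpolated between min and max) is an assumption based on how entropy-based dynatemp is commonly described in other backends.

```python
import math

def softmax(logits):
    # Numerically stable softmax.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def temperature_with_dynatemp(logits, temperature,
                              min_temp=0.0, max_temp=0.0, exponent=1.0):
    # Hypothetical merged temp function. Dynatemp is active only when
    # max_temp > min_temp, so no separate "enabled" boolean is needed.
    if max_temp > min_temp:
        probs = softmax(logits)
        # Shannon entropy of the current distribution.
        entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
        # Entropy of a uniform distribution over the same candidates.
        max_entropy = math.log(len(logits))
        normalized = entropy / max_entropy if max_entropy > 0.0 else 0.0
        # Low entropy (confident) -> near min_temp,
        # high entropy (uncertain) -> near max_temp.
        temperature = min_temp + (max_temp - min_temp) * normalized ** exponent
    # Guard against division by zero; the epsilon is an arbitrary choice.
    temperature = max(temperature, 1e-6)
    return [x / temperature for x in logits], temperature
```

Because the dynamic path lives inside the same function as static temperature, whatever ordering option decides when temperature is applied (such as temp_last) covers both cases without extra plumbing.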

awtrisk commented 5 months ago

All done - dynatemp is in the temp function, and the exponent can be adjusted.

bdashore3 commented 5 months ago

If you believe the PR is ready for review, please remove the draft status.