ngruver / llmtime

https://arxiv.org/abs/2310.07820
MIT License

Description not found for p_extra #11

Closed pg2455 closed 8 months ago

pg2455 commented 11 months ago

I couldn't find an explanation in the Appendix for why p_extra is accounted for in the NLL/D calculation. Could you please comment on this step? If I missed something, could you please point me to the right place?

https://github.com/ngruver/llmtime/blob/adefc38d142cba6049db424f3be4c30e3db35380/models/gpt.py#L121

I am also curious whether this function guarantees that its return values are non-negative. Thanks in advance!

ngruver commented 10 months ago

Apologies for my delay in responding!

p_extra is the probability mass associated with tokens that play no role in our numerical encoding scheme (tokens that are not digits, separators, or signs). We adjust the original log probabilities by p_extra in order to obtain a discrete distribution over fixed-precision numbers (which correspond to bins as described in the paper) and then a corresponding continuous density.
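To make the adjustment concrete, here is a minimal sketch rather than the code in models/gpt.py. The helper names, the allowed-token set, and the toy probabilities are all hypothetical. It assumes the renormalization divides each token probability by (1 - p_extra), and that a bin over fixed-precision numbers has width base**(-prec) when converting to a density.

```python
import math

def p_extra_from_top(top_logprobs, allowed_tokens):
    """Sum the probability of returned tokens outside the numeric vocabulary."""
    return sum(math.exp(lp) for tok, lp in top_logprobs.items()
               if tok not in allowed_tokens)

def adjusted_logprob(token_logprob, p_extra):
    """Renormalize over allowed tokens: log(p / (1 - p_extra))."""
    return token_logprob - math.log1p(-p_extra)

def log_density(log_p_bin, base, prec):
    """Assumed bin width is base**(-prec), so log density = log p + prec * log(base)."""
    return log_p_bin + prec * math.log(base)

# Hypothetical numeric vocabulary: digits, a separator, and a minus sign.
allowed = {str(d) for d in range(10)} | {",", "-"}

# Toy log probabilities for one position (illustrative values only).
top = {
    "3": math.log(0.60),
    "4": math.log(0.20),
    "\n": math.log(0.10),   # not part of the encoding
    "a": math.log(0.05),    # not part of the encoding
    ",": math.log(0.05),
}

p_extra = p_extra_from_top(top, allowed)       # 0.10 + 0.05
adj = adjusted_logprob(top["3"], p_extra)      # log(0.60 / 0.85)
dens = log_density(math.log(0.01), base=10, prec=2)
```

In this toy case the probability of "3" rises from 0.60 to 0.60 / 0.85 once the non-numeric mass is removed, so the adjusted log probability is always at least as large as the raw one.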

Unfortunately, it is not possible to do this filtering of non-numerical tokens exactly, because the OpenAI API only returns log probabilities for the top 5 tokens beyond the sampled token. Thus the normalization values are larger in some cases than they could be (and the corresponding probabilities smaller). In our experiments with LLaMA-2 models, there is no such limit, and we can perform the filtering exactly.
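The effect of the top-5 limit can be illustrated with a toy example. This is a sketch under stated assumptions, not the repository's code: the vocabulary, probabilities, and function name are hypothetical, and the truncation is simplified to "keep the k most likely tokens".

```python
def p_extra(probs, allowed, top_k=None):
    """Non-numeric probability mass, optionally seen through a top-k window."""
    items = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    if top_k is not None:
        items = items[:top_k]  # mimic an API that only exposes the top-k tokens
    return sum(p for tok, p in items if tok not in allowed)

# Hypothetical numeric vocabulary and full next-token distribution (sums to 1).
allowed = {str(d) for d in range(10)} | {","}
probs = {"7": 0.50, "8": 0.20, "6": 0.10, "\n": 0.08,
         "9": 0.05, ",": 0.03, "a": 0.02, "b": 0.02}

exact = p_extra(probs, allowed)               # sees "\n", "a", and "b"
truncated = p_extra(probs, allowed, top_k=5)  # sees only "\n"

# Renormalized probability of the token "7" under each estimate.
p7_exact = probs["7"] / (1 - exact)
p7_truncated = probs["7"] / (1 - truncated)
```

Because the truncated view misses part of the non-numeric mass, its estimate of p_extra is a lower bound. The normalizer 1 - p_extra is therefore too large and the adjusted probability too small, which is the direction of the bias described above.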

We will add a note explaining this detail to the Appendix, as originally intended. Thank you for pointing this oversight out!

Nate