Closed pg2455 closed 8 months ago
Apologies for my delay in responding!
`p_extra` is the probability mass associated with tokens that do not have a role in our numerical encoding scheme (tokens that are not digits, separators, or signs). We can adjust the original log probabilities by `p_extra` in order to obtain a discrete distribution over fixed-precision numbers (which correspond to bins as described in the paper) and then a corresponding continuous density.
Unfortunately, this filtering of non-numerical tokens cannot be done exactly, because the OpenAI API only returns log probabilities for the top 5 tokens beyond the sampled token. As a result, the normalization values are larger in some cases than they would be with exact filtering (and the corresponding probabilities smaller). In our experiments with LLaMA-2 models there is no such limit, and we can perform the filtering exactly.
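To make the adjustment concrete, here is a minimal sketch of the renormalization for a single token position. It is an illustration under assumptions, not the code in `models/gpt.py`: the function name `renormalized_logprob`, the `allowed` token set, and the example probabilities are all hypothetical.

```python
import math

def renormalized_logprob(token_logprob, top_logprobs, allowed_tokens):
    """Renormalize one token's log probability over the allowed tokens.

    token_logprob:  log probability of the sampled (allowed) token.
    top_logprobs:   dict of token -> log probability for the visible top-k tokens.
    allowed_tokens: set of tokens used by the numerical encoding (hypothetical here).
    """
    # Visible probability mass on tokens outside the numerical encoding.
    p_extra = sum(
        math.exp(lp) for tok, lp in top_logprobs.items() if tok not in allowed_tokens
    )
    # Dividing by (1 - p_extra) redistributes that mass over the allowed tokens.
    # Assumes p_extra < 1, which holds whenever the sampled token is allowed.
    return token_logprob - math.log(1.0 - p_extra)

# Hypothetical example: digits, a separator, and a sign are allowed.
allowed = {str(d) for d in range(10)} | {",", "-"}
top = {
    "3": math.log(0.6),
    "4": math.log(0.2),
    "\n": math.log(0.1),     # not part of the encoding
    " the": math.log(0.05),  # not part of the encoding
}
adjusted = renormalized_logprob(top["3"], top, allowed)
```

In this example `p_extra` is 0.15, so the probability of `"3"` goes from 0.6 to 0.6 / 0.85. When only the top few tokens are visible, any non-numerical mass outside them is missed, so `p_extra` is underestimated and the adjusted probability comes out smaller than it would with exact filtering.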
We will add a note explaining this detail to the Appendix, as originally intended. Thank you for pointing this oversight out!
Nate
I couldn't find an explanation in the Appendix for the use of `p_extra` in the NLL/D calculation. Could you please comment on this step? If I missed something, can you please point me to the right place? https://github.com/ngruver/llmtime/blob/adefc38d142cba6049db424f3be4c30e3db35380/models/gpt.py#L121
I am also curious whether this function will ensure a non-negative constraint on the return values. Thanks in advance!