thunlp / Ouroboros

Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main)
Apache License 2.0

About candidates pool #4

Closed reflectionie closed 7 months ago

reflectionie commented 7 months ago

Thank you for your work. I have a few questions regarding the experimental section that I would like to inquire about:

  1. In Figure 5, you mention that the candidate pool size is 20. In the results shown in Figure 4, is the candidate pool size also 20? If so, wouldn't the hit rate be too low because the candidate pool is too sparse? (A pool of size 20 is extremely small compared to the vocabulary size; even accounting for locality, this candidate pool size seems too small.)

  2. Could I ask whether the candidates in your candidate pool can also be regarded as a kind of n-gram, just like in lookahead decoding?

reflectionie commented 7 months ago

Sorry, I may have misunderstood something in question 1. I just read the code. Does the candidate pool size mean that one key corresponds to a maximum of 20 values? If that understanding is correct, what is the approximate number of keys (i.e., the number of entries in the token map)?

huangyuxiang03 commented 7 months ago

Thanks for asking.

  1. No. The candidate pool size is the same as the window size $W$ reported in Table 7. Thanks for pointing out the vagueness of our explanation. For each key in the vocabulary, we maintain a sliding window of size $W$, just as in lookahead decoding, so the total capacity of the candidate pool is the vocabulary size times $W$. The approximate number of keys is hard to measure, since it varies across models and tasks. For example, if the model has a relatively large vocabulary and is evaluated on code generation tasks, only the sliding windows of symbol tokens (e.g., def, class) will be filled. However, when we evaluate the model on natural language tasks (e.g., summarization), nearly all sliding windows are filled. This is exactly what we have observed, and it is one of the implicit reasons why the experiments in Figure 6 and Table 4 work (since the "activated" parts of the candidate pool may differ across tasks). See the sketch after this list for an illustration of the structure.
  2. Yes. They can be regarded as the n-grams of lookahead decoding: each entry is indexed by a key token and stores several trailing tokens that represent a language piece encountered during generation.
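
A minimal sketch of the structure described above, assuming a Python dict keyed by token id where each key holds a bounded sliding window of $W$ n-gram suffixes. The class and method names (`CandidatePool`, `update`, `candidates`) are illustrative only and are not the repository's actual API:

```python
from collections import deque

class CandidatePool:
    """Hypothetical sketch: one sliding window of n-gram suffixes per key token."""

    def __init__(self, window_size: int, ngram_len: int):
        self.window_size = window_size  # W, as in Table 7
        self.ngram_len = ngram_len      # length of each stored suffix
        # key token id -> FIFO window of trailing-token tuples
        self.pool: dict[int, deque] = {}

    def update(self, tokens: list[int]) -> None:
        """Record, for each key token seen, the n-gram suffix that followed it."""
        for i in range(len(tokens) - self.ngram_len):
            key = tokens[i]
            suffix = tuple(tokens[i + 1 : i + 1 + self.ngram_len])
            window = self.pool.setdefault(key, deque(maxlen=self.window_size))
            window.append(suffix)  # oldest entry is evicted once the window is full

    def candidates(self, key: int) -> list:
        """Return up to W stored suffixes for this key token."""
        return list(self.pool.get(key, ()))


# Usage: total capacity is |vocabulary| * W, but only the windows of keys
# actually seen in a given task get filled.
pool = CandidatePool(window_size=20, ngram_len=3)
pool.update([101, 7, 8, 9, 101, 7, 10, 11, 102, 5, 6, 7])
print(pool.candidates(101))  # suffixes observed after token 101
```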
reflectionie commented 7 months ago

I think I totally understand, thank you for your detailed explanation!