Sorry, I may have some misunderstanding about question 1. I just read the code: the candidate pool size seems to mean that one key corresponds to at most 20 values. Is this understanding correct? If so, roughly how large is the key set (i.e., how many entries are in the token map)?
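To make sure I am describing the same structure, here is a minimal Python sketch of what I understand the token map to be. The class name `TokenMap`, the `pool_size` parameter, and the frequency-based update rule are my assumptions for illustration, not necessarily what the repository actually does:

```python
from collections import Counter, defaultdict

class TokenMap:
    """Hypothetical sketch: each key token maps to at most `pool_size` values."""

    def __init__(self, pool_size: int = 20):
        self.pool_size = pool_size
        # key token id -> frequency counts of the tokens observed after it
        self.counts = defaultdict(Counter)

    def update(self, key: int, value: int) -> None:
        self.counts[key][value] += 1

    def candidates(self, key: int) -> list[int]:
        # Keep only the `pool_size` most frequent values for this key.
        return [tok for tok, _ in self.counts[key].most_common(self.pool_size)]

    def num_keys(self) -> int:
        # The quantity asked about above: the number of entries in the map.
        return len(self.counts)
```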
Thanks for asking.
When we evaluate the model on code tasks, only the sliding windows that contain code tokens (e.g., `def`, `class`, `)`) will be fulfilled. However, when we evaluate the model on natural language tasks (e.g., summarization), nearly all sliding windows are fulfilled. This is exactly what we have observed, and it is one of the implicit reasons why the experiments in Figure 6 and Table 4 work (since the "activated" parts of the candidate pool may be different in each task).

I think I totally understand, thank you for your detailed explanation!
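If it helps to quantify the reply above, here is a rough sketch of how the fraction of fulfilled sliding windows could be measured on a token sequence. The definition of "fulfilled" (every token in the window has candidates in the map) and the window size are my guesses for illustration, reusing the hypothetical `TokenMap` sketch above:

```python
def fulfilled_fraction(token_ids: list[int], token_map: "TokenMap", window: int = 4) -> float:
    """Fraction of sliding windows whose tokens all have candidates in the map."""
    windows = [token_ids[i : i + window] for i in range(len(token_ids) - window + 1)]
    if not windows:
        return 0.0
    fulfilled = sum(all(token_map.candidates(t) for t in w) for w in windows)
    return fulfilled / len(windows)
```

Under this reading, a code corpus would activate the windows around tokens like `def`, `class`, and `)`, while summarization-style text would activate nearly all of them.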
Thank you for your work. I have a few questions about the experimental section:
In Figure 5, you mentioned that the candidate pool size is 20. In the results shown in Figure 4, is the candidate pool size also 20? If so, wouldn't the hit rate be too low because the candidate pool is so sparse? (A pool of size 20 is extremely small compared to the vocabulary size; even taking locality into account, this candidate pool size seems too small.)
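To make this concern concrete, here is a rough sketch of how the hit rate I am asking about could be measured, again reusing the hypothetical `TokenMap` above and assuming a "hit" means the true next token appears among the current token's (at most 20) candidates:

```python
def hit_rate(token_ids: list[int], token_map: "TokenMap") -> float:
    """Fraction of positions where the true next token is among the candidates."""
    pairs = list(zip(token_ids, token_ids[1:]))
    if not pairs:
        return 0.0
    hits = sum(nxt in token_map.candidates(cur) for cur, nxt in pairs)
    return hits / len(pairs)
```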
Could I also ask whether the candidates in your candidate pool can be regarded as a kind of n-gram, just like lookahead decoding?