Feature request
Adaptive decoding balances diversity and coherence in open-ended text generation, offering a strong alternative to fixed truncation strategies such as top-k and top-p sampling. For more details, the paper is available here, and the code can be accessed here.
Motivation
During generation, the distribution predicted by the language model generally falls into one of two categories: a flat distribution, indicating that the LM has many plausible choices for the next token, or a sharp distribution, indicating that its options are limited. Letting the model dynamically recognize which state it is in is crucial for generating text that is both diverse and coherent.
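To make the flat-vs-sharp intuition concrete, here is a minimal sketch of one entropy-based way to adapt the candidate set size to the distribution's shape. This is a hypothetical illustration, not the paper's exact criterion: it uses the distribution's perplexity `exp(H)` as an effective candidate count, so flat distributions yield many candidates and sharp ones yield few.

```python
import numpy as np

def adaptive_top_k(logits):
    """Pick a candidate set whose size adapts to the distribution's shape.

    Hypothetical entropy-based sketch (not the paper's algorithm):
    k = round(exp(H)), the distribution's perplexity, interpreted as
    the effective number of plausible next tokens.
    """
    # Softmax with max-subtraction for numerical stability.
    probs = np.exp(logits - np.max(logits))
    probs /= probs.sum()
    # Shannon entropy; epsilon guards against log(0).
    entropy = -np.sum(probs * np.log(probs + 1e-12))
    k = max(1, int(round(np.exp(entropy))))
    # Return the indices of the k most probable tokens.
    return np.argsort(probs)[::-1][:k]

sharp = np.array([10.0, 0.0, 0.0, 0.0])  # one dominant choice
flat = np.array([0.0, 0.0, 0.0, 0.0])    # all choices equally likely
print(len(adaptive_top_k(sharp)))  # 1
print(len(adaptive_top_k(flat)))   # 4
```

With a sharp distribution the entropy is near zero, so the set collapses to a single token; with a uniform distribution over four tokens the perplexity is exactly 4, so all four remain candidates. In a Hugging Face pipeline this kind of rule would typically live in a custom `LogitsProcessor`.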
Your contribution
The idea is described above; the paper and a reference implementation are linked in the feature request.