One useful feature would be the ability to extract a confidence level and the top-n token probabilities on a per-token basis as generation happens. This opens up many applications; one interesting use case is Monte Carlo search over candidate answers to boost the quality of model output.
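As a minimal sketch of what this could look like, the snippet below converts one decoding step's logits into probabilities and reports the top-n, using a stable softmax. The function name `top_n_probs` and the placeholder `step_logits` values are illustrative assumptions, not part of any existing API; a real implementation would read the logits from the model at each step.

```python
import math

def top_n_probs(logits, n=3):
    """Convert one step's raw logits to probabilities and return the
    top-n as (token_id, probability) pairs, sorted descending."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:n]

# Placeholder logits for a single decoding step over a tiny vocabulary;
# a real model would produce one such vector per generated token.
step_logits = [2.0, 1.0, 0.5, -1.0]
top = top_n_probs(step_logits, n=2)
confidence = top[0][1]  # probability mass on the argmax token
```

Streaming this per token gives both a confidence signal (the argmax probability) and the alternatives needed to branch a Monte Carlo search over candidate continuations.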