If the answer contains multiple tokens (for example, several words or even a whole sentence), how does fitbert compute the cumulative probability of such an answer? The MLM outputs a probability for each individual token, not for groups of tokens.
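To make the question concrete, here is one common heuristic for scoring a multi-token answer with an MLM. This is not necessarily what fitbert does. The idea is to mask every answer position at once, treat the per-position probabilities as independent, and score the answer by the sum of their log-probabilities. This is a minimal sketch. It assumes the per-position softmax outputs are already available as plain dictionaries. The function name `multi_token_log_score` and the toy numbers are hypothetical.

```python
import math


def multi_token_log_score(position_probs, answer_tokens):
    """Sum of log-probabilities of each answer token at its masked position.

    position_probs: list of dicts, one per [MASK] position, mapping token -> probability
    (a hypothetical stand-in for the MLM's softmax output at that position).
    Assumes the positions are independent given the context, which is only an approximation.
    """
    if len(position_probs) != len(answer_tokens):
        raise ValueError("need one distribution per answer token")
    total = 0.0
    for probs, token in zip(position_probs, answer_tokens):
        # A tiny floor avoids log(0) for tokens missing from the toy distribution.
        total += math.log(probs.get(token, 1e-12))
    return total


# Made-up distributions for a two-token gap: "... [MASK] [MASK] ..."
dists = [
    {"new": 0.6, "big": 0.3, "old": 0.1},
    {"york": 0.7, "city": 0.2, "house": 0.1},
]
score = multi_token_log_score(dists, ["new", "york"])
print(round(math.exp(score), 2))  # 0.42, i.e. 0.6 * 0.7
```

The product shrinks with every extra token, so comparing answers of different lengths usually needs some normalization. For example, you can divide the log-score by the number of tokens.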