zhangsanaixuexi opened 4 years ago
The model predicts, for each token in the passage, the probability that it is the begin token or the end token of the answer span. Combining a begin token with an end token gives a candidate answer span. n_best_size is how many of the top begin/end combinations we consider for the final output.
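To make that concrete, here is a minimal sketch of how begin and end scores could be combined into ranked spans. It is an illustration, not the repository's actual code: the function names `top_indices` and `best_spans` are hypothetical, the scores are plain Python lists, and the `max_answer_length` cap is an assumption added so that implausibly long spans are skipped.

```python
from typing import List, Tuple


def top_indices(scores: List[float], n: int) -> List[int]:
    """Return the indices of the n highest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n]


def best_spans(
    start_scores: List[float],
    end_scores: List[float],
    n_best_size: int,
    max_answer_length: int,  # assumption: not mentioned in the thread
) -> List[Tuple[int, int, float]]:
    """Combine top begin and end tokens into candidate (start, end, score) spans."""
    candidates = []
    for s in top_indices(start_scores, n_best_size):
        for e in top_indices(end_scores, n_best_size):
            # Skip spans that end before they begin or that are too long.
            if e < s or e - s + 1 > max_answer_length:
                continue
            candidates.append((s, e, start_scores[s] + end_scores[e]))
    # Rank by combined score and keep only the n_best_size best spans.
    candidates.sort(key=lambda c: c[2], reverse=True)
    return candidates[:n_best_size]


start_scores = [0.1, 2.0, 0.3, 0.0]
end_scores = [0.0, 0.2, 1.5, 0.4]
spans = best_spans(start_scores, end_scores, n_best_size=2, max_answer_length=3)
print(spans)
```

A larger n_best_size explores more begin/end pairs at the cost of more candidates to score; the top-ranked span is the predicted answer.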
Thanks a lot. If, for example, the text is longer than the maximum length, a sliding window is used. How do you handle the case where the answer is not inside a given window?
For training, if a document chunk does not contain an annotation, we throw it out, since there is nothing to predict.
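As a rough sketch of that filtering step, assuming token-level answer positions and a fixed window stride: the helper names `make_chunks` and `training_chunks`, and the `max_len` and `stride` parameters, are hypothetical and chosen for illustration only. A chunk is kept here only if it contains the whole answer span.

```python
from typing import List, Tuple


def make_chunks(num_tokens: int, max_len: int, stride: int) -> List[Tuple[int, int]]:
    """Split a document into overlapping [start, end) windows (stride must be > 0)."""
    chunks = []
    start = 0
    while start < num_tokens:
        end = min(start + max_len, num_tokens)
        chunks.append((start, end))
        if end == num_tokens:
            break
        start += stride
    return chunks


def training_chunks(
    num_tokens: int,
    answer_start: int,
    answer_end: int,  # inclusive token index
    max_len: int,
    stride: int,
) -> List[Tuple[int, int, int, int]]:
    """Keep only chunks that fully contain the answer, with chunk-relative positions."""
    kept = []
    for start, end in make_chunks(num_tokens, max_len, stride):
        if answer_start >= start and answer_end < end:
            kept.append((start, end, answer_start - start, answer_end - start))
        # Otherwise the chunk has no annotation and is dropped from training.
    return kept


chunks = make_chunks(num_tokens=10, max_len=4, stride=2)
kept = training_chunks(num_tokens=10, answer_start=5, answer_end=6, max_len=4, stride=2)
print(chunks)
print(kept)
```

In this example only the window covering tokens 4 to 7 contains the full answer, so it is the only one that becomes a training example.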
thank you!