YJiangcm closed this issue 2 years ago
Hi,
Here we calculate F1 the same way as in question answering tasks. It is computed over unigrams, not bigrams. Each sentence gives you a bag of words, and the true positives are the words in the intersection of the two bags.
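For reference, a minimal sketch of that unigram F1, written in the style of common question answering evaluation scripts. The function name `unigram_f1`, the lowercasing, and the whitespace tokenization are assumptions for illustration, not necessarily the exact preprocessing used in the paper. Words that appear in only one of the two sentences count against precision or recall, and true negatives play no role in F1.

```python
from collections import Counter


def unigram_f1(sentence_a: str, sentence_b: str) -> float:
    """Unigram (bag-of-words) F1 between two sentences.

    Assumption: tokens are lowercased and split on whitespace.
    """
    tokens_a = sentence_a.lower().split()
    tokens_b = sentence_b.lower().split()

    # If either sentence is empty, score 1.0 only when both are empty.
    if not tokens_a or not tokens_b:
        return float(tokens_a == tokens_b)

    # Multiset intersection: each shared word counts min(count_a, count_b) times.
    overlap = Counter(tokens_a) & Counter(tokens_b)
    true_positives = sum(overlap.values())
    if true_positives == 0:
        return 0.0

    # Treat sentence_a as the "prediction" and sentence_b as the "reference".
    precision = true_positives / len(tokens_a)
    recall = true_positives / len(tokens_b)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(unigram_f1("a man is playing guitar", "a man plays guitar"))
```

Because F1 is the harmonic mean of precision and recall, swapping the two sentences gives the same score, so it does not matter which one is treated as the reference.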
Got it, thx!
Thanks for your excellent work. In Section 4 of the paper, you report that the lexical overlap (F1 measured between two bags of words) is 39% for the entailment pairs (SNLI + MNLI), while it is 60% and 55% for QQP and ParaNMT. Does "two bags of words" mean using a sliding window of size 2? And how do you define true positives, true negatives, false positives, and false negatives when computing the F1 score?