Closed by j6mes 6 years ago
Metric | NLTK | DrQA Sents, Precomputed IDF | DrQA Sents, New IDF |
---|---|---|---|
Runtime | 2 hours | 10 hours | 12 hours |
Strict Accuracy (strict), requiring correct evidence | 0.2476 | 0.1827 | 0.2698 |
Classification Accuracy Without Need For Evidence | 0.4885 | 0.4588 | 0.4922 |
Correct Document Return Rate (dmatch) | 0.5793 | 0.5893 | 0.5893 |
Correct Document Return Rate after sentence selection (smatch) | 0.4773 | 0.2690 | 0.5596 |
Correct Text Return Rate (for Refutes/Supports) | 0.3647 | 0.1083 | 0.4680 |
@andreasvlachos using DrQA (the New IDF column) instead of NLTK for sentence selection gives us a boost of about 2 points in strict accuracy (0.2698 vs. 0.2476), at the cost of an extra 10 hours of runtime. The dmatch and smatch figures give us upper bounds for strict accuracy (considering the supported/refuted class). With DrQA, the correct document is still in the evidence after sentence selection about 56% of the time (smatch 0.5596), whereas with NLTK it is only about 48% (0.4773).
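For anyone who wants to redo the arithmetic, here is a minimal sketch that recomputes the gains from the table. The numbers are copied from the table; the dictionary layout and the `compare` and `headroom` helpers are hypothetical names made up for this comment, not part of the codebase. Note that the upper bound is stated for the supported/refuted class, so comparing smatch against overall strict accuracy is only a rough indication of headroom.

```python
# Figures copied from the results table in this issue.
results = {
    "NLTK": {"runtime_h": 2, "strict": 0.2476, "dmatch": 0.5793, "smatch": 0.4773},
    "DrQA precomputed IDF": {"runtime_h": 10, "strict": 0.1827, "dmatch": 0.5893, "smatch": 0.2690},
    "DrQA new IDF": {"runtime_h": 12, "strict": 0.2698, "dmatch": 0.5893, "smatch": 0.5596},
}


def compare(baseline, candidate):
    """Gains of candidate over baseline, in percentage points and hours."""
    b, c = results[baseline], results[candidate]
    return {
        "strict_gain_points": round((c["strict"] - b["strict"]) * 100, 2),
        "smatch_gain_points": round((c["smatch"] - b["smatch"]) * 100, 2),
        "extra_hours": c["runtime_h"] - b["runtime_h"],
    }


def headroom(system):
    """Rough gap between smatch and strict accuracy, in percentage points."""
    r = results[system]
    return round((r["smatch"] - r["strict"]) * 100, 2)


if __name__ == "__main__":
    print(compare("NLTK", "DrQA new IDF"))
    for name in results:
        print(name, "headroom:", headroom(name))
```

The comparison against the precomputed-IDF run can be obtained the same way by passing `"DrQA precomputed IDF"` as the candidate.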