Closed · yyzhuang1991 closed this issue 2 years ago

I wonder what function in the evaluation script is used to compute the predictive accuracy. Is it predict_on_test_set() or eval_on_test_set() in this script? If it is the first one, what top-n value did you use to report the predictive accuracy in Table 3 in the appendix?

Thanks in advance.
Hello and thanks for reaching out! I believe this part returns the candidates and this part counts correct answers in top-N. In the paper we only present the top-1 results as these are the most relevant in practice and therefore make for the best benchmark. Let us know if you need more info or if this doesn't answer your question! Best wishes, Stefan.
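Not the repository's actual code, but a minimal sketch of what "counts correct answers in top-N" means in this nomination setting, assuming each (page, element class) pair has exactly one labeled element and a list of candidates ranked by model score; the function and variable names below are illustrative only:

```python
# Illustrative sketch only -- not the repository's evaluation code.
from typing import Dict, List, Tuple


def top_n_accuracy(
    ranked_candidates: Dict[Tuple[str, str], List[str]],  # (page_id, element_class) -> candidate ids, best first
    gold: Dict[Tuple[str, str], str],                      # (page_id, element_class) -> the one correct element id
    n: int = 1,
) -> float:
    """Fraction of (page, class) pairs whose labeled element appears among the top-n candidates."""
    hits = sum(1 for key, answer in gold.items() if answer in ranked_candidates.get(key, [])[:n])
    return hits / len(gold)


# With n=1 this collapses to the single accuracy number reported in the paper's tables.
ranked = {("page_1", "price"): ["elem_7", "elem_2"], ("page_2", "price"): ["elem_4", "elem_9"]}
labels = {("page_1", "price"): "elem_7", ("page_2", "price"): "elem_9"}
print(top_n_accuracy(ranked, labels, n=1))  # 0.5
print(top_n_accuracy(ranked, labels, n=2))  # 1.0
```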
I see, that helps! Do you have any idea how the precision, recall, and F1 scores look in this task?
Precision and recall do not differ in this prediction/nomination task, so here accuracy = recall = precision: for each and every page there is exactly one correct answer (one labeled element per class), and the algorithm either gets it right or wrong. At the end we simply measure how many answers it got right. In the classification task, by contrast, we sample a bunch of elements and ask the classifiers to label them, and there the F1 score is more representative. Does this make sense?
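To make that counting argument concrete, here is a toy example (illustrative values, not taken from the paper or repo): because every page contributes exactly one nomination and exactly one labeled element, each mistake is simultaneously a false positive and a false negative, so the three metrics collapse to the same number.

```python
# Toy illustration of why precision, recall, and accuracy coincide
# when there is exactly one nomination and one correct element per page.
predictions = ["elem_3", "elem_8", "elem_1", "elem_5"]  # one nomination per page
gold        = ["elem_3", "elem_2", "elem_1", "elem_9"]  # one labeled element per page

tp = sum(p == g for p, g in zip(predictions, gold))  # correct nominations
fp = fn = len(gold) - tp                             # every miss is both a FP and a FN

precision = tp / (tp + fp)
recall = tp / (tp + fn)
accuracy = tp / len(gold)

assert precision == recall == accuracy  # all equal 0.5 in this example
```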
That makes sense. Thanks.