Good question! It's a complicated story, but here's a summary: there are two versions of the annotated answers on NQ that differ slightly in the postprocessing of the HTML data. Without `id2answers`, the code uses the version used in DPR; with `id2answers`, it uses the other version from Google. I wrote the code to use `id2answers` so that it uses the Google version. More details can be found in the last paragraph of the README (Result section).
I found that the numbers over the two versions are marginally different, usually by less than 1%.
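For concreteness, here is a minimal sketch of how such an override might be wired up, assuming `id2answers` maps each question id to the Google-postprocessed answer list. The file format and helper names (`load_id2answers`, `apply_id2answers`) are illustrative only, not the repo's exact code; the training/evaluation branch mirrors the two lines quoted in the question below.

```python
import json

def load_id2answers(path):
    """Build a dict from question id to the Google-postprocessed answer list.
    The JSON-lines format with "id" and "answers" fields is an assumption."""
    id2answers = {}
    with open(path) as f:
        for line in f:
            d = json.loads(line)
            id2answers[d["id"]] = d["answers"]
    return id2answers

def apply_id2answers(data, id2answers, is_training):
    """Merge the Google-version answers into each example.
    Training appends them to the existing answer list; evaluation replaces
    the list entirely, matching the two assignments quoted in the question."""
    for i, d in enumerate(data):
        if d["id"] not in id2answers:
            continue
        if is_training:
            data[i]["answer"] += id2answers[d["id"]]
        else:
            data[i]["answer"] = id2answers[d["id"]]
    return data
```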
For evaluation on NQ, what exactly is `id2answers`? I noticed that you set
`self.data[i]["answer"] += id2answers[d["id"]]`
for training but `self.data[i]["answer"] = id2answers[d["id"]]`
for evaluation; may I know what the distinction is? Thanks.