How do I get the context in AmbigNQ? I downloaded the extra resources, but the ids in AmbigNQ do not correspond to the ids in the sqlite3 DB. I also downloaded the original NQ data, and although the example ids do correspond, the AmbigNQ data appears to have been processed into plain text while the original NQ data has not. This raises the question of how to match the two, and whether the start and end positions of the answer need to be provided.
Hi @HuaYZhao, thanks for your interest.
First, since AmbigNQ is an open-domain QA task, there is no provided context. The only available supervision is the answer text; there are no ground-truth start and end positions.
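(For illustration: because only answer strings are available, open-domain readers are typically trained with distantly supervised spans obtained by string-matching the answer inside retrieved passages. A minimal sketch; the helper name and whitespace tokenization are mine, not from the AmbigQA codebase.)

```python
def find_answer_spans(passage_tokens, answer_tokens):
    """Return all (start, end) token positions where the answer
    appears verbatim in the passage (distant supervision)."""
    spans = []
    n, m = len(passage_tokens), len(answer_tokens)
    for start in range(n - m + 1):
        if passage_tokens[start:start + m] == answer_tokens:
            spans.append((start, start + m - 1))  # inclusive end index
    return spans

# Example: matching the answer "barack obama" against a passage
passage = "barack obama was the 44th president of the united states".split()
answer = "barack obama".split()
print(find_answer_spans(passage, answer))  # [(0, 1)]
```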
Therefore, you are correct that the ids in AmbigNQ and the ids in the Wikipedia DB do not match each other.
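As a side note, if your Wikipedia DB follows the DrQA-style layout (a single `documents` table with `id` and `text` columns; that schema is an assumption here, so inspect it first), you can look up passages by the DB's own ids rather than AmbigNQ example ids:

```python
import sqlite3

# The path is hypothetical; point it at the downloaded Wikipedia DB.
conn = sqlite3.connect("path/to/wikipedia.db")
cursor = conn.cursor()

# Inspect the actual schema first, since table/column names may differ.
cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")
print(cursor.fetchall())

# Assuming a DrQA-style `documents(id, text)` table:
cursor.execute("SELECT text FROM documents WHERE id = ?", ("Barack Obama",))
row = cursor.fetchone()
print(row[0][:200] if row else "id not found")
conn.close()
```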
When experimenting with baselines in the paper, we use Dense Passage Retrieval to retrieve related passages to feed into the reader model. Note, however, that this retrieval step is part of the model, rather than part of the data.
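For a rough picture of that retrieval step, here is a minimal sketch using the Hugging Face `transformers` port of DPR. The baselines in the paper use the original DPR codebase, so the checkpoint names and code below are illustrative rather than the exact setup:

```python
import torch
from transformers import (DPRQuestionEncoder, DPRQuestionEncoderTokenizer,
                          DPRContextEncoder, DPRContextEncoderTokenizer)

q_tok = DPRQuestionEncoderTokenizer.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
q_enc = DPRQuestionEncoder.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
c_tok = DPRContextEncoderTokenizer.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")
c_enc = DPRContextEncoder.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")

question = "who is the president of the united states"
passages = ["Barack Obama served as the 44th president ...",
            "The Eiffel Tower is located in Paris ..."]

with torch.no_grad():
    q_emb = q_enc(**q_tok(question, return_tensors="pt")).pooler_output
    c_emb = c_enc(**c_tok(passages, return_tensors="pt",
                          padding=True, truncation=True)).pooler_output

# Rank passages by inner product with the question embedding,
# then feed the top-scoring ones to the reader.
scores = (q_emb @ c_emb.T).squeeze(0)
print(passages[scores.argmax().item()])
```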