ielab / GPT_Ranker

Use GPT-2/T5 to beat traditional LM

MSMARCO Document Train Dataset #7

Closed hanglics closed 4 years ago

hanglics commented 4 years ago

We already have:

  1. The full contents of the documents
  2. The queries for the training documents
  3. The qrels for the training passages

We still need to figure out how to map passage IDs to document IDs, and then construct a file like doc_query_pairs.train.tsv, where each line contains a document and its query, separated by a tab (\t).
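For illustration, here is a minimal sketch of the file-writing step, assuming document-level relevance pairs are already available. The function name write_doc_query_pairs and the in-memory dict layout are hypothetical, not something the repository defines. Tabs and newlines inside the text are collapsed to spaces so that each pair stays on one line.

```python
import io


def write_doc_query_pairs(docs, queries, doc_qrels, out):
    """Write one `document<TAB>query` line per relevant pair.

    docs:      dict mapping doc_id -> document text (assumed layout)
    queries:   dict mapping query_id -> query text (assumed layout)
    doc_qrels: iterable of (query_id, doc_id) pairs judged relevant
    out:       writable text file object
    Returns the number of lines written.
    """
    written = 0
    for query_id, doc_id in doc_qrels:
        if query_id not in queries or doc_id not in docs:
            continue  # skip pairs whose text is missing
        # Collapse all whitespace runs (including tabs and newlines)
        # so the only tab on the line is the field separator.
        doc_text = " ".join(docs[doc_id].split())
        query_text = " ".join(queries[query_id].split())
        out.write(f"{doc_text}\t{query_text}\n")
        written += 1
    return written


# Toy usage with an in-memory buffer instead of a real file.
example_docs = {"D1": "first\tdoc\nbody", "D2": "second doc"}
example_queries = {"q1": "what is x"}
example_pairs = [("q1", "D1"), ("q2", "D2"), ("q1", "D9")]
buffer = io.StringIO()
lines_written = write_doc_query_pairs(
    example_docs, example_queries, example_pairs, buffer
)
```

For the real dataset, out could be a file opened with open("doc_query_pairs.train.tsv", "w", encoding="utf-8"). Given the size of the full document collection, streaming the documents rather than holding them in a dict may be preferable.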

ArvinZhuang commented 4 years ago

A document that contains a relevant passage is considered relevant too.
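A minimal sketch of that rule, assuming a passage-to-document mapping has already been built somehow. Building it is the open question in this thread, so passage_to_doc and the function name below are hypothetical. Each passage-level judgment is lifted to the containing document, and duplicates are dropped when several relevant passages come from the same document.

```python
def passage_qrels_to_doc_qrels(passage_qrels, passage_to_doc):
    """Lift passage-level relevance to document level.

    passage_qrels:  iterable of (query_id, passage_id) relevant pairs
    passage_to_doc: dict mapping passage_id -> doc_id (assumed to exist)
    Returns a sorted list of unique (query_id, doc_id) pairs.
    """
    doc_qrels = set()
    for query_id, passage_id in passage_qrels:
        doc_id = passage_to_doc.get(passage_id)
        if doc_id is None:
            continue  # passage has no known source document
        doc_qrels.add((query_id, doc_id))
    return sorted(doc_qrels)


# Toy data: p1 and p2 come from the same document, p4 is unmapped.
example_passage_qrels = [
    ("q1", "p1"),
    ("q1", "p2"),
    ("q2", "p3"),
    ("q2", "p4"),
]
example_mapping = {"p1": "D1", "p2": "D1", "p3": "D2"}
example_doc_qrels = passage_qrels_to_doc_qrels(
    example_passage_qrels, example_mapping
)
```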