microsoft / MSMARCO-Passage-Ranking

MS MARCO (Microsoft Machine Reading Comprehension) is a large-scale dataset focused on machine reading comprehension, question answering, and passage ranking. A variant of this task will be part of TREC and AFIRM 2019. For updates about TREC 2019, please follow this repository. Passage reranking task: given a query q and the 1000 most relevant passages P = p1, p2, p3, ..., p1000, as retrieved by BM25, a successful system is expected to rank the most relevant passage as high as possible. Not every set of 1000 retrieved passages contains a human-labeled relevant passage. Evaluation will be done using MRR.
https://microsoft.github.io/MSMARCO-Passage-Ranking/
MIT License
300 stars 39 forks
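
For readers unfamiliar with the MRR metric named in the task description, here is a minimal sketch of how it can be computed. This is not the official evaluation script: the function names, the optional `cutoff` parameter, and the choice to average over every query that has at least one labeled relevant passage (scoring a missing ranking as 0) are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Set


def reciprocal_rank(ranked: List[str], relevant: Set[str],
                    cutoff: Optional[int] = None) -> float:
    """Return 1/rank of the first relevant passage, or 0.0 if none is found."""
    candidates = ranked if cutoff is None else ranked[:cutoff]
    for rank, pid in enumerate(candidates, start=1):
        if pid in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(rankings: Dict[str, List[str]],
                         qrels: Dict[str, Set[str]],
                         cutoff: Optional[int] = None) -> float:
    """Average the reciprocal rank over queries with a labeled relevant passage.

    Assumption: queries without any labeled relevant passage are skipped,
    and queries with no submitted ranking contribute 0.
    """
    judged = [qid for qid, rel in qrels.items() if rel]
    if not judged:
        return 0.0
    total = sum(
        reciprocal_rank(rankings.get(qid, []), qrels[qid], cutoff)
        for qid in judged
    )
    return total / len(judged)


# Hypothetical toy data: passage IDs ordered from best to worst per query.
rankings = {"q1": ["p3", "p1", "p2"], "q2": ["p9", "p8"]}
qrels = {"q1": {"p1"}, "q2": {"p7"}}
print(mean_reciprocal_rank(rankings, qrels))
```

In the toy data, q1 has its relevant passage at rank 2 and q2 has none in its list, so the two reciprocal ranks being averaged are 1/2 and 0.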

On full-documents and passage alignment #12

Closed Ricocotam closed 4 years ago

Ricocotam commented 4 years ago

Hi, is there an alignment available (or a script) for passages and full documents? Since passages are extracted from documents, we should be able to have this information. Also, since the maintainer is present on a lot of MS MARCO projects, I was wondering whether a release date for the Document Ranking data is available?

spacemanidol commented 4 years ago

@Ricocotam actually, the passages were not extracted from these documents. The document corpus was crawled after the QnA dataset (which is the basis of the ranking task) was released, so there isn't a direct mapping. @bmitra-msft worked on a passage2url join, as I recall. Is there anything we can share broadly?

I am currently working on the evaluation method for the document ranking task, which I should be able to get out in March-ish. In the meantime, TREC has released the labels associated with the document ranking task run in the 2019 Deep Learning Track.

Ricocotam commented 4 years ago

Oh ok, that's not what I understood from the paper. Thanks for the help.

bmitra-msft commented 4 years ago

Unfortunately, we don't have any immediate plans to release any passage-to-document mapping.