texttron / tevatron

Tevatron - A flexible toolkit for neural retrieval research and development.
http://tevatron.ai
Apache License 2.0
435 stars, 87 forks

How to get the title in msmarco-passage #106

Closed. BeastyZ closed this issue 5 months ago.

BeastyZ commented 5 months ago

Hi @MXueguang, according to the collection described in the TREC 2019 Deep Learning Track guidelines, the corpus has no 'title' field. But I see a 'title' field in your Tevatron/msmarco-passage dataset. May I know how you obtained the titles?

MXueguang commented 5 months ago

Hi @BeastyZ, we followed the code/data released by RocketQA to create Tevatron/msmarco-passage, which includes title augmentation.

By the way, this paper, https://arxiv.org/pdf/2304.12904.pdf, has an analysis of results with and without the title.
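For anyone who wants to compare the two settings themselves, here is a minimal sketch of how a passage could be built with or without its title. It is not Tevatron's actual preprocessing code: the helper `format_passage`, the separator, and the field names `docid`, `title`, and `text` are assumptions for illustration, and the example record is made up. Please check the field names against the real dataset before relying on this.

```python
from typing import Mapping, Optional


def format_passage(
    record: Mapping[str, Optional[str]],
    use_title: bool = True,
    sep: str = " ",
) -> str:
    """Return the passage text, optionally prefixed with its title.

    Hypothetical helper: assumes each record has a "text" field and,
    for title-augmented data, a "title" field.
    """
    text = (record.get("text") or "").strip()
    title = (record.get("title") or "").strip()
    # Fall back to the bare text when titles are disabled or missing.
    if not use_title or not title:
        return text
    return f"{title}{sep}{text}"


# Made-up record, shaped like the assumed schema.
example = {"docid": "0", "title": "Example title", "text": "Example passage body."}

with_title = format_passage(example)
without_title = format_passage(example, use_title=False)
print(with_title)
print(without_title)
```

Setting `use_title=False` should roughly reproduce the title-free input that the original TREC collection provides, which makes it easy to run both variants through the same encoder.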

BeastyZ commented 5 months ago

Thank you for your timely help, once again.