microsoft / ANCE

A novel embedding training algorithm that leverages ANN search and achieves SOTA retrieval on the TREC DL 2019 and OpenQA benchmarks
MIT License
359 stars 49 forks

some confusion about msmarco_data.py #13

Open DreamsofGg opened 3 years ago

DreamsofGg commented 3 years ago

It seems that the offset files for dev_query and train_query are written to the same file, so whichever one is written first will be overwritten? https://github.com/microsoft/ANCE/blob/936ec3e18b8a3fd30df91c13be650a3f8ca55f82/data/msmarco_data.py#L77-L80 @microsoftopensource
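To make the concern concrete, here is a minimal, generic sketch of the pattern I mean. It is not the repository's code: `write_offsets`, `read_offsets`, and the file name `qid2offset.pickle` are hypothetical names used only for illustration. It shows that writing two splits to one fixed path keeps only the last one, and that a per-split file name (one possible workaround, not necessarily what the maintainers intend) avoids the collision.

```python
import os
import pickle
import tempfile


def write_offsets(out_dir, offsets, filename="qid2offset.pickle"):
    # Hypothetical helper: dumps an id -> offset mapping to out_dir/filename.
    # Mode "wb" truncates any existing file at that path.
    path = os.path.join(out_dir, filename)
    with open(path, "wb") as handle:
        pickle.dump(offsets, handle)
    return path


def read_offsets(path):
    # Hypothetical helper: loads the mapping back from disk.
    with open(path, "rb") as handle:
        return pickle.load(handle)


out_dir = tempfile.mkdtemp()

# Both splits use the same default file name, so the second write
# replaces the contents of the first.
train_path = write_offsets(out_dir, {"train_q1": 0, "train_q2": 1})
dev_path = write_offsets(out_dir, {"dev_q1": 0})
surviving = read_offsets(train_path)

# One possible workaround (an assumption): give each split its own file.
train_split_path = write_offsets(
    out_dir, {"train_q1": 0, "train_q2": 1}, filename="train_qid2offset.pickle"
)
dev_split_path = write_offsets(
    out_dir, {"dev_q1": 0}, filename="dev_qid2offset.pickle"
)
train_offsets = read_offsets(train_split_path)
dev_offsets = read_offsets(dev_split_path)
```

If the offsets of the first split are never read back after preprocessing, the overwrite may be harmless, but I could not tell that from the linked lines.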