stanford-futuredata / ColBERT

ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22, ACL'23, EMNLP'23)
MIT License

Location of collections.tsv, queries.train.small.tsv files #91

Status: closed by sundavid2002 2 years ago

sundavid2002 commented 2 years ago

Where can I find the collections.tsv and the queries.train.small.tsv files? They aren't on the MS MARCO Passage Ranking page.

okhat commented 2 years ago

You might have to download a tar.gz file called collectionandqueries, or something like that.

Let me know if you can't find it.

okhat commented 2 years ago

wget https://msmarco.blob.core.windows.net/msmarcoranking/collectionandqueries.tar.gz

This should work.

That said, you might find the Jupyter notebook on the new_api code easier to work with. It also includes a model checkpoint for ColBERTv2.
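
To see which files the archive actually ships before unpacking it, a small Python sketch along these lines may help. It assumes the tarball from the wget command above has been saved locally as collectionandqueries.tar.gz; the function name list_tsv_members is made up for illustration and is not part of ColBERT.

```python
import tarfile


def list_tsv_members(archive_path):
    """Return the sorted base names of the .tsv files inside a .tar.gz archive.

    Hypothetical helper for illustration; it only reads the archive index
    and does not extract anything to disk.
    """
    with tarfile.open(archive_path, "r:gz") as tar:
        return sorted(
            member.name.rsplit("/", 1)[-1]  # drop any leading directory
            for member in tar.getmembers()
            if member.isfile() and member.name.endswith(".tsv")
        )
```

Calling list_tsv_members("collectionandqueries.tar.gz") and printing the result should show every .tsv file in the download, which makes it easy to compare against the names a script expects.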

cuongdinh2021 commented 1 year ago

Hi Okhat, I ran into the same issue. In the link you sent, there are queries.dev.small.tsv and queries.eval.small.tsv but no queries.train.small.tsv file. Should queries.train.tsv be used instead of queries.train.small.tsv then?
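
For anyone checking the same thing, here is a rough Python sketch that reports which expected files are absent from the extracted folder and which query files are present. The helper names find_missing and query_files are hypothetical, and the directory path is whatever folder the archive was extracted into.

```python
from pathlib import Path


def find_missing(data_dir, expected_names):
    """Return the expected file names that do not exist in data_dir.

    Hypothetical helper for illustration, not part of ColBERT.
    """
    root = Path(data_dir)
    return [name for name in expected_names if not (root / name).is_file()]


def query_files(data_dir):
    """Return the sorted names of all queries.*.tsv files in data_dir."""
    return sorted(path.name for path in Path(data_dir).glob("queries.*.tsv"))
```

For example, find_missing("data", ["queries.train.small.tsv", "queries.train.tsv"]) returns whichever of the two names is not on disk, and query_files("data") lists the query files that are there (with "data" standing in for the real extraction folder).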