Closed jinzhuoran closed 1 year ago
Hi, and thanks for the interest. I cannot distribute the collections, since most of them are under a license from TREC. Please contact the TREC CAsT organizers to acquire them.
To construct the collection_mapping, you can use the code here:
https://github.com/littlewine/ZeCo2/blob/main/preprocessing/preprocessing_docids_to_int.py
It simply turns string docids into integers to be compatible with the ColBERT code, and keeps a mapping. You then have to provide the collection mapping path of each collection in paths.py.
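For illustration only, here is a minimal sketch of the idea. It is not the linked script: the function names, the two-column (docid, text) input, and the tab-separated output are all assumptions made for the example, so check the script itself for the actual file formats.

```python
import csv
import io


def build_docid_mapping(rows):
    """Replace string docids with consecutive integers.

    `rows` is an iterable of (docid, text) pairs (an assumed layout).
    Returns the converted rows and the string-to-integer mapping.
    Ids are assigned in order of first appearance, starting at 0.
    """
    str_to_int = {}
    converted = []
    for docid, text in rows:
        if docid not in str_to_int:
            str_to_int[docid] = len(str_to_int)
        converted.append((str_to_int[docid], text))
    return converted, str_to_int


def write_tsv(rows, handle):
    """Write rows as tab-separated lines to an open text handle."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)


if __name__ == "__main__":
    # Hypothetical docids, only to show the shape of the data.
    rows = [
        ("MARCO_123", "first passage"),
        ("CAR_abc", "second passage"),
        ("WAPO_x-1", "third passage"),
    ]
    converted, mapping = build_docid_mapping(rows)

    collection_out = io.StringIO()
    write_tsv(converted, collection_out)  # integer id <TAB> passage text

    mapping_out = io.StringIO()
    write_tsv(mapping.items(), mapping_out)  # original docid <TAB> integer id

    print(collection_out.getvalue())
    print(mapping_out.getvalue())
```

Keeping the mapping file next to each converted collection lets you translate the integer ids in a ranking back to the original docids at evaluation time.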
Hi, @littlewine. I read your paper with great interest, and I would like to follow up on your work. Could you give me the download links for the CAR, WAPO and KILT corpora, and explain how the collection_mapping is constructed? Thanks for your interesting work!