Open Iriseve opened 1 month ago
A possible tip for the corpus: in the Citation section, the authors mention that the corpus for the other datasets is the same as the one used in Dense Passage Retrieval (DPR) and in another paper. You can try searching for DPR on GitHub, which provides downloads of its passage corpus. (I'm not sure it is the same one, though.)
Hi, it seems that the retrieval corpus used for HotpotQA differs from the one used for the other datasets mentioned in the paper. I have obtained the pre-processed wikipedia2017 corpus from other issues.
May I ask what the difference is between the retrieval corpus used for the other datasets and this corpus, and whether the relevant data can be provided?
Besides, when I use
python ColBERT/index.py
to index enwiki-20171001-pages-meta-current-withlinks-abstracts.tsv, it seems to take a very long time. It has been a few hours since it started the first iteration, yet both GPU memory usage and GPU utilization are very low, and it seems to be stuck there. Are there any solutions? Thank you very much!
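One way to tell a stuck run from a slow one is to index a small slice of the corpus first and see whether that finishes. Below is a minimal sketch, assuming the TSV has one passage per line; the function name write_tsv_subset and the output file name are hypothetical and not part of the repository.

```python
import itertools


def write_tsv_subset(src_path, dst_path, max_lines=10000):
    """Copy the first max_lines lines of src_path to dst_path.

    Returns the number of lines actually copied. Any header row in the
    source is kept, because lines are copied from the top of the file.
    """
    copied = 0
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding="utf-8", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in itertools.islice(src, max_lines):
            dst.write(line)
            copied += 1
    return copied


if __name__ == "__main__":
    import os

    # File name taken from the question above; the output name is hypothetical.
    source = "enwiki-20171001-pages-meta-current-withlinks-abstracts.tsv"
    if os.path.exists(source):
        n = write_tsv_subset(source, "corpus_subset.tsv", max_lines=10000)
        print(f"wrote {n} lines to corpus_subset.tsv")
```

If indexing the smaller file completes quickly, the full run is probably just slow rather than hung; if the small run also stalls at the first iteration, the problem is more likely in the setup than in the corpus size.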