Closed Jia-py closed 11 months ago
Hi @Jia-py -- thanks for reporting.
Terrier indexes have a maximum length for each field that they store, which includes the docno. The default of 20 characters is often enough, but some datasets (such as touche) have longer docnos.
To change the maximum length, you'll need to set meta={"docno": 39} when indexing, as follows (the longest docno in this dataset is 39 characters):
indexer = pt.IterDictIndexer('./indices/beir_webis-touche2020_v2', meta={"docno": 39})
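If you are not sure how long the docnos in a dataset are, you can measure them before indexing. Below is a minimal sketch: max_docno_length is a hypothetical helper (not part of PyTerrier), and the toy docnos are made up for illustration. The commented-out lines assume the dataset is available under the ir_datasets ID shown there.

```python
from typing import Dict, Iterable


def max_docno_length(docs: Iterable[Dict[str, str]], field: str = "docno") -> int:
    """Return the length of the longest `field` value across `docs` (0 if empty)."""
    return max((len(doc[field]) for doc in docs), default=0)


# Toy corpus standing in for a real one; these docnos are invented examples.
toy_corpus = [
    {"docno": "doc-1", "text": "a short identifier"},
    {"docno": "c67482ba-2019-04-18T13:32:05Z-00000-000", "text": "a long identifier"},
]

longest = max_docno_length(toy_corpus)
print(f"longest docno: {longest} characters")

# Anything above the default limit of 20 needs a larger meta length.
needs_larger_meta = longest > 20

# With PyTerrier, the same helper could be applied to a real corpus.
# Assumption: the dataset ID below matches the one you are indexing.
# import pyterrier as pt
# dataset = pt.get_dataset("irds:beir/webis-touche2020/v2")
# longest = max_docno_length(dataset.get_corpus_iter())
# indexer = pt.IterDictIndexer("./indices/beir_webis-touche2020_v2",
#                              meta={"docno": longest})
# indexer.index(dataset.get_corpus_iter())  # fresh iterator for indexing
```

Note that this makes one extra pass over the corpus, so for large collections it may be worth recording the result once and hard-coding it, as in the line above.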
I hope this helps!
Hi @seanmacavaney, thanks for your time and help! It works well now.
No problem, happy to help :)
Describe the bug
All the metrics are zero when using BM25 on the webis-touche2020 dataset. The same code works well for other datasets, such as beir/TREC-COVID.
To Reproduce
Steps to reproduce the behavior: