Open eyasu11321238a opened 2 months ago
This should do the trick!
pt.BatchRetrieve(index_path, wmodel="DirichletLM", controls={'dirichletlm.mu': 2000})
We know the control names are not particularly well-documented, but it's something we have an open issue for :-) https://github.com/terrier-org/terrier-core/issues/197
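For completeness, here is a minimal sketch of comparing two mu settings on the same index; the index_path variable and the query string are placeholders, and the control name follows the snippet above:

import pyterrier as pt
if not pt.started():
    pt.init()

# index_path is assumed to point to an existing Terrier index
index = pt.IndexFactory.of(index_path)

# two retrieval pipelines that differ only in the smoothing parameter
dlm_low = pt.BatchRetrieve(index, wmodel="DirichletLM", controls={'dirichletlm.mu': 500})
dlm_high = pt.BatchRetrieve(index, wmodel="DirichletLM", controls={'dirichletlm.mu': 5000})

# if the control is being picked up, the document scores (and possibly the ranking) should differ
print(dlm_low.search("smoothing strategies").head())
print(dlm_high.search("smoothing strategies").head())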
Hey Sean, is it also possible to change the query term probability function in the DirichletLM weighting model? We'd like to study different smoothing strategies.
There is a very easy way of writing your own weighting model: https://pyterrier.readthedocs.io/en/latest/terrier-retrieval.html#custom-weighting-models where you pass a lambda function to the BatchRetrieve constructor.
However, it's very slow (it has to cross the JNI boundary for every posting scored).
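As a rough illustration, a Dirichlet-style smoothing function could be written as a Python callable like the sketch below (mu is an illustrative value, index is assumed to be an already-opened Terrier index, and the formula is only up to the log base and constants used by Terrier's built-in DirichletLM):

import math
import pyterrier as pt
if not pt.started():
    pt.init()

mu = 2000  # illustrative smoothing parameter

def dirichlet_lm(keyFreq, posting, entryStats, collStats):
    # term frequency and length of the document being scored
    tf = posting.getFrequency()
    dl = posting.getDocumentLength()
    # background (collection) probability of the term
    p_c = entryStats.getFrequency() / collStats.getNumberOfTokens()
    return keyFreq * (math.log(1 + tf / (mu * p_c)) + math.log(mu / (dl + mu)))

# index is assumed to be an already-opened Terrier index
br = pt.BatchRetrieve(index, wmodel=dirichlet_lm)

Swapping in another smoothing strategy (e.g. Jelinek-Mercer) is just a matter of changing the body of the function, at the cost of the JNI overhead mentioned above.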
Craig
That looks great, thank you!
Otherwise, if you can compile your own weighting model in Java, you can add it to the classpath.
For instance, there is a BM25_log10_nonum weighting model in https://github.com/terrierteam/terrier-ciff. It can be used directly like this:
pt.init(packages=["com.github.terrierteam:terrier-ciff:-SNAPSHOT"])
br = pt.BatchRetrieve(index, wmodel="BM25_log10_nonum")
# or, if the fully qualified name was different
# br = pt.BatchRetrieve(index, wmodel="org.terrier.matching.models.BM25_log10_nonum")
(where com.github is an automatic GitHub-to-Maven gateway provided by Jitpack).
Alternatively, if you mvn install the package locally, it would then be available with pt.init(packages=['org.terrier:terrier-ciff:0.2']). The Jitpack integration is just handy for importing something from GitHub without needing to formally release to Maven.
Craig
I am conducting an IR experiment using the Dirichlet model, and I need help improving the results by tuning the smoothing parameter.
Dirichlet = pt.BatchRetrieve(index_path, wmodel="DirichletLM", c=2000) #controls={"mu": 2000}
I set the smoothing parameter, but the results are still the same.