terrierteam / terrier-prf


Set the number of documents and expansion terms #2

Closed: kite1988 closed this issue 3 months ago

kite1988 commented 3 years ago

Hi,

Thanks for supporting additional pseudo-relevance feedback in Terrier!

I have a question about setting the number of feedback documents and expansion terms. I tried to set "qe_fb_docs" and "qe_fb_terms" (see below), but they have no effect: Terrier still analyzes 3 documents (the default).

bin/terrier batchretrieve -w BM25 -c rm3:on qe_fb_terms=5 qe_fb_docs=10 -P org.terrier:terrier-prf -t ${query_file}

What's the right way of setting these two parameters? Thank you!

cmacdonald commented 3 years ago

https://github.com/terrier-org/terrier-core/blob/5.x/modules/core/src/main/java/org/terrier/querying/QueryExpansion.java#L76

        public static final String CONTROL_EXP_DOCS = "qe_fb_docs";
        public static final String CONTROL_EXP_TERMS = "qe_fb_terms";

Those are the correct controls. I think it's the command-line syntax; can you try -c rm3:on -c qe_fb_terms:5 -c qe_fb_docs:10?
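To illustrate the suggested syntax, here is a hypothetical sketch (the ControlArgs class and its parse method are invented for illustration; this is not Terrier's actual CLI parser). It assumes each -c flag carries exactly one control written as name:value and splits it at the first colon.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of collecting "-c name:value" controls; NOT Terrier's real CLI code. */
public class ControlArgs {
    public static Map<String, String> parse(String[] args) {
        // LinkedHashMap keeps controls in command-line order
        Map<String, String> controls = new LinkedHashMap<>();
        for (int i = 0; i < args.length - 1; i++) {
            if (!args[i].equals("-c")) {
                continue; // not a control flag, e.g. "-w" or "BM25"
            }
            String spec = args[i + 1];
            int colon = spec.indexOf(':');
            if (colon > 0) {
                // split at the first colon: "qe_fb_docs:10" -> ("qe_fb_docs", "10")
                controls.put(spec.substring(0, colon), spec.substring(colon + 1));
            }
            i++; // skip the value we just consumed
        }
        return controls;
    }

    public static void main(String[] args) {
        String[] cli = {"-w", "BM25", "-c", "rm3:on", "-c", "qe_fb_terms:5", "-c", "qe_fb_docs:10"};
        System.out.println(parse(cli)); // {rm3=on, qe_fb_terms=5, qe_fb_docs=10}
    }
}
```

As the rest of the thread shows, getting the controls parsed is only half the story: the component that runs the feedback must also read them.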

Have you tried using PyTerrier? We think it's easier to use.

kite1988 commented 3 years ago

Tried your suggested command, but the log still says 3 documents are used for RM3.

        10:46:32.175 [main] INFO org.terrier.querying.LocalManager - running process RM3
        10:46:32.176 [main] INFO org.terrier.querying.RM1 - Analysing 3 feedback documents

For my previous command, "bin/terrier batchretrieve -w BM25 -c rm3:on qe_fb_terms=5 qe_fb_docs=10 -P org.terrier:terrier-prf -t ${query_file}", I double-checked the settings file written alongside the results ("xx.res.settings"). It shows that qe_fb_docs and qe_fb_terms were indeed set as controls (see the snippet below).

        # control: qe_fb_terms=10
        # control: wmodel=BM25
        # control: rm3=on
        # control: qe_fb_docs=5
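A quick way to check which controls a run recorded is to pull the "# control: name=value" entries out of the settings file. The sketch below is hypothetical (the SettingsControls class is invented for illustration) and assumes only the entry format visible in the snippet above; it scans with a regular expression, so it works whether entries sit on separate lines or run together.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extract "# control: name=value" entries from .res.settings text. */
public class SettingsControls {
    // name runs up to the first '=', value runs to the next whitespace
    private static final Pattern CONTROL = Pattern.compile("# control: ([^=\\s]+)=(\\S*)");

    public static Map<String, String> controls(String settingsText) {
        Map<String, String> out = new LinkedHashMap<>();
        Matcher m = CONTROL.matcher(settingsText);
        while (m.find()) {
            out.put(m.group(1), m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "# control: qe_fb_terms=10\n# control: wmodel=BM25\n"
                    + "# control: rm3=on\n# control: qe_fb_docs=5\n";
        Map<String, String> c = controls(text);
        System.out.println(c.get("qe_fb_docs") + " " + c.get("qe_fb_terms")); // 5 10
    }
}
```

Applied to the snippet above it reports qe_fb_docs=5 and qe_fb_terms=10, the reverse of the values in the quoted command, so those are worth comparing against the command actually run. Either way, this only confirms that the controls were recorded, not that RM1 read them.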

I have not tried using PyTerrier yet.

cmacdonald commented 3 years ago

Hi @kite1988, this incantation works: bin/terrier batchretrieve -w BM25 -c rm3:on -Dexpansion.documents=20 -Dexpansion.terms=100

The problem is that https://github.com/terrierteam/terrier-prf/blob/master/src/main/java/org/terrier/querying/RM1.java does not use QueryExpansionConfig (see https://github.com/terrier-org/terrier-core/blob/5.x/modules/core/src/main/java/org/terrier/querying/QueryExpansion.java#L91), so it ignores the qe_fb_docs and qe_fb_terms controls and only picks up the global expansion.documents and expansion.terms properties.
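One way a fix could behave is sketched below: a per-query control such as qe_fb_docs wins, otherwise a global property such as expansion.documents is used, otherwise a hard-coded default. This is a hypothetical illustration, not Terrier's QueryExpansionConfig API; the FeedbackSettings class and resolve method are invented, plain maps stand in for Terrier's controls and properties, the default of 3 documents comes from the log above, and the default of 10 terms is an assumption.

```java
import java.util.Map;

/**
 * Hypothetical lookup order for feedback settings (NOT Terrier's actual API):
 * per-query control, then global property, then a hard-coded default.
 */
public class FeedbackSettings {
    public static int resolve(Map<String, String> controls, String controlName,
                              Map<String, String> properties, String propertyName,
                              int defaultValue) {
        String v = controls.get(controlName);     // e.g. set via -c qe_fb_docs:10
        if (v == null) {
            v = properties.get(propertyName);     // e.g. set via -Dexpansion.documents=20
        }
        return v == null ? defaultValue : Integer.parseInt(v.trim());
    }

    public static void main(String[] args) {
        Map<String, String> controls = Map.of("rm3", "on", "qe_fb_docs", "10");
        Map<String, String> props = Map.of("expansion.terms", "100");
        int docs = resolve(controls, "qe_fb_docs", props, "expansion.documents", 3);
        int terms = resolve(controls, "qe_fb_terms", props, "expansion.terms", 10); // 10 is an assumed default
        System.out.println(docs + " " + terms); // 10 100
    }
}
```

Giving the control precedence keeps per-query settings (as recorded in the .res.settings file) authoritative, while the property remains a global fallback for existing scripts.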

Pull requests fixing the documentation or the code would be gratefully received.

kite1988 commented 3 years ago

It works! Thank you so much!