OSIRRC Docker Image for Terrier
http://terrier.org/

Arthur Câmara and Craig Macdonald

This is the Docker image for the Terrier toolkit (v5.2), conforming to the OSIRRC jig for the Open-Source IR Replicability Challenge (OSIRRC) at SIGIR 2019. This image is available on Docker Hub.

Quick Start

The following jig command can be used to index TREC disks 4/5 for robust04:

python run.py prepare --repo osirrc2019/terrier --tag vx.y.z --collections robust04=/tmp/disk45/=trectext
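The --collections value in the command above appears to pack three fields into one string: the collection name, the host path, and the document format, separated by =. The following is a minimal Python sketch of building and splitting such a value. The helper names are hypothetical and are not part of the jig or of this image.

```python
def make_collection_spec(name, path, fmt):
    """Join name, host path and format into a name=path=format string."""
    return f"{name}={path}={fmt}"


def split_collection_spec(spec):
    """Split a name=path=format string back into its three fields.

    The name is taken before the first '=' and the format after the last
    '=', so a path that itself contains '=' is preserved.
    """
    name, rest = spec.split("=", 1)
    path, fmt = rest.rsplit("=", 1)
    return name, path, fmt


# Values copied from the prepare command above.
spec = make_collection_spec("robust04", "/tmp/disk45/", "trectext")
print(spec)
```

The resulting string can be passed unchanged as the --collections argument.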

The following jig command can be used to perform a retrieval run on the robust04 test collection, using BM25 as the ranker:

python run.py search \
    --repo osirrc2019/terrier \
    --tag vx.y.z \
    --collection robust04 \
    --topic topics/topics.robust04.txt \
    --qrels qrels/qrels.robust04.txt \
    --output /tmp/runs

Retrieval Methods:

This image supports the following weighting models: BM25 (bm25), PL2 (pl2) and DPH (dph).

Additionally, it supports query expansion (QE) and proximity-based (DFRD) search. To enable them, append qe, prox or prox_qe to the retrieval model name in the --opts config argument, i.e. --opts config=<retrieval_model>_<extra>:

(BM25)

python run.py search  --repo osirrc2019/terrier --tag vx.y.z --collection robust04  --topic topics/topics.robust04.txt --qrels qrels/qrels.robust04.txt   --output /tmp/runs --opts config=bm25

(BM25 + query expansion)

python run.py search  --repo osirrc2019/terrier --tag vx.y.z --collection robust04  --topic topics/topics.robust04.txt --qrels qrels/qrels.robust04.txt   --output /tmp/runs --opts config=bm25_qe

(BM25 + Proximity)

python run.py search  --repo osirrc2019/terrier --tag vx.y.z --collection robust04  --topic topics/topics.robust04.txt --qrels qrels/qrels.robust04.txt   --output /tmp/runs --opts config=bm25_prox

(BM25 + Proximity + query expansion)

python run.py search  --repo osirrc2019/terrier --tag vx.y.z --collection robust04  --topic topics/topics.robust04.txt --qrels qrels/qrels.robust04.txt   --output /tmp/runs --opts config=bm25_prox_qe

(PL2)

python run.py search  --repo osirrc2019/terrier --tag vx.y.z --collection robust04  --topic topics/topics.robust04.txt --qrels qrels/qrels.robust04.txt   --output /tmp/runs --opts config=pl2

(PL2 + query expansion)

python run.py search  --repo osirrc2019/terrier --tag vx.y.z --collection robust04  --topic topics/topics.robust04.txt --qrels qrels/qrels.robust04.txt   --output /tmp/runs --opts config=pl2_qe

(PL2 + Proximity)

python run.py search  --repo osirrc2019/terrier --tag vx.y.z --collection robust04  --topic topics/topics.robust04.txt --qrels qrels/qrels.robust04.txt   --output /tmp/runs --opts config=pl2_prox

(PL2 + Proximity + query expansion)

python run.py search  --repo osirrc2019/terrier --tag vx.y.z --collection robust04  --topic topics/topics.robust04.txt --qrels qrels/qrels.robust04.txt   --output /tmp/runs --opts config=pl2_prox_qe

(DPH)

python run.py search  --repo osirrc2019/terrier --tag vx.y.z --collection robust04  --topic topics/topics.robust04.txt --qrels qrels/qrels.robust04.txt   --output /tmp/runs --opts config=dph

(DPH + query expansion)

python run.py search  --repo osirrc2019/terrier --tag vx.y.z --collection robust04  --topic topics/topics.robust04.txt --qrels qrels/qrels.robust04.txt   --output /tmp/runs --opts config=dph_qe

(DPH + Proximity)

python run.py search  --repo osirrc2019/terrier --tag vx.y.z --collection robust04  --topic topics/topics.robust04.txt --qrels qrels/qrels.robust04.txt   --output /tmp/runs --opts config=dph_prox

(DPH + Proximity + query expansion)

python run.py search  --repo osirrc2019/terrier --tag vx.y.z --collection robust04  --topic topics/topics.robust04.txt --qrels qrels/qrels.robust04.txt   --output /tmp/runs --opts config=dph_prox_qe

NOTE: to run DFRD (the proximity-based model), the index must be built with the --opts=block.indexing=true parameter.
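The configuration names above follow a regular pattern: a weighting model, optionally followed by an underscore and one of qe, prox or prox_qe. The Python sketch below enumerates the twelve combinations, rebuilds the corresponding search commands, and flags the ones that need a block index. The function names are hypothetical; only the command-line arguments are taken from this document.

```python
import shlex
from itertools import product

MODELS = ("bm25", "pl2", "dph")          # weighting models listed above
EXTRAS = ("", "qe", "prox", "prox_qe")   # optional suffixes listed above


def config_name(model, extra=""):
    """Return the value for --opts config=..., e.g. 'bm25' or 'bm25_prox_qe'."""
    return f"{model}_{extra}" if extra else model


def needs_block_index(config):
    """Proximity (DFRD) configurations need an index built with blocks."""
    return "prox" in config.split("_")


def search_command(config, tag="vx.y.z"):
    """Rebuild the jig search command used in the examples above."""
    argv = [
        "python", "run.py", "search",
        "--repo", "osirrc2019/terrier",
        "--tag", tag,
        "--collection", "robust04",
        "--topic", "topics/topics.robust04.txt",
        "--qrels", "qrels/qrels.robust04.txt",
        "--output", "/tmp/runs",
        "--opts", f"config={config}",
    ]
    return " ".join(shlex.quote(arg) for arg in argv)


ALL_CONFIGS = [config_name(m, e) for m, e in product(MODELS, EXTRAS)]

for config in ALL_CONFIGS:
    note = "  # needs block.indexing=true at prepare time" if needs_block_index(config) else ""
    print(search_command(config) + note)
```

Generating the commands from two short tuples keeps the list of runs in one place, which may help when scripting all configurations in a loop.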

Learning to Rank Runs

Learning-to-rank will typically require that the index has more information, e.g. fields or blocks.

Indexing:

python run.py prepare     --repo osirrc2019/terrier --tag vx.y.z   --collections robust04=/tmp/disk45/=trectext --opts "FieldTags.process=HEADLINE"

Training:

You need to specify the features to be used by Terrier - see http://terrier.org/docs/v5.1/learning.html for more information about Terrier feature definitions.

python run.py train  --repo osirrc2019/terrier --tag vx.y.z --collection robust04  --topic topics/topics.robust04.txt --qrels qrels/qrels.robust04.txt    --test_split $PWD/sample_training_validation_query_ids/robust04_test.txt  --validation_split $PWD/sample_training_validation_query_ids/robust04_validation.txt --model_folder /tmp/runs --opts features="SAMPLE;WMODEL:SingleFieldModel(BM25,0);QI:SingleFieldModel(Dl,0)"
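The features value in the training command is a semicolon-separated list, and both ; and the parentheses are special to most shells, which is why the command above quotes it. A minimal Python sketch, assuming the feature list is kept as separate strings and joined before the command is assembled (the names FEATURES and features_opt are hypothetical):

```python
import shlex

# Feature definitions copied from the training command above.
FEATURES = [
    "SAMPLE",
    "WMODEL:SingleFieldModel(BM25,0)",
    "QI:SingleFieldModel(Dl,0)",
]


def features_opt(features):
    """Join feature definitions into a single features=... option value."""
    return "features=" + ";".join(features)


opt = features_opt(FEATURES)

# shlex.quote wraps the value in single quotes so the shell passes it
# through as one argument instead of splitting it at ';'.
quoted = shlex.quote(opt)
print("--opts", quoted)
```

When the command is launched from Python with an argument list rather than a shell string, the unquoted opt value can be passed directly.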

Retrieval:

You will need to specify the bm25_ltr_jforest configuration.

python run.py search  --repo osirrc2019/terrier --tag vx.y.z --collection robust04  --topic topics/topics.robust04.txt --qrels qrels/qrels.robust04.txt   --output /tmp/runs --opts config=bm25_ltr_jforest

Expected Results

robust04

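| MAP | BM25 | BM25+QE | BM25+Prox | BM25+Prox+QE | DPH | DPH+QE | DPH+Prox | DPH+Prox+QE | PL2 | PL2+QE |
|---|---|---|---|---|---|---|---|---|---|---|
| TREC 2004 Robust Track Topics | 0.2363 | 0.2762 | 0.2404 | 0.2781 | 0.2479 | 0.2821 | 0.2501 | 0.2869 | 0.2241 | 0.2538 |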

core18

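| MAP | BM25 | BM25+QE | BM25+Prox | BM25+Prox+QE | DPH | DPH+QE | DPH+Prox | DPH+Prox+QE | PL2 | PL2+QE |
|---|---|---|---|---|---|---|---|---|---|---|
| TREC 2018 Common Core Track Topics | 0.2326 | 0.2975 | 0.2369 | 0.2960 | 0.2427 | 0.3055 | 0.2428 | 0.3035 | 0.2225 | 0.2787 |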

GOV2

| MAP | BM25 | BM25+QE | BM25+Prox | BM25+Prox+QE | DPH | DPH+QE | DPH+Prox | DPH+Prox+QE | PL2 | PL2+QE |
|---|---|---|---|---|---|---|---|---|---|---|
| TREC 2004 Terabyte Track: Topics 701-750 | 0.2461 | 0.2621 | 0.2537 | 0.2715 | 0.2804 | 0.3120 | 0.2834 | 0.3064 | 0.2334 | 0.2478 |
| TREC 2005 Terabyte Track: Topics 751-800 | 0.3081 | 0.3506 | 0.3126 | 0.3507 | 0.3311 | 0.3754 | 0.3255 | 0.3095 | 0.2884 | 0.3160 |
| TREC 2006 Terabyte Track: Topics 801-850 | 0.2629 | 0.3118 | 0.2724 | 0.3085 | 0.2917 | 0.3494 | 0.2904 | 0.3288 | 0.2363 | 0.2739 |
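When replicating a run, the MAP you obtain can be compared against the figures above. The Python sketch below does this for the robust04 row. It assumes the table columns correspond to the config names used earlier (bm25, bm25_qe and so on), and the tolerance of half a unit in the fourth decimal place is an arbitrary choice made for illustration, not something the image prescribes.

```python
# MAP values copied from the robust04 row above, keyed by the config
# name assumed to have produced each column.
EXPECTED_MAP = {
    "robust04": {
        "bm25": 0.2363, "bm25_qe": 0.2762,
        "bm25_prox": 0.2404, "bm25_prox_qe": 0.2781,
        "dph": 0.2479, "dph_qe": 0.2821,
        "dph_prox": 0.2501, "dph_prox_qe": 0.2869,
        "pl2": 0.2241, "pl2_qe": 0.2538,
    },
}


def matches_expected(collection, config, observed_map, tol=5e-5):
    """True if an observed MAP agrees with the table to four decimals."""
    return abs(observed_map - EXPECTED_MAP[collection][config]) <= tol


def relative_gain(collection, base, variant):
    """Relative MAP change of `variant` over `base`, as a fraction."""
    table = EXPECTED_MAP[collection]
    return (table[variant] - table[base]) / table[base]


print(matches_expected("robust04", "bm25", 0.2363))
print(f"{relative_gain('robust04', 'bm25', 'bm25_qe'):.1%}")
```

The same dictionary layout extends to core18 and GOV2 by adding one entry per collection or topic set.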

Interact hooks

This image also supports the interact hooks from the OSIRRC JIG. After initializing the image (with python run.py prepare):

python run.py interact --repo osirrc2019/terrier --tag vx.y.z

The following (internal) ports will be made available. For example, port 1981 can be queried from Terrier's interactive client:

$ /bin/terrier interactive -I http://dockerhost:1981/
terrier query> information retrieval end:5
  Displaying 1-6 results
0 FBIS4-20699 10.268754805435458
1 FBIS4-20702 9.768490153503198
2 FR941027-2-00046 9.491347902606723
3 FBIS4-20701 9.456022500508775
4 FBIS3-24510 9.31403481019499
5 FBIS4-20700 8.79234249484928

NOTE: Currently, the jig may map these ports to different ports on the host machine, because of the way Docker handles port assignment. To find the actual mapping, run:

$> docker ps

and read the host ports under the PORTS column.
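Entries in the PORTS column normally look like 0.0.0.0:32768->1981/tcp, where the number before -> is the host port and the number after it is the container port. The Python sketch below extracts the host port for a given container port from such a string. The helper name and the sample value are hypothetical; the ports on your machine will differ.

```python
def host_port(ports_field, container_port, proto="tcp"):
    """Return the host port mapped to `container_port`, or None.

    `ports_field` is the text of the PORTS column for one container.
    """
    for entry in ports_field.split(","):
        entry = entry.strip()
        if "->" not in entry:
            continue  # exposed in the container but not published
        host_side, container_side = entry.split("->", 1)
        port_text, _, entry_proto = container_side.partition("/")
        # The host port follows the last ':' (covers both 0.0.0.0:N and :::N).
        host_text = host_side.rsplit(":", 1)[-1]
        if not (port_text.isdigit() and host_text.isdigit()):
            continue  # skip port ranges and anything unexpected
        if int(port_text) == container_port and entry_proto == proto:
            return int(host_text)
    return None


# Hypothetical PORTS value, for illustration only.
sample = "0.0.0.0:32768->1981/tcp, :::32768->1981/tcp"
print(host_port(sample, 1981))
```

The returned port is the one to use in place of 1981 when connecting from the host.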
