This repository contains all the code needed to run the retrieval pipeline. Unfortunately, the codebase is not fully homogeneous and instead consists of multiple parts: the main runner lives in the `src` folder, while the more complex index builders and advanced methods are available in the `submission_notebooks` folder.
The main program is able to do the following things:
The main program is not able to do the following things:
Results, prebuilt Docker images and evaluations can be found in this OneDrive folder.
Before the program can be built, certain steps must be executed:

- Dependencies are declared in the `Pipfile`; running `pipenv install` updates the `Pipfile.lock` correspondingly.
- The active method is selected in `src/constants.py` by changing the `METHOD_NAME` variable. As seen in `main.py`, the following methods are available:
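Switching methods is a one-line edit to `src/constants.py`. The snippet below sketches that edit with `sed`; note that the file contents and the method names (`"baseline"`, `"dense_retrieval"`) are illustrative assumptions, not values confirmed by this repository:

```shell
# Create a stand-in constants.py; "baseline" and "dense_retrieval" are
# hypothetical method names used only for illustration.
printf 'METHOD_NAME = "baseline"\n' > constants.py
# Flip the active method in place.
sed -i 's/METHOD_NAME = "[^"]*"/METHOD_NAME = "dense_retrieval"/' constants.py
cat constants.py
```

In the real repository, open `src/constants.py` and change the value by hand; `main.py` lists the names that are actually accepted.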
If the index data is available, building the Docker image should be simple. You can build the image by running `docker build ./ -t touche-2022`. Alternatively, you can run the `build_docker.sh` script, which produces the Docker image and packs it into a tar.gz file.
Now the Docker image can be run in batch mode with `./run_docker.sh -i $inputDataset -o $outputDir`. Non-batch mode can be run using Docker's conventional run commands, but then you also have to take care of mounting the proper directories yourself. You may use the contents of the `run_docker.sh` file as a reference.
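As a rough guide to the mounting involved, the snippet below assembles (but does not execute) the kind of `docker run` command that `run_docker.sh` presumably issues. The container paths `/input` and `/output` and the `--rm`/read-only flags are assumptions; consult the script for the real invocation:

```shell
inputDataset=/path/to/input
outputDir=/path/to/output
# Build the command as a string (dry run) rather than executing it, since
# the mount points used here are assumptions, not taken from run_docker.sh.
cmd="docker run --rm -v $inputDataset:/input:ro -v $outputDir:/output touche-2022"
echo "$cmd"
```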
The program can also be run locally. Before it can start, some Python libraries must be installed; this can be done by running `./install_local.sh`. After that, the program can be started with `./run_local.sh`. If you want to run the program in batch mode locally, use `./run_local.sh -i $inputDataset -o $outputDir`.
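The `-i`/`-o` interface above suggests flag parsing along the following lines. This is a hypothetical sketch of what `run_local.sh` might do internally, not its actual contents:

```shell
# Hypothetical reimplementation of the -i/-o flag handling; see
# run_local.sh for the real logic.
parse_args() {
  OPTIND=1   # reset so the function can be called more than once
  while getopts "i:o:" opt; do
    case "$opt" in
      i) inputDataset="$OPTARG" ;;
      o) outputDir="$OPTARG" ;;
    esac
  done
  echo "$inputDataset $outputDir"
}
parse_args -i ./in -o ./out   # prints "./in ./out"
```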
The program is separated into several modules, visible in the `src` folder. Below is a quick overview of the different modules:

- `main.py` provides the menu functionality and hosts the setup for the different batch modes.
- `constants.py` provides the main configuration file for the program's most important hardcoded values.
- `test.py` provides an example of how to manually run different parts of the program.

In case our work was helpful for you, feel free to cite it!
```bibtex
@inproceedings{RanaEtAl:CLEF2022,
  title    = {{LEVIRANK}: Limited Query Expansion with Voting Integration for Document Retrieval and Ranking},
  author   = {Ashish Rana and Pujit Golchha and Roni Juntunen and Andreea Coajă and Ahmed Elzamarany and Chia-Chien Hung and Simone Paolo Ponzetto},
  pages    = {3074--3089},
  url      = {http://ceur-ws.org/Vol-3180/#paper-259},
  crossref = {CLEF2022},
}
```