softgitron / LeviRank

Touché 2022 competition project
MIT License

About this repository

This repository contains all the code needed to run the retrieval pipeline. The codebase is not fully homogeneous, however, and consists of several parts: the main runner is in the src folder, while the more complex index builders and advanced methods are in the submission_notebooks folder.

The main program is able to do the following things:

The main program is not able to do the following things:

Results, prebuilt Docker images, and evaluations can be found in this OneDrive folder.

Preparing to build the program

Before the program can be built, the following steps must be completed:

  1. Pipenv, pyenv and (optionally) Docker should be installed.
  2. If you use the premade indexes and models, download them from the data folder on OneDrive into the data folder in the repository root.
  3. Select which version of the program to build. Due to library incompatibilities, only one of the pyterrier and pyserini indexers can be built in. Make the selection by commenting or uncommenting the respective libraries in the Pipfile, then run pipenv install to update Pipfile.lock accordingly.
  4. If the program is used in batch mode, the method used in batch mode must be chosen by changing the METHOD_NAME variable in src/constants.py. As seen in main.py, the following methods are available:
    • BM25
    • BM25_with_mono_t5
    • BM25_with_duo_t5_and_advanced_expander
    • levirank_baseline_large_duo_t5
    • levirank_dense_vote_initial_retrieval
    • levirank_dense_initial_retrieval
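
Because a typo in METHOD_NAME may only surface once the pipeline is already running, it can help to check the value against the list above first. The following is a minimal sketch, not code from the repository: the AVAILABLE_METHODS tuple and the check_method_name helper are hypothetical names introduced here, and only the method names themselves are taken from this README.

```python
# Method names copied from the list in this README.
# AVAILABLE_METHODS and check_method_name are hypothetical helpers,
# not part of the repository.
AVAILABLE_METHODS = (
    "BM25",
    "BM25_with_mono_t5",
    "BM25_with_duo_t5_and_advanced_expander",
    "levirank_baseline_large_duo_t5",
    "levirank_dense_vote_initial_retrieval",
    "levirank_dense_initial_retrieval",
)


def check_method_name(method_name: str) -> str:
    """Return method_name unchanged if it is one of the listed methods."""
    if method_name not in AVAILABLE_METHODS:
        options = ", ".join(AVAILABLE_METHODS)
        raise ValueError(
            f"Unknown METHOD_NAME {method_name!r}; expected one of: {options}"
        )
    return method_name


# Example value; in the repository this variable is set in src/constants.py.
METHOD_NAME = check_method_name("BM25_with_mono_t5")
```

The comparison is case-sensitive, so a value such as "bm25" would be rejected in this sketch.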

Building Docker image

If the index data is available, building the Docker image should be simple. You can build the image by running docker build ./ -t touche-2022. Alternatively, run the build_docker.sh script, which builds the Docker image and packs it into a tar.gz file.

Running the Docker image

The Docker image can now be run in batch mode with ./run_docker.sh -i $inputDataset -o $outputDir. Non-batch mode can be run with Docker's conventional run commands, but then you must take care of mounting the directories properly yourself. You may use the contents of run_docker.sh as a reference.
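
To illustrate what mounting the directories yourself can look like, here is a rough sketch that only assembles a docker run command with two bind mounts and prints it. It does not reproduce run_docker.sh: the container-side paths /input and /output, the interactive -it flags, and the docker_run_command helper are assumptions made for illustration, so check run_docker.sh for the paths the image actually expects. Only the touche-2022 tag comes from the build step above.

```python
import os
import shlex

IMAGE_TAG = "touche-2022"  # tag used in the docker build command above


def docker_run_command(host_input, host_output,
                       container_input="/input",     # assumed mount point
                       container_output="/output"):  # assumed mount point
    """Build the argument list for a docker run call with two bind mounts."""
    return [
        "docker", "run", "--rm", "-it",
        # Bind mounts need absolute host paths; the input is mounted read-only.
        "-v", f"{os.path.abspath(host_input)}:{container_input}:ro",
        "-v", f"{os.path.abspath(host_output)}:{container_output}",
        IMAGE_TAG,
    ]


if __name__ == "__main__":
    # Print the command for inspection instead of executing it.
    print(shlex.join(docker_run_command("./input", "./output")))
```

Printing the command rather than running it keeps the sketch safe to try on a machine without Docker; pass the list to subprocess.run once the mount points have been verified.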

Running the program locally

The program can also be run locally. Before it can start locally, some Python libraries must be installed by running ./install_local.sh. After that, the program can be run with ./run_local.sh. To run the program in batch mode locally, use the command ./run_local.sh -i $inputDataset -o $outputDir

About the architecture

The program is separated into several modules, which are visible in the src folder. Below is a quick overview of the different modules:

Citing our Work

In case our work was helpful for you, feel free to cite this work!

@inproceedings{RanaEtAl:CLEF2022,
title = {LEVIRANK: Limited Query Expansion with Voting Integration for Document Retrieval and Ranking},
author = {Ashish Rana and Pujit Golchha and Roni Juntunen and Andreea Coajă and Ahmed Elzamarany and Chia-Chien Hung and Simone Paolo Ponzetto},
pages = {3074--3089},
url = {http://ceur-ws.org/Vol-3180/#paper-259},
crossref = {CLEF2022},
}