Active Learning for Learning to Rank (LETOR)

Project for UCL Information Retrieval 2016: Learning to Rank (LETOR)

Implementation of Active Learning for Ranking through Expected Loss Optimization

We compare two LETOR models, AdaRank and LambdaMART, and then evaluate another approach to LETOR: active learning through Expected Loss Optimization (ELO).
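As a rough sketch of the ELO idea (not the exact code in active_learning/elo_active_learning.py), each candidate query's expected DCG loss is estimated by sampling relevance labels from the model's posterior and measuring how much DCG the current ranking is expected to give up; the queries with the highest expected loss are selected for labeling. The Bernoulli relevance model and the function names below are assumptions made for illustration:

```python
import numpy as np

def expected_dcg_loss(p_rel, k=10, n_samples=200, rng=None):
    """Monte-Carlo estimate of expected DCG@k loss for one query.

    p_rel: per-document probability of relevance (a stand-in for the
    model's posterior; an assumption made for this sketch).
    """
    rng = rng or np.random.default_rng(0)
    discounts = 1.0 / np.log2(np.arange(2, k + 2))
    ranking = np.argsort(-p_rel)[:k]                 # current ranking by predicted score
    losses = []
    for _ in range(n_samples):
        rel = (rng.random(len(p_rel)) < p_rel).astype(float)  # sampled relevance labels
        ideal = np.sort(rel)[::-1][:k]                         # best achievable ordering
        dcg_best = np.sum(ideal * discounts[:len(ideal)])
        dcg_cur = np.sum(rel[ranking] * discounts[:len(ranking)])
        losses.append(dcg_best - dcg_cur)            # loss incurred by the current ranking
    return float(np.mean(losses))

def select_queries(posteriors, budget):
    """Return the `budget` query ids with the largest expected loss."""
    ranked = sorted(posteriors, key=lambda q: expected_dcg_loss(posteriors[q]), reverse=True)
    return ranked[:budget]
```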

How to Run the Code

Active Learning (ELO)
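The ELO experiments live in the active_learning/ package (see the folder structure below). A plausible invocation, assuming pre_processing.py is run first to generate data/MQ2016/active_learning/* and that both scripts read their paths from constants.py (the exact entry points and arguments are not documented here, so treat these as assumptions):

$ python active_learning/pre_processing.py
$ python active_learning/elo_active_learning.py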

RankLib Models (AdaRank, LambdaMART)

The following command runs AdaRank on our dataset; change -ranker to 6 to run LambdaMART instead.

$ java -jar bin/RankLib.jar -train ../data/MQ2016/base1024/Fold1/train.txt -test ../data/MQ2016/active_learning/test.txt -validate ../data/MQ2016/base1024/Fold1/vali.txt -ranker 3 -metric2t DCG@10
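For reference, the equivalent LambdaMART run (same data paths, only the ranker id changed) is:

$ java -jar bin/RankLib.jar -train ../data/MQ2016/base1024/Fold1/train.txt -test ../data/MQ2016/active_learning/test.txt -validate ../data/MQ2016/base1024/Fold1/vali.txt -ranker 6 -metric2t DCG@10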

Data

Folds   Training Set   Validation Set   Test Set
Fold1   {S1,S2,S3}     S4               S5
Fold2   {S2,S3,S4}     S5               S1
Fold3   {S3,S4,S5}     S1               S2
Fold4   {S4,S5,S1}     S2               S3
Fold5   {S5,S1,S2}     S3               S4
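For example, Fold1's files can be assembled by concatenating the corresponding segments. This is only a minimal sketch; pre_processing.py presumably handles this with the paths configured in constants.py, and the base1024 output directory is used purely as an example:

```python
from pathlib import Path

# Which segments make up each split of Fold1 (from the table above).
FOLD1 = {"train": ["S1.txt", "S2.txt", "S3.txt"],
         "vali":  ["S4.txt"],
         "test":  ["S5.txt"]}

def build_fold(splits, src_dir="data/MQ2016", out_dir="data/MQ2016/base1024/Fold1"):
    """Concatenate the S*.txt segments into the fold's train/vali/test files."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for split, segments in splits.items():
        with open(out / f"{split}.txt", "w") as dst:
            for seg in segments:
                dst.write(Path(src_dir, seg).read_text())

build_fold(FOLD1)
```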

Frameworks

Folder Structure

.  
├── data 
│   ├── MQ2016                      # segmented MQ2007 data
│   │   ├── S1.txt
│   │   ├── S2.txt
│   │   ├── S3.txt
│   │   ├── S4.txt
│   │   ├── S5.txt
│   │   ├── base512                 # segmented data
│   │   ├── base1024
│   │   ├── base2048
│   │   ├── base4096
│   │   ├── base8192
│   │   ├── base16384
│   │   ├── base32768
│   │   ├── base65536
│   │   └── active_learning/*       # all pre-processed data
├── literature  
├── poster  
├── ranklib  
├── report  
├── results
└── active_learning                 # source code for active learning
    ├── __init__.py
    ├── constants.py  
    ├── elo_active_learning.py
    ├── pre_processing.py
    └── util.py