Closed · doudejans closed 4 years ago
I have removed `GridSearchCV` in favour of a simple implementation of a regular grid search for the best parameters, because I realized that cross-validation did not make sense the way I had set it up: it is nearly impossible to get the query information of a cross-validation partition at scoring time when using `GridSearchCV`, and working around this would take too much time.
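For reference, a plain grid search that keeps the query ids available at scoring time could look roughly like this. This is a sketch under assumptions: the `score_fn` signature, `grid_search` helper, and parameter grid are illustrative, not the actual code in this PR.

```python
# Hypothetical sketch of a manual grid search replacing GridSearchCV.
# Iterating over the parameter grid ourselves keeps the query ids of
# each split available at scoring time, which GridSearchCV's scorer
# interface does not pass through.
from itertools import product

import numpy as np
from sklearn.ensemble import RandomForestRegressor

def grid_search(X, y, query_ids, param_grid, score_fn):
    """Try every parameter combination and keep the best by score_fn."""
    best_score, best_params = -np.inf, None
    keys = list(param_grid)
    for values in product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        model = RandomForestRegressor(random_state=0, **params).fit(X, y)
        # score_fn receives the query ids explicitly, so a query-aware
        # metric such as NDCG can group documents per query.
        score = score_fn(model, X, y, query_ids)
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score
```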
This PR adds a start of our learning-to-rank (LTR) baseline. It includes the following:

- Evaluation of results with `trec_eval`.
- A scorer for `scikit-learn` that implements NDCG and takes the different queries in the dataset into account.

A grid search is used to find the best parameters for the `RandomForestRegressor` with respect to the NDCG@20 metric. On each run, a random selection of 20 queries is taken; by averaging the NDCG scores of the different runs (at cutoffs 5, 10, 15, 20), most of the queries will have been included in a test set. Results are written to a file in the `results` folder so that they can also be verified using `trec_eval`. Scores from the `ndcg_scorer` seem to match the results generated by `trec_eval`.
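The query-aware NDCG computation described above can be sketched as follows: compute NDCG per query at a cutoff k, then average over queries. The helper names (`dcg`, `ndcg_at_k`, `mean_ndcg`) are illustrative assumptions, not the actual `ndcg_scorer` in this PR.

```python
# Minimal sketch of query-aware NDCG@k: score each query's ranking
# separately, then average over queries rather than over documents.
from collections import defaultdict

import numpy as np

def dcg(relevances, k):
    """Discounted cumulative gain over the top-k relevance labels."""
    rel = np.asarray(relevances, dtype=float)[:k]
    return np.sum((2.0 ** rel - 1) / np.log2(np.arange(2, rel.size + 2)))

def ndcg_at_k(y_true, y_pred, k):
    """NDCG@k for a single query: DCG of the predicted ranking / ideal DCG."""
    order = np.argsort(y_pred)[::-1]      # rank documents by predicted score
    ideal = np.sort(y_true)[::-1]         # best possible ordering
    idcg = dcg(ideal, k)
    return dcg(np.asarray(y_true)[order], k) / idcg if idcg > 0 else 0.0

def mean_ndcg(y_true, y_pred, query_ids, k=20):
    """Average NDCG@k over queries, grouping documents by query id."""
    per_query = defaultdict(lambda: ([], []))
    for t, p, q in zip(y_true, y_pred, query_ids):
        per_query[q][0].append(t)
        per_query[q][1].append(p)
    return float(np.mean([ndcg_at_k(t, p, k) for t, p in per_query.values()]))
```

Averaging the same computation at k = 5, 10, 15, 20 gives the cutoff series reported above.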