mcomans / IN4325-table-retrieval


LTR approach #8

Closed doudejans closed 4 years ago

doudejans commented 4 years ago

This PR adds a start of our learning-to-rank (LTR) baseline. It includes the following:

- A grid search is used to find the best parameters for the `RandomForestRegressor` with respect to the NDCG@20 metric.
- In each run, a random selection of 20 queries is held out as a test set. By averaging the NDCG scores (at cutoffs 5, 10, 15, 20) over multiple runs, most queries end up in a test set at some point.
- Results are written to a file in the results folder so that they can also be verified using `trec_eval`. Scores from the `ndcg_scorer` seem to match the results generated by `trec_eval`.
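The per-query NDCG averaging could be sketched roughly as follows. This is an illustration, not the PR's actual code: the function name `mean_ndcg_at_k` and the flat-array layout (one relevance label, score, and query id per document) are assumptions.

```python
import numpy as np
from sklearn.metrics import ndcg_score

def mean_ndcg_at_k(relevances, scores, query_ids, k=20):
    """Average NDCG@k over queries (sketch).

    relevances: graded relevance label per document
    scores:     model score per document
    query_ids:  query id per document (assumed flat layout)
    """
    per_query = []
    for qid in np.unique(query_ids):
        mask = query_ids == qid
        # ndcg_score expects 2D arrays: one row per sample, here one query
        per_query.append(ndcg_score([relevances[mask]], [scores[mask]], k=k))
    return float(np.mean(per_query))
```

Averaging per query (rather than pooling all documents) matches what `trec_eval` reports, which is what makes the two sets of numbers comparable.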

doudejans commented 4 years ago

I have removed `GridSearchCV` in favour of a simple implementation of a regular grid search over the parameters, because I realised that cross validation did not make sense the way I was doing it. With `GridSearchCV`, it is nearly impossible to recover which documents belong to which query within a cross-validation partition at scoring time, and working around this would take too much time.
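A plain grid search loop could look roughly like the sketch below. The function name, parameter grid, and data layout are assumptions for illustration, not the PR's actual code; the point is that an explicit loop keeps the query ids in hand, so NDCG can be computed per held-out query — exactly the information that is hard to recover inside `GridSearchCV`'s scoring callback.

```python
import itertools
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import ndcg_score

def grid_search_ndcg(X, y, query_ids, param_grid, k=20, n_test_queries=20, seed=0):
    """Plain grid search that keeps the query structure intact (sketch)."""
    rng = np.random.default_rng(seed)
    queries = np.unique(query_ids)
    # hold out a random selection of queries as the test set
    test_q = rng.choice(queries, size=min(n_test_queries, len(queries)), replace=False)
    test_mask = np.isin(query_ids, test_q)

    best_score, best_params = -1.0, None
    keys = sorted(param_grid)
    for values in itertools.product(*(param_grid[key] for key in keys)):
        params = dict(zip(keys, values))
        model = RandomForestRegressor(random_state=seed, **params)
        model.fit(X[~test_mask], y[~test_mask])
        preds = model.predict(X[test_mask])
        # per-query NDCG@k, averaged -- the step GridSearchCV obscures
        scores = []
        for qid in test_q:
            m = query_ids[test_mask] == qid
            scores.append(ndcg_score([y[test_mask][m]], [preds[m]], k=k))
        mean_ndcg = float(np.mean(scores))
        if mean_ndcg > best_score:
            best_score, best_params = mean_ndcg, params
    return best_params, best_score
```

Splitting on query ids rather than on individual documents also avoids leaking documents from a test query into the training set, which a row-wise `GridSearchCV` split would do.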