Closed · doudejans closed 4 years ago
I have removed `GridSearchCV` in favour of a simple implementation of a regular grid search for the best parameters, because I realized that cross-validation did not make sense the way I had set it up: it is nearly impossible to get the query information of a cross-validation partition at scoring time when using `GridSearchCV`, and working around this would take too much time.
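For reference, a plain grid search that keeps the query ids available at scoring time could look roughly like this. This is a sketch under assumptions: the `score_fn` signature, `grid_search` helper, and parameter grid are illustrative, not the actual code in this PR.

```python
# Hypothetical sketch of a manual grid search replacing GridSearchCV.
# Iterating over the parameter grid ourselves keeps the query ids of
# each split available at scoring time, which GridSearchCV's scorer
# interface does not pass through.
from itertools import product

import numpy as np
from sklearn.ensemble import RandomForestRegressor

def grid_search(X, y, query_ids, param_grid, score_fn):
    """Try every parameter combination and keep the best by score_fn."""
    best_score, best_params = -np.inf, None
    keys = list(param_grid)
    for values in product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        model = RandomForestRegressor(random_state=0, **params).fit(X, y)
        # score_fn receives the query ids explicitly, so a query-aware
        # metric such as NDCG can group documents per query.
        score = score_fn(model, X, y, query_ids)
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score
```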
This PR adds a start of our learning-to-rank (LTR) baseline. It includes the following:

- Evaluation of results with `trec_eval`.
- A scorer for `scikit-learn` that implements NDCG and takes the different queries in the dataset into account.

A grid search is used to find the best parameters for the `RandomForestRegressor` with respect to the NDCG@20 metric. On each run, a random selection of 20 queries is taken; by averaging the NDCG scores of the different runs (at cutoffs 5, 10, 15, 20), most of the queries will have been included in a test set. Results are written to a file in the `results` folder so that they can also be verified using `trec_eval`. Scores from the `ndcg_scorer` seem to match the results generated by `trec_eval`.
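The query-aware NDCG computation described above can be sketched as follows: compute NDCG per query at a cutoff k, then average over queries. The helper names (`dcg`, `ndcg_at_k`, `mean_ndcg`) are illustrative assumptions, not the actual `ndcg_scorer` in this PR.

```python
# Minimal sketch of query-aware NDCG@k: score each query's ranking
# separately, then average over queries rather than over documents.
from collections import defaultdict

import numpy as np

def dcg(relevances, k):
    """Discounted cumulative gain over the top-k relevance labels."""
    rel = np.asarray(relevances, dtype=float)[:k]
    return np.sum((2.0 ** rel - 1) / np.log2(np.arange(2, rel.size + 2)))

def ndcg_at_k(y_true, y_pred, k):
    """NDCG@k for a single query: DCG of the predicted ranking / ideal DCG."""
    order = np.argsort(y_pred)[::-1]      # rank documents by predicted score
    ideal = np.sort(y_true)[::-1]         # best possible ordering
    idcg = dcg(ideal, k)
    return dcg(np.asarray(y_true)[order], k) / idcg if idcg > 0 else 0.0

def mean_ndcg(y_true, y_pred, query_ids, k=20):
    """Average NDCG@k over queries, grouping documents by query id."""
    per_query = defaultdict(lambda: ([], []))
    for t, p, q in zip(y_true, y_pred, query_ids):
        per_query[q][0].append(t)
        per_query[q][1].append(p)
    return float(np.mean([ndcg_at_k(t, p, k) for t, p in per_query.values()]))
```

Averaging the same computation at k = 5, 10, 15, 20 gives the cutoff series reported above.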