khui / copacrr

The code for the COPACRR neural IR model.
Apache License 2.0

Evaluation scripts #2

Closed · JoaoLages closed this issue 7 years ago

JoaoLages commented 7 years ago

While trying to evaluate the model using bin/evals.sh, I get the following error from docpairs.py:

Traceback (most recent calls WITHOUT Sacred internals):
  File "/home/joaolages/Desktop/repacrr/evals/docpairs.py", line 130, in main
    best_pred_dir, argmax_epoch, argmax_run, argmax_ndcg, argmax_err = get_epoch_from_val(pred_dirs, val_dirs)
  File "/home/joaolages/Desktop/repacrr/utils/eval_utils.py", line 67, in get_epoch_from_val
    argmax_run, argmax_ndcg, argmax_err = test_epoch_ndcg_err[best_epoch]
KeyError: 0

This is followed by a similar error from rerank.py:

Traceback (most recent calls WITHOUT Sacred internals):
  File "/home/joaolages/Desktop/repacrr/evals/rerank.py", line 180, in main
    best_pred_dir, argmax_epoch, argmax_run, argmax_ndcg, argmax_err = get_epoch_from_val(pred_dirs, val_dirs)
  File "/home/joaolages/Desktop/repacrr/utils/eval_utils.py", line 67, in get_epoch_from_val
    argmax_run, argmax_ndcg, argmax_err = test_epoch_ndcg_err[best_epoch]
KeyError: 0

I believe the problem here is due to the train_test_years variable set in utils/config.py as

train_test_years = {'wt12_13':['wt11', 'wt14']}

I trained the model on wt09_10 and predicted on wt11, which I set in both bash scripts, but neither docpairs.py nor rerank.py looks at the train_years and test_year variables passed in the config.

andrewyates commented 7 years ago

If I understand correctly, the issue is that docpairs.py and rerank.py are using the hardcoded train_test_years rather than config parameters. A hack that should temporarily fix the issue would be editing train_test_years in utils/config.py (but based on your comments, I'm guessing you already realize that). The error comes from the fact that the eval scripts are looking for the predictions in an empty directory.
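For reference, the temporary hack would look roughly like this in utils/config.py (a sketch only; the surrounding code may differ, and the years simply match the setup described above):

# utils/config.py -- point the hardcoded dictionary at the years you
# actually trained on (key) and predicted on (values).
train_test_years = {'wt09_10': ['wt11']}   # default: {'wt12_13': ['wt11', 'wt14']}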

Those scripts ignore the train_years and test_year config parameters because we generally have multiple test_year values for each train_years value (i.e., we use four years for training and predict on both of the remaining two years; one of the remaining years is used for testing and the other for validation).

The current code hardcodes the parameter values rather than expecting the eval scripts to be run multiple times with different values for train_years and test_year. This isn't ideal though, and I'll talk with @khui to see if we can find a better solution.
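As a rough sketch of what a config-driven version might look like (hypothetical; the parameter names train_years, val_year, and test_year are illustrative and may not match the repository exactly), the eval scripts could build the dictionary from the config instead of hardcoding it:

def build_train_test_years(train_years, val_year, test_year):
    # Build the one-entry mapping the eval scripts expect:
    # training years -> the two predicted years, one used for model
    # selection and the other for the reported results. The required
    # ordering of the two years is whatever eval_utils already assumes.
    return {train_years: [val_year, test_year]}

# Example for the configuration discussed in this issue (assuming wt12
# is also predicted): build_train_test_years('wt09_10', 'wt12', 'wt11')
# -> {'wt09_10': ['wt12', 'wt11']}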

I'm glad to see that you're uncovering issues with our pipeline. Thanks for pointing it out!

JoaoLages commented 7 years ago

> I'm glad to see that you're uncovering issues with our pipeline. Thanks for pointing it out!

You're welcome. Glad I can help such a good open source project.

khui commented 7 years ago

Thanks for pointing that out, Joao. As Andrew suggested, the train_test_years dictionary needs to be edited according to the actual training/test years being used. In your case, you may write: {'wt09_10': ['wt11']}

However, by doing this you won't have any validation data, so the model cannot be properly evaluated. You would instead need something like:

{'wt09_10': ['wt11', 'wt12']}

The evaluation actually relies on a train/validation/test split. In the training and prediction phases, one trains on certain years (wt09 and wt10) and predicts, over different iterations, on certain test years (wt11 or wt12). In the evaluation phase, given the training years (wt09 and wt10), one needs to specify two predicted years, one for validation and one for testing, to conduct the evaluation. For example, when evaluating on wt11 and validating on wt12, the model (i.e., the best epoch) is selected based on wt12 and the reported results are computed on wt11.

For now, I would suggest directly configuring train_test_years.
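Concretely, for this setup that would mean predicting on both wt11 and wt12 with the model trained on wt09_10, editing utils/config.py roughly as below, and then rerunning bin/evals.sh (a sketch; which list position plays the validation vs. test role follows the eval scripts' existing convention):

# utils/config.py -- one of the two predicted years is used for model
# selection (validation) and the other for the reported test results.
train_test_years = {'wt09_10': ['wt11', 'wt12']}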

khui commented 7 years ago

According to Joao, the problem was fixed after editing train_test_years.