Hi Craig, sorry for the late reply. Thanks for your interest in the TKL paper :)
For TKL we used the following helper script https://github.com/sebastian-hofstaetter/transformer-kernel-ranking/blob/master/matchmaker/preprocessing/convert_formats/msmarco_doc_create_train_input.py to generate triples with bm25-top100 negative sampling from the MS MARCO document collection files referenced in the TREC'19 repository. We capped the number of triples at 5 million and randomly shuffled them, which means that in one batch only one relevant and one non-relevant document per query appear for the pairwise margin ranking loss.
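For context, here is a minimal sketch of what such bm25-top100 negative sampling plus a pairwise margin ranking loss can look like. PyTorch is an assumption (the matchmaker repo is PyTorch-based), and the margin value, function names, and data layout are illustrative stand-ins, not the helper script's exact logic:

```python
import random
import torch

# Hypothetical triple construction: for each query, pair each relevant
# document with negatives drawn from its bm25 top-100 candidates.
def make_triples(qrels, bm25_top100, max_triples=5_000_000):
    triples = []
    for qid, pos_docs in qrels.items():
        negatives = [d for d in bm25_top100.get(qid, []) if d not in pos_docs]
        for pos in pos_docs:
            for neg in negatives:
                triples.append((qid, pos, neg))
    random.shuffle(triples)        # shuffling spreads each query's triples across batches
    return triples[:max_triples]   # cap at 5 million, as described above

# Pairwise margin ranking loss on a batch of (relevant, non-relevant) scores.
loss_fn = torch.nn.MarginRankingLoss(margin=1.0)  # margin value is an assumption
pos_scores = torch.tensor([2.3, 0.9, 1.7])        # stand-in model outputs
neg_scores = torch.tensor([1.1, 1.4, 0.2])
target = torch.ones_like(pos_scores)              # 1 => pos should outscore neg
loss = loss_fn(pos_scores, neg_scores, target)
```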
We used early stopping based on the validation set nDCG@10, which was checked every 4000 batches. The validation set consists of re-ranking the bm25 top-100 results for 5000 queries carved out of the training queries with: https://github.com/sebastian-hofstaetter/transformer-kernel-ranking/blob/master/matchmaker/preprocessing/convert_formats/msmarco_doc_split_train_validation.py.
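A minimal sketch of that early-stopping schedule, under stated assumptions: only the 4000-batch check interval comes from the setup above, while the patience value and all callables are hypothetical stand-ins rather than the authors' code:

```python
def train_with_early_stopping(train_loader, train_step, validate_ndcg10,
                              save_checkpoint, check_every=4000, patience=5):
    """Run validation every `check_every` batches and keep the best checkpoint.

    `train_step`, `validate_ndcg10`, and `save_checkpoint` are hypothetical
    callables; `patience` is an assumption (only the 4000-batch check
    interval is stated above).
    """
    best_ndcg, bad_checks = 0.0, 0
    for step, batch in enumerate(train_loader, start=1):
        train_step(batch)
        if step % check_every == 0:
            ndcg = validate_ndcg10()       # re-rank the bm25 top-100 validation set
            if ndcg > best_ndcg:
                best_ndcg, bad_checks = ndcg, 0
                save_checkpoint()          # keep the best model so far
            else:
                bad_checks += 1
                if bad_checks >= patience:
                    break                  # nDCG@10 stopped improving
    return best_ndcg
```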
For the test results in the result table we used the provided bm25 top-100 (from the re-ranking subtask) for the densely judged TREC queries, and the docdev query set for the sparse msmarco labels.
All bm25 results were generated with the default Anserini settings.
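For reference, Anserini's default BM25 parameters are k1=0.9 and b=0.4. A minimal sketch of reproducing such a top-100 run via the Pyserini wrapper; Pyserini, the prebuilt index name, and the example query are assumptions (the runs above were presumably produced with Anserini directly):

```python
from pyserini.search.lucene import LuceneSearcher

# Prebuilt index name is an assumption; any Anserini/Lucene index of the
# MS MARCO document collection would do. Anserini's BM25 defaults
# (k1=0.9, b=0.4) apply unless set_bm25 is called.
searcher = LuceneSearcher.from_prebuilt_index('msmarco-v1-doc')
hits = searcher.search('how do telescopes work', k=100)
for rank, hit in enumerate(hits, start=1):
    print(rank, hit.docid, round(hit.score, 4))
```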
Best, Sebastian
Hi,
I browsed the SIGIR short paper version on arXiv and wasn't able to find the details of your training setup. Could you clarify:
Craig