pommedeterresautee closed this issue 4 years ago
Hi, thank you for the quick interest!
I will present a tech report at TREC next week and make it available then (including updating the README of this repo).
As a teaser: TK is 50 times faster than BERT-base and substantially better than CONV-KNRM across multiple collections. And TK is analyzable by design.
Best, Sebastian
Great teaser indeed! Can't wait to check it out. Just one last question before waiting a week for more info: 50 times faster than BERT still means 1-2 seconds per query, right?
Just a piece of advice, FWIW: CONV-KNRM is supposed to be on par with BERT on ad hoc search (a task different by nature from MS MARCO, where contextual understanding of words makes sense). Maybe in the future you want to add a benchmark on ad hoc search too. Here is more info on BERT vs. CKNRM: https://arxiv.org/pdf/1904.07531.pdf
Hi, just to let you know that I am very impatient: I tried TK v1 on my ad hoc search logs (I work for a legal publisher). Results are similar to those I get from CKNRM (almost the same score). I trained for 2 epochs (no further improvement after that). I have seen in your config file that you use TK v6, so I will wait to check that one :-) (the code of TK v6 is not yet available in the repo). However, performance is much better than what I was expecting (for inference on a recent GPU).
That's very interesting - are the neural models better than BM25? In general there are many things we can do to tune the models; for example, how long are the documents/passages you are using?
Yep, they are both 10 points better than a simple BM25 on P@5. However, I rerank only the top 20 results (one page of the SERP), and we think our in-production system is far from optimized (lots of SOLR boosts on some words that are not needed, etc.). So the BM25 score is BM25 applied to the 20 docs of each SERP (it gives better results than our prod system). I can also say that other rerankers don't reach those results; they are not even on par with BM25. I use raw clicks. I added a small modification to both models to model position bias. It gives a little boost to both models but is not needed on MS MARCO (AFAIK it is already debiased by a click model).
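For anyone curious, here is a rough idea of what such a position-bias term could look like when training on click data: a learnable additive bias per displayed rank, used during training and dropped at inference. This is only an illustrative sketch (the class and argument names are made up), not the exact modification I made to the models.

```python
import torch
import torch.nn as nn

class PositionDebiasedScorer(nn.Module):
    """Illustrative sketch: wraps any (query, doc) -> score model with a
    learnable per-position bias that absorbs part of the click position bias
    during training; at inference only the relevance score is used."""

    def __init__(self, relevance_model: nn.Module, max_rank: int = 20):
        super().__init__()
        self.relevance_model = relevance_model            # hypothetical wrapped reranker
        self.position_bias = nn.Parameter(torch.zeros(max_rank))

    def forward(self, query, doc, rank=None):
        score = self.relevance_model(query, doc)
        if self.training and rank is not None:
            # clicks are partly explained by where the result was shown,
            # so this term soaks up that effect during training
            score = score + self.position_bias[rank]
        return score
```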
I tried different snippet sizes. Best results with 5 words on the left and 5 on the right of each matching word. Measures are slightly lower for other values, and very low when using only the matching words (important to check that the expected behavior happens). The full title is always used (I use 2 text fields; the signals are kept separate for best performance, which required a small change to your model of course).
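A rough illustration of that snippet strategy, assuming plain whitespace tokenization and exact term matching (my actual preprocessing differs a bit):

```python
def build_snippet(query: str, doc: str, window: int = 5) -> str:
    """Keep a window of `window` words to the left and right of every document
    word that matches a query term; drop everything else."""
    query_terms = set(query.lower().split())
    doc_tokens = doc.split()
    keep = [False] * len(doc_tokens)
    for i, tok in enumerate(doc_tokens):
        if tok.lower() in query_terms:
            for j in range(max(0, i - window), min(len(doc_tokens), i + window + 1)):
                keep[j] = True
    return " ".join(tok for tok, k in zip(doc_tokens, keep) if k)

# The title goes to the model as a separate field; only the body text
# is reduced to these matching-word windows.
```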
Hi, sorry for answering so late. I uploaded our TREC tech report to arXiv now (https://arxiv.org/abs/1912.01385). I would try a bigger re-ranking depth than 20 (in core_metrics.py there are methods to automatically evaluate all possible re-ranking depths at once with numpy), and maybe you could also try full documents instead of snippets (there is a tk_v2 in tk.py which uses windowed kernel pooling for longer document inputs, and it did pretty well in the TREC document ranking task).
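For illustration, evaluating P@5 at every possible re-ranking depth with numpy could look roughly like this (a simplified sketch with made-up function names, not the actual core_metrics.py code):

```python
import numpy as np

def p_at_5_per_depth(bm25_ranked_labels: np.ndarray, neural_scores: np.ndarray) -> np.ndarray:
    """For one query: 0/1 relevance labels of the candidates in BM25 order and
    the neural score of each candidate. Returns P@5 for every re-ranking depth:
    at depth d the top-d candidates are re-ordered by neural score and the tail
    stays in BM25 order (depth 0 is plain BM25)."""
    n = len(bm25_ranked_labels)
    p_at_5 = np.zeros(n + 1)
    for depth in range(n + 1):
        order = np.concatenate([
            np.argsort(-neural_scores[:depth]),  # re-ranked head
            np.arange(depth, n),                 # untouched BM25 tail
        ])
        p_at_5[depth] = bm25_ranked_labels[order][:5].mean()
    return p_at_5
```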
Hi,
Just discovered your code. It seems super interesting, in particular the TK model.
I am wondering if there is a paper on the TK model? Maybe some scores and info on speed? (I have found nothing on the MS MARCO leaderboard.)
My main question: is it much more powerful than CKNRM and light enough to be usable in a real scenario (not taking minutes to rerank candidates)?