Testing over a complete new dataset

MarcosFP97 commented 4 years ago

Hi! First of all, thank you for your outstanding work. I have the following question: I would like to test your fine-tuned MS MARCO model on a completely new collection. Specifically, I want to check whether a given passage is relevant to a particular question. How can I do this? I would appreciate any help. Marcos

MarcosFP97 commented 4 years ago

I want to complete my previous message with an example. I would like to test something like this:

[CLS]QUERY[SEP]POSSIBLE_REL_PASSAGE

Here the query could be a question like "Vitamin D does not cure COVID-19", and the passage could be any text that is relevant or non-relevant to that query. The problem is that I am not sure how to feed this input to your pretrained BERT model in TensorFlow. So, I would appreciate any help.
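For reference, the layout above matches the usual BERT sentence-pair convention, which also appends a closing [SEP] after the passage and marks the two parts with segment ids. Below is a minimal sketch in plain Python. It uses whitespace splitting as a stand-in for the real WordPiece tokenizer, and the function name `build_bert_pair`, the length limits (64 query tokens, 512 tokens in total), and the example passage are illustrative assumptions, not taken from the repository.

```python
from typing import List, Tuple


def build_bert_pair(
    query_tokens: List[str],
    passage_tokens: List[str],
    max_seq_len: int = 512,    # assumed limit, adjust to the model in use
    max_query_len: int = 64,   # assumed limit, adjust to the model in use
) -> Tuple[List[str], List[int]]:
    """Lay out a (query, passage) pair as [CLS] query [SEP] passage [SEP]."""
    query_tokens = query_tokens[:max_query_len]
    # Reserve three positions for [CLS] and the two [SEP] markers.
    room = max(max_seq_len - len(query_tokens) - 3, 0)
    passage_tokens = passage_tokens[:room]

    tokens = ["[CLS]"] + query_tokens + ["[SEP]"] + passage_tokens + ["[SEP]"]
    # Segment 0 covers [CLS], the query and the first [SEP];
    # segment 1 covers the passage and the final [SEP].
    segment_ids = [0] * (len(query_tokens) + 2) + [1] * (len(passage_tokens) + 1)
    return tokens, segment_ids


# Whitespace splitting is only a placeholder for WordPiece tokenization.
query = "Vitamin D does not cure COVID-19".lower().split()
# Made-up passage, used purely for illustration.
passage = "there is no evidence that vitamin d cures covid-19".split()

tokens, segment_ids = build_bert_pair(query, passage)
print(tokens)
print(segment_ids)
```

With a real model, the tokens would then be mapped to vocabulary ids and passed to the classifier, which outputs a relevance score for the pair.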

rodrigonogueira4 commented 4 years ago

Hi @MarcosFP97, it is possible that the model trained on MS MARCO generalizes to other domains. Many people have used variants of this BERT model on COVID-related questions, and they work without any domain-specific fine-tuning. In particular, I would like to suggest these two papers, whose models are public:

1) The authors used SciBERT trained on MS MARCO, and it worked quite well on TREC-COVID, a competition to search for COVID-related papers: https://arxiv.org/pdf/2010.05987.pdf

2) My team and I placed first or second in 4 out of 5 rounds of the TREC-COVID competition using a "better" BERT (called monoT5) trained on MS MARCO: https://arxiv.org/pdf/2007.07846.pdf

For this last one (monoT5), the code and model are available at https://github.com/castorini/pygaggle/ and, in our experience, this model works "off-the-shelf" in various domains without any fine-tuning.
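To give an idea of what monoT5 does with a query and a passage, here is a minimal sketch in plain Python. It follows the input template described in the monoT5 paper ("Query: ... Document: ... Relevant:") and scores relevance with a softmax over the logits of the "true" and "false" output tokens. The function names and the logit values are hypothetical. In practice the logits come from the model, and pygaggle handles all of this internally.

```python
import math


def monot5_prompt(query: str, passage: str) -> str:
    """Format a (query, passage) pair with the monoT5 input template."""
    return f"Query: {query} Document: {passage} Relevant:"


def relevance_probability(true_logit: float, false_logit: float) -> float:
    """Softmax over the 'true' and 'false' token logits, returning P(true)."""
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(true_logit, false_logit)
    e_true = math.exp(true_logit - m)
    e_false = math.exp(false_logit - m)
    return e_true / (e_true + e_false)


prompt = monot5_prompt(
    "Vitamin D does not cure COVID-19",
    "POSSIBLE_REL_PASSAGE",  # placeholder for any candidate passage
)
print(prompt)

# Hypothetical logits, only to show how the score is derived.
score = relevance_probability(true_logit=2.0, false_logit=-1.0)
print(score)
```

Candidate passages for a query are then sorted by this probability, so the same procedure works for any new collection without retraining.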

Please let me know what you think.

MarcosFP97 commented 4 years ago

Hi @rodrigonogueira4, thank you for your quick answer. I think I am going to use the monoT5 model fine-tuned on MS MARCO, since it seems very easy to use and to combine with Pyserini. Thank you again!

rodrigonogueira4 commented 4 years ago

You are welcome!

nyu-dl / dl4marco-bert

Testing over a complete new dataset #44