MarcosFP97 closed this issue 4 years ago.
I want to complete my previous message with an example. I would like to test something like this:
[CLS]QUERY[SEP]POSSIBLE_REL_PASSAGE
where QUERY could be a query like "Vitamin D does not cure COVID-19" and POSSIBLE_REL_PASSAGE is any passage, relevant or not, for that query. The problem is that I am not sure how to combine this with your pretrained BERT model in TensorFlow, so I would appreciate any help.
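The input above is BERT's sentence-pair format; the standard version also appends a final [SEP] after the passage and gives the two segments different token type ids. The following is a minimal sketch of how such a pair could be assembled, assuming the query and passage are already tokenized (whitespace splitting stands in for BERT's WordPiece tokenizer). `build_pair_input` is a hypothetical helper, not part of the released code, and the 64-token query cap is an illustrative assumption.

```python
from typing import List, Tuple

# Special tokens used by BERT's sentence-pair input format.
CLS, SEP = "[CLS]", "[SEP]"


def build_pair_input(
    query_tokens: List[str],
    passage_tokens: List[str],
    max_query_len: int = 64,   # assumed query cap, for illustration only
    max_seq_len: int = 512,    # maximum sequence length of BERT-base
) -> Tuple[List[str], List[int]]:
    """Return (tokens, segment_ids) for a [CLS] query [SEP] passage [SEP] input."""
    query = query_tokens[:max_query_len]
    # Reserve room for the three special tokens, then truncate the passage.
    room = max_seq_len - len(query) - 3
    passage = passage_tokens[:max(room, 0)]
    tokens = [CLS] + query + [SEP] + passage + [SEP]
    # Segment (token type) ids: 0 for [CLS] + query + first [SEP], 1 for passage + final [SEP].
    segment_ids = [0] * (len(query) + 2) + [1] * (len(passage) + 1)
    return tokens, segment_ids


if __name__ == "__main__":
    # Whitespace splitting stands in for a real WordPiece tokenizer.
    q = "vitamin d does not cure covid-19".split()
    p = "no trial has shown that vitamin d cures covid-19".split()
    toks, segs = build_pair_input(q, p)
    print(" ".join(toks))
    print(segs)
```

The model then scores each (QUERY, POSSIBLE_REL_PASSAGE) pair independently, so checking a new collection amounts to building one such input per candidate passage.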
Hi @MarcosFP97, it is possible that the model trained on MS MARCO generalizes to other domains. Many people have used variants of this BERT model on COVID-related questions, and they work without any domain-specific fine-tuning. In particular, I would suggest these two papers, whose models are public:
1) The authors trained SciBERT on MS MARCO, and it worked quite well on TREC-COVID, a competition to search for COVID-related papers: https://arxiv.org/pdf/2010.05987.pdf
2) My team and I placed first or second in 4 out of 5 rounds of the TREC-COVID competition using a "better" BERT (called monoT5) trained on MS MARCO: https://arxiv.org/pdf/2007.07846.pdf
For this last one (monoT5), the code and model are available at https://github.com/castorini/pygaggle/. In our experience, this model works "off-the-shelf" in various domains without any fine-tuning.
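For readers wondering how monoT5 decides relevance: following the monoT5 formulation, the query and passage are written as the text prompt "Query: q Document: d Relevant:", the model is trained to generate "true" or "false", and the relevance score is the softmax over just those two logits at the first decoding step. The sketch below shows only that scoring arithmetic; the logits are hypothetical inputs so it runs without downloading the model, and `monot5_input` / `relevance_score` are illustrative names, not pygaggle functions (pygaggle's reranker handles tokenization and inference itself).

```python
import math


def monot5_input(query: str, document: str) -> str:
    """Build the text-to-text prompt that monoT5 scores."""
    return f"Query: {query} Document: {document} Relevant:"


def relevance_score(true_logit: float, false_logit: float) -> float:
    """Probability of "true", from a softmax over only the "true"/"false" logits."""
    # Subtract the max before exponentiating for numerical stability.
    m = max(true_logit, false_logit)
    e_true = math.exp(true_logit - m)
    e_false = math.exp(false_logit - m)
    return e_true / (e_true + e_false)


if __name__ == "__main__":
    prompt = monot5_input("Vitamin D does not cure COVID-19", "Some passage text.")
    print(prompt)
    # Hypothetical logits; a real run would read them from the T5 decoder's first step.
    print(round(relevance_score(2.0, -1.0), 4))
```

Because the score is a probability, a cutoff such as 0.5 is one possible (untuned) way to label a single passage as relevant or not; for ranking, only the relative order of the scores matters.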
Please let me know what you think.
Hi @rodrigonogueira4, thank you for your quick answer. I think I am going to use the monoT5 model fine-tuned on MS MARCO, since it seems very easy to use and to combine with Pyserini. Thank you again!
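Combining monoT5 with Pyserini typically means a two-stage pipeline: BM25 retrieves candidate passages from the collection and monoT5 reorders the top k. The sketch below imitates that flow in plain Python under stated assumptions: `bm25_scores` is a toy in-memory BM25 (Lucene-style idf; k1=0.9 and b=0.4 are assumed values) standing in for a Pyserini Lucene index, and `rerank_score` is any callable, for example a wrapper around monoT5. All names here are hypothetical, not Pyserini or pygaggle APIs.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Tuple


def bm25_scores(query: str, docs: Dict[str, str], k1: float = 0.9, b: float = 0.4) -> Dict[str, float]:
    """Score every document against the query with BM25 (Lucene-style idf)."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized.values()) / n_docs
    # Document frequency of each term.
    df = Counter(term for toks in tokenized.values() for term in set(toks))
    scores = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[doc_id] = score
    return scores


def retrieve_then_rerank(
    query: str,
    docs: Dict[str, str],
    rerank_score: Callable[[str, str], float],
    k: int = 100,
) -> List[Tuple[str, float]]:
    """Stage 1: keep the BM25 top-k. Stage 2: reorder them with a (costlier) reranker."""
    ranked = sorted(bm25_scores(query, docs).items(), key=lambda x: x[1], reverse=True)
    reranked = [(doc_id, rerank_score(query, docs[doc_id])) for doc_id, _ in ranked[:k]]
    return sorted(reranked, key=lambda x: x[1], reverse=True)


if __name__ == "__main__":
    collection = {
        "d1": "vitamin d supplements and covid-19 outcomes",
        "d2": "vitamin d is made in the skin",
        "d3": "masks reduce transmission of covid-19",
    }

    # Hypothetical stand-in for monoT5: fraction of query terms found in the passage.
    def toy_reranker(q: str, passage: str) -> float:
        q_terms = set(q.lower().split())
        return len(q_terms & set(passage.lower().split())) / len(q_terms)

    for doc_id, score in retrieve_then_rerank("vitamin d covid-19", collection, toy_reranker, k=2):
        print(doc_id, round(score, 3))
```

Only the k first-stage candidates are passed to the reranker, which keeps the expensive model's cost proportional to k rather than to the collection size.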
You are welcome!
Hi! First of all, thank you for your outstanding work. I have the following question: I would like to test your MS MARCO fine-tuned model on a completely new collection. I want to check whether a given passage is relevant to a specific question. How can I do this? I would appreciate any help. Marcos