shamanez opened this issue 2 years ago
As you saw in the paper, the evidence blocks are frozen during fine-tuning, which means that index updates are not performed at this stage. Therefore, for domain-specific QA we would first have to pre-train REALM to get domain-specific evidence blocks (retriever), and then we can fine-tune it on the given dataset.
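To make "frozen evidence blocks" concrete, here is a minimal PyTorch sketch of what fine-tuning looks like under that constraint. The names (`block_emb`, `query_embedder`, `retrieve`) and sizes are placeholders for illustration, not the actual attributes of this repo or of the original implementation:

```python
import torch

# Illustrative sketch only: during fine-tuning the evidence-block embeddings are
# a fixed buffer, so retrieval is a maximum inner product search (MIPS) against
# a frozen matrix, and gradients only flow through the query encoder and reader.

num_blocks, hidden_dim, proj_dim = 100_000, 768, 128   # the full Wikipedia index is ~13M blocks
block_emb = torch.randn(num_blocks, proj_dim)          # pre-computed during pre-training
block_emb.requires_grad_(False)                        # frozen: no index updates while fine-tuning

query_embedder = torch.nn.Linear(hidden_dim, proj_dim)  # stand-in for the BERT query tower

def retrieve(query_hidden: torch.Tensor, k: int = 8) -> torch.Tensor:
    """Return the indices of the top-k evidence blocks for one query vector."""
    q = query_embedder(query_hidden)                    # (proj_dim,)
    scores = block_emb @ q                              # exact MIPS over the frozen index
    return scores.topk(k).indices

top_blocks = retrieve(torch.randn(hidden_dim))          # 8 block indices for a dummy query
```

If the frozen blocks come from a general-domain corpus, retrieval quality on a specialized domain is limited no matter how the reader is fine-tuned, which is why domain-specific pre-training of the retriever comes first.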
So we have to pre-train REALM with the masked word prediction task, right?
Exactly, but the pre-training part has not been fully ported to PyTorch, especially the asynchronous MIPS refreshes and the Inverse Cloze Task (ICT), which is used to warm-start retriever training. Thus, to pre-train REALM, we would have to use the original TF implementation, and then fine-tune it in PyTorch.
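For anyone unfamiliar with the asynchronous MIPS refresh: during pre-training the document embedder keeps changing, so the index built from its embeddings goes stale and has to be periodically rebuilt (in the TF implementation, by separate index workers running in parallel with the trainers; ICT just warm-starts the two towers so the earliest retrievals aren't random). A deliberately simplified, synchronous stand-in, with all names (`doc_embedder`, `doc_features`, `refresh_every`) illustrative:

```python
import torch

# Simplified, synchronous sketch of the index refresh that the TF implementation
# performs asynchronously on dedicated workers.

num_blocks, feat_dim, proj_dim = 10_000, 768, 128
doc_features = torch.randn(num_blocks, feat_dim)        # stand-in for encoded evidence blocks
doc_embedder = torch.nn.Linear(feat_dim, proj_dim)      # document tower (trainable in pre-training)
query_embedder = torch.nn.Linear(feat_dim, proj_dim)    # query tower (trainable in pre-training)

refresh_every, train_steps = 500, 2_000
index = None

for step in range(train_steps):
    if step % refresh_every == 0:
        with torch.no_grad():                            # rebuild the index with current weights
            index = doc_embedder(doc_features)           # (num_blocks, proj_dim)

    query = query_embedder(torch.randn(feat_dim))        # dummy query; real code embeds masked text
    scores = index @ query                               # retrieval against a (slightly stale) index
    top_k = scores.topk(8).indices
    # ... in real pre-training: compute the retrieval-augmented MLM loss over
    # `top_k`, backprop through both towers, and let the next refresh pick up
    # the updated document embedder ...
```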
Thanks a lot for your insight. Anyway, this end-to-end fine-tuning will be very expensive.
@qqaatw is it part of the roadmap to port the pre-training part to PyTorch?
It was part of the roadmap, but now I'm wondering whether it's worth porting.
You can see the configuration of their experiments:
Pre-training: We pre-train for 200k steps on 64 Google Cloud TPUs, with a batch size of 512 and a learning rate of 3e-5, using BERT’s default optimizer. The document embedding step for the MIPS index is parallelized over 16 TPUs. For each example, we retrieve and marginalize over 8 candidate documents, including the null document ∅.
This setup leveraged an array of resources and is extremely expensive for ordinary users and researchers. I don't have such resources, and I don't think a regular deep learning workstation would be able to reproduce results like theirs.
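For reference, the "retrieve and marginalize over 8 candidate documents" part of that setup is just a weighted sum of per-document answer probabilities, with the null document ∅ letting the model answer without any retrieved evidence. A toy numeric sketch (values are made up):

```python
import torch

# Toy illustration of marginalizing over k retrieved candidates plus the null
# document:  p(y | x) = sum_z p(z | x) * p(y | x, z),
# where z ranges over the 8 retrieved blocks plus the null document.

k = 8
retrieval_logits = torch.randn(k + 1)                    # scores for 8 candidates + null document
log_p_answer_given_doc = torch.rand(k + 1).log()         # log p(y | x, z) for each candidate

log_p_doc = torch.log_softmax(retrieval_logits, dim=0)   # log p(z | x)
log_p_answer = torch.logsumexp(log_p_doc + log_p_answer_given_doc, dim=0)
print(float(log_p_answer))                               # marginal log-likelihood of the answer
```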
@qqaatw "It was part of the roadmap, but now I'm thinking whether this is worth port." Yeah, this seems a problem and I agree.
The REALM paper highlights that for downstream tasks they kept the retriever frozen. What about a task like domain-specific open-domain question answering? In that kind of scenario, can we train the entire REALM with this code?
If yes, we might be able to compare results with RAG-end2end:
https://github.com/huggingface/transformers/tree/master/examples/research_projects/rag-end2end-retriever