shamanez opened this issue 2 years ago
As you saw in the paper, the evidence blocks are frozen during fine-tuning, which means that index updates are not performed at this stage. Therefore, for domain-specific QA we would first have to pre-train REALM to get domain-specific evidence blocks (retriever), and then we can fine-tune it on the given dataset.
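To make "frozen evidence blocks" concrete, here is a minimal PyTorch sketch of what fine-tuning looks like under that constraint. The names (`block_emb`, `query_embedder`, `retrieve`) and sizes are placeholders for illustration, not the actual attributes of this repo or of the original implementation:

```python
import torch

# Illustrative sketch only: during fine-tuning the evidence-block embeddings are
# a fixed buffer, so retrieval is a maximum inner product search (MIPS) against
# a frozen matrix, and gradients only flow through the query encoder and reader.

num_blocks, hidden_dim, proj_dim = 100_000, 768, 128   # the full Wikipedia index is ~13M blocks
block_emb = torch.randn(num_blocks, proj_dim)          # pre-computed during pre-training
block_emb.requires_grad_(False)                        # frozen: no index updates while fine-tuning

query_embedder = torch.nn.Linear(hidden_dim, proj_dim)  # stand-in for the BERT query tower

def retrieve(query_hidden: torch.Tensor, k: int = 8) -> torch.Tensor:
    """Return the indices of the top-k evidence blocks for one query vector."""
    q = query_embedder(query_hidden)                    # (proj_dim,)
    scores = block_emb @ q                              # exact MIPS over the frozen index
    return scores.topk(k).indices

top_blocks = retrieve(torch.randn(hidden_dim))          # 8 block indices for a dummy query
```

If the frozen blocks come from a general-domain corpus, retrieval quality on a specialized domain is limited no matter how the reader is fine-tuned, which is why domain-specific pre-training of the retriever comes first.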
So we have to pre-train REALM with the masked word prediction task, right?
Exactly, but the pre-training part has not been fully ported to PyTorch, especially the asynchronous MIPS refreshes and the Inverse Cloze Task (ICT), which is used to warm-start retriever training. Thus, to pre-train REALM, we would have to use the original TF implementation, and then fine-tune it in PyTorch.
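For anyone unfamiliar with the asynchronous MIPS refresh: during pre-training the document embedder keeps changing, so the index built from its embeddings goes stale and has to be periodically rebuilt (in the TF implementation, by separate index workers running in parallel with the trainers; ICT just warm-starts the two towers so the earliest retrievals aren't random). A deliberately simplified, synchronous stand-in, with all names (`doc_embedder`, `doc_features`, `refresh_every`) illustrative:

```python
import torch

# Simplified, synchronous sketch of the index refresh that the TF implementation
# performs asynchronously on dedicated workers.

num_blocks, feat_dim, proj_dim = 10_000, 768, 128
doc_features = torch.randn(num_blocks, feat_dim)        # stand-in for encoded evidence blocks
doc_embedder = torch.nn.Linear(feat_dim, proj_dim)      # document tower (trainable in pre-training)
query_embedder = torch.nn.Linear(feat_dim, proj_dim)    # query tower (trainable in pre-training)

refresh_every, train_steps = 500, 2_000
index = None

for step in range(train_steps):
    if step % refresh_every == 0:
        with torch.no_grad():                            # rebuild the index with current weights
            index = doc_embedder(doc_features)           # (num_blocks, proj_dim)

    query = query_embedder(torch.randn(feat_dim))        # dummy query; real code embeds masked text
    scores = index @ query                               # retrieval against a (slightly stale) index
    top_k = scores.topk(8).indices
    # ... in real pre-training: compute the retrieval-augmented MLM loss over
    # `top_k`, backprop through both towers, and let the next refresh pick up
    # the updated document embedder ...
```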
Thanks a lot for your insight. Anyway, this end-to-end fine-tuning will be very expensive.
@qqaatw is it part of the roadmap to port the pre-training part to PyTorch?
It was part of the roadmap, but now I'm wondering whether it's worth porting.
You can see the configuration of their experiments:
Pre-training: We pre-train for 200k steps on 64 Google Cloud TPUs, with a batch size of 512 and a learning rate of 3e-5, using BERT’s default optimizer. The document embedding step for the MIPS index is parallelized over 16 TPUs. For each example, we retrieve and marginalize over 8 candidate documents, including the null document ∅.
This setup leveraged an array of resources and is extremely expensive for ordinary users and researchers. I don't have such resources, and I don't think a regular deep learning workstation would be able to reproduce results like theirs.
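For reference, the "retrieve and marginalize over 8 candidate documents" part of that setup is just a weighted sum of per-document answer probabilities, with the null document ∅ letting the model answer without any retrieved evidence. A toy numeric sketch (values are made up):

```python
import torch

# Toy illustration of marginalizing over k retrieved candidates plus the null
# document:  p(y | x) = sum_z p(z | x) * p(y | x, z),
# where z ranges over the 8 retrieved blocks plus the null document.

k = 8
retrieval_logits = torch.randn(k + 1)                    # scores for 8 candidates + null document
log_p_answer_given_doc = torch.rand(k + 1).log()         # log p(y | x, z) for each candidate

log_p_doc = torch.log_softmax(retrieval_logits, dim=0)   # log p(z | x)
log_p_answer = torch.logsumexp(log_p_doc + log_p_answer_given_doc, dim=0)
print(float(log_p_answer))                               # marginal log-likelihood of the answer
```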
@qqaatw "It was part of the roadmap, but now I'm thinking whether this is worth port." Yeah, this seems a problem and I agree.
The REALM paper highlights that for downstream tasks they kept the retriever frozen. What about a task like domain-specific open-domain question answering? In that kind of scenario, can we train the entire REALM with this code?
If yes, we might be able to compare results with RAG-end2end:
https://github.com/huggingface/transformers/tree/master/examples/research_projects/rag-end2end-retriever