Hi @miyamonz ,
According to my personal understanding, it is because there are two input patterns during finetuning. For example:
CoLA task: (check grammatical correctness, one sentence per sample)
[CLS] This is an example . [SEP]
QQP task: (check whether two questions are duplicates, two sentences per sample)
[CLS] What are must eat cuisines around Nagoya University ? [SEP] What are recommended cuisines around Nagoya University ? [SEP]
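For concreteness, here is a minimal sketch of how these two patterns look after tokenization. I'm assuming the HuggingFace `ElectraTokenizerFast` and the `google/electra-small-discriminator` checkpoint purely for illustration; any BERT-style tokenizer behaves the same way:

```python
from transformers import ElectraTokenizerFast

# Checkpoint chosen only for illustration; any BERT-style tokenizer works the same way.
tok = ElectraTokenizerFast.from_pretrained("google/electra-small-discriminator")

# CoLA-style single-sentence input -> [CLS] ... [SEP]
single = tok("This is an example .")
print(tok.convert_ids_to_tokens(single["input_ids"]))
# e.g. ['[CLS]', 'this', 'is', 'an', 'example', '.', '[SEP]']

# QQP-style sentence-pair input -> [CLS] ... [SEP] ... [SEP]
pair = tok("What are must eat cuisines around Nagoya University ?",
           "What are recommended cuisines around Nagoya University ?")
# token_type_ids are 0 for the first question (and its [SEP]) and 1 for the second.
print(pair["token_type_ids"])
```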
So, to let the pretrained model get used to both patterns and minimize the gap between pretraining and finetuning, ELECTRA randomly creates single- and two-segment examples during preprocessing.
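Here is a rough sketch of that decision, loosely following the logic in build_pretraining_dataset.py. The helper name `create_example`, the 10% probability, and the even split are illustrative simplifications, not the exact implementation:

```python
import random

def create_example(tokens, max_length, cls="[CLS]", sep="[SEP]"):
    """Illustrative sketch: pack a buffer of tokens into a one- or
    two-segment pretraining example (simplified, not the real code)."""
    if random.random() < 0.1:  # small chance of a single segment, like CoLA-style inputs
        first_len = max_length - 2         # leave room for [CLS] and one [SEP]
        second_len = 0
    else:                      # otherwise two segments, like QQP-style inputs
        first_len = (max_length - 3) // 2  # leave room for [CLS] and two [SEP]s
        second_len = max_length - 3 - first_len

    first = tokens[:first_len]
    second = tokens[first_len:first_len + second_len]

    example = [cls] + first + [sep]
    if second:
        example += second + [sep]
    return example

print(create_example("the quick brown fox jumps over the lazy dog".split(), max_length=8))
```

In the actual ExampleBuilder the buffer is filled with whole sentences and the split tries to respect sentence boundaries, but the random one-vs-two-segment choice is the part relevant to your question.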
Please tag me if you have any questions.
Thanks! That's very helpful for me.
First, thanks for sharing this repo! It's been very helpful for understanding ELECTRA pretraining.
I got a question about ELECTRADataProcessor. https://github.com/richarddwang/electra_pytorch/blob/80d1790b6675720832c5db5f22b7e036f68208b8/_utils/utils.py#L101
I read this code and found it corresponds to this file. https://github.com/google-research/electra/blob/master/build_pretraining_dataset.py#L34
I can understand what that part does: it's a preprocessing step that randomly splits sentences into two segments, merges them into one example, and so on. But I can't understand why it does this. I skimmed the ELECTRA paper but couldn't find an explanation. As far as I understand, ELECTRA just needs many sentences, like BERT. Why are two segments needed, and why is the split done randomly at preprocessing time?
I already asked about it here, but there has been no response. https://github.com/google-research/electra/issues/114
I would be happy if you could reply when you know something and have the time.