To me, this sounds more like a case where encoder-decoder models like T5 or Bart should be fine-tuned. The encoder would encode the "context" and the decoder would be teacher-forced on the sentence.
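For reference, a minimal sketch of what that could look like with T5 (the checkpoint name and the example strings below are placeholders, not something from this thread):

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

context = "Why did the chicken cross the road?"
sentence = "To get to the other side."

inputs = tokenizer(context, return_tensors="pt")
labels = tokenizer(sentence, return_tensors="pt").input_ids

# The encoder only sees the context; passing `labels` teacher-forces the
# decoder on the target sentence and returns a cross-entropy loss.
outputs = model(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    labels=labels,
)
outputs.loss.backward()
```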
To me, this sounds more like a case where encoder-decoder models like T5 or Bart should be fine-tuned. The encoder would encode the "context" and the decoder would be teacher-forced on the sentence.
Thx very much :)
By any chance, is such logic already applied in the training code?
@toriving I've successfully done "conditional" fine-tuning by adding a new token that indicates which portion of the sequence refers to the "context", similar to the [SEP] token used in the multi-sequence version of BERT.
For example, here's how I prepare a dataset for training GPT2 to generate answers to riddle jokes:
<soq> Why did the chicken cross the road? <eoq> To go to the other side <|endoftext|>
The effect is that the answer (after <eoq>) is conditional on the question that precedes it.
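For anyone else trying this, a rough sketch of the data/tokenizer setup described above (<soq> and <eoq> are custom tokens that have to be added to the tokenizer; the rest is ordinary causal-LM fine-tuning):

```python
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Register the delimiter tokens and grow the embedding matrix to match.
tokenizer.add_special_tokens({"additional_special_tokens": ["<soq>", "<eoq>"]})
model.resize_token_embeddings(len(tokenizer))

text = "<soq> Why did the chicken cross the road? <eoq> To go to the other side <|endoftext|>"
input_ids = tokenizer(text, return_tensors="pt").input_ids

# Standard language-model loss over the whole sequence; because the answer
# tokens attend to the question, generation after <eoq> is conditioned on it.
loss = model(input_ids, labels=input_ids).loss
```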
@enzoampil When training with such data, is the "condition" also used in the loss function? I mean, I am wondering whether the "condition" is also learned by the language model.
Yes, if you specify it like above, it should be.
Okay. Thanks
To me, this sounds more like a case where encoder-decoder models like T5 or Bart should be fine-tuned. The encoder would encode the "context" and the decoder would be teacher-forced on the sentence.
I would like to ask whether you think that using the encoder-decoder model with GPT2 wrapped as both the encoder and the decoder would give reasonable results, or whether wrapping GPT2 as the encoder is not a good idea (maybe use BERT as the encoder instead?).
Currently, only bert2bert is supported with the EncoderDecoder structure.
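The documented bert2bert usage looks roughly like this (checkpoint names and example strings are just placeholders):

```python
from transformers import BertTokenizer, EncoderDecoderModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased"
)

# Needed so the decoder knows how to start and how padding is handled.
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

context = tokenizer("Why did the chicken cross the road?", return_tensors="pt")
labels = tokenizer("To get to the other side.", return_tensors="pt").input_ids

# The encoder reads the context; the decoder is teacher-forced on the target sentence.
loss = model(
    input_ids=context.input_ids,
    attention_mask=context.attention_mask,
    labels=labels,
).loss
```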
@toriving I've successfully done "conditional" fine-tuning by adding a new token that indicates which portion of the sequence refers to the "context", similar to the [SEP] token used in the multi-sequence version of BERT.
For example, here's how I prepare a dataset for training GPT2 to generate answers to riddle jokes:
<soq> Why did the chicken cross the road? <eoq> To go to the other side <|endoftext|>
The effect is that the answer (after <eoq>) is conditional on the question that precedes it.
I would like to ask whether you masked the input (context) part of the labels in the forward function. What I mean is that you presumably pass labels=input_ids to the forward function. Do you set only the padding tokens as masked (value -100), or do you mask the context tokens too? Since we are trying to perform conditional generation, I think the loss should only count the reply(?).
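To illustrate the masking variant being asked about here (this is an assumption about one possible setup, not necessarily what was done above): copy input_ids into labels and set the context tokens (and any padding) to -100, so only the reply contributes to the loss.

```python
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

question = "Why did the chicken cross the road? "
answer = "To go to the other side" + tokenizer.eos_token

q_ids = tokenizer(question, return_tensors="pt").input_ids
a_ids = tokenizer(answer, return_tensors="pt").input_ids

input_ids = torch.cat([q_ids, a_ids], dim=1)
labels = input_ids.clone()
labels[:, : q_ids.shape[1]] = -100  # -100 positions are ignored by the loss

# The model still attends to the question, but only the answer tokens
# contribute to the cross-entropy loss.
loss = model(input_ids, labels=labels).loss
```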
I can use run_generation.py to generate a sentence by adding context.
But is there a way to do fine-tuning based on a condition (context)? For example, when "context [SEP] sentence" data is given as input, the "context" would be used to obtain the hidden state without being learned, while the "sentence" would be learned with the language model.