I didn't quite understand what you mean, so let me address it from my best guess: self.decoder2 is itself a full BERT model. Without encoder_hidden_states it works like a normal BERT, taking the input_ids (and, in this project, the per_input_ids) as input. In fact, when encoder_hidden_states is None, the Hugging Face library treats self.decoder2 as an encoder rather than a decoder. Either way, whether the model acts as an encoder or a decoder does not affect ul_training.
Thanks. Regarding the following code (line 382 in the linked file):
class BertLayer(nn.Module):
    def __init__(self, config):
        """..."""

    def forward(
        self,
        hidden_states,
        attention_mask=None,
        head_mask=None,
        encoder_hidden_states=None,
        encoder_attention_mask=None,
        output_attentions=False,
        per_hidden_states=None,
    ):
        if self.is_decoder2 and encoder_hidden_states is not None:
            per_attention_outputs = self.attention(
                per_hidden_states,
                None,
                head_mask,
                output_attentions=output_attentions,
            )
            per_attention_output = per_attention_outputs[0]

        self_attention_outputs = self.attention(
            hidden_states,
            attention_mask,
            head_mask,
            output_attentions=output_attentions,
        )
        attention_output = self_attention_outputs[0]
        outputs = self_attention_outputs[1:]  # add self attentions if we output attention weights

        if self.is_decoder and encoder_hidden_states is not None and not self.is_decoder2:
            assert hasattr(
                self, "crossattention"
            ), f"If `encoder_hidden_states` are passed, {self} has to be instantiated with cross-attention layers by setting `config.add_cross_attention=True`"
            cross_attention_outputs = self.crossattention(
                attention_output,
                attention_mask,
                head_mask,
                encoder_hidden_states,
                encoder_attention_mask,
                output_attentions,
            )
            attention_output = cross_attention_outputs[0]
            outputs = outputs + cross_attention_outputs[1:]
        elif self.is_decoder2 and encoder_hidden_states is not None:
            assert hasattr(
                self, "crossattention"
            ), f"If `encoder_hidden_states` are passed, {self} has to be instantiated with cross-attention layers by setting `config.add_cross_attention=True`"
            query_hidden_states = self.crossattention(
                attention_output,
                None,
                head_mask,
                per_attention_output,
                None,
                output_attentions,
            )[0]
            cross_attention_outputs = self.crossattention(
                query_hidden_states,
                attention_mask,
                head_mask,
                encoder_hidden_states,
                encoder_attention_mask,
                output_attentions,
            )
            attention_output = cross_attention_outputs[0]
            outputs = outputs + cross_attention_outputs[1:]  # add cross attentions if we output attention weights

        layer_output = apply_chunking_to_forward(
            self.feed_forward_chunk, self.chunk_size_feed_forward, self.seq_len_dim, attention_output
        )
        outputs = (layer_output,) + outputs
        return outputs

    def feed_forward_chunk(self, attention_output):
        intermediate_output = self.intermediate(attention_output)
        layer_output = self.output(intermediate_output, attention_output)
        return layer_output
When ul_training=True and encoder_hidden_states=None, only line 417 in the code linked above is executed. The self.attention call at line 417 does not take the per_hidden_states parameter (which comes from per_input_ids=persona_input_ids=inference_dict['neg_pre_input_ids']). The other attention calls that do use per_hidden_states cannot be executed. So I can't understand how hyp is generated?
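To spell out my reading of the control flow, all the branch guards in the quoted forward evaluate to False in this setting (a simplified sketch; the is_decoder/is_decoder2 values are my assumption about how decoder2 is configured):

# Re-evaluating the branch guards of the quoted forward when ul_training=True
# and no encoder states are passed.
is_decoder, is_decoder2 = True, True   # assumed decoder2 configuration
encoder_hidden_states = None

persona_attention_branch = is_decoder2 and encoder_hidden_states is not None
cross_attention_branch = (
    is_decoder and encoder_hidden_states is not None and not is_decoder2
)
decoder2_cross_branch = is_decoder2 and encoder_hidden_states is not None

print(persona_attention_branch, cross_attention_branch, decoder2_cross_branch)
# False False False -> only the plain self-attention over hidden_states
# (line 417) and the feed-forward chunk run; per_hidden_states is never used.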
I see your point. In ul_training the hyp is not generated; it is discouraged by the unlikelihood objective, which is implemented by reversing the cross-entropy loss.
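In case it helps, here is a generic sketch of such an unlikelihood term (not this repository's exact code; the function name, pad_id, and shapes are illustrative). Instead of maximizing log p(y) for the negative tokens, it maximizes log(1 - p(y)), so probability mass on them is pushed down:

import torch
import torch.nn.functional as F

def unlikelihood_loss(logits, neg_target_ids, pad_id=0):
    # logits: (batch, seq_len, vocab); neg_target_ids: (batch, seq_len),
    # e.g. the tokens coming from neg_pre_input_ids in this project.
    probs = F.softmax(logits, dim=-1)
    neg_probs = probs.gather(-1, neg_target_ids.unsqueeze(-1)).squeeze(-1)
    # "Reversed" cross-entropy: penalize -log(1 - p) instead of -log(p).
    loss = -torch.log((1.0 - neg_probs).clamp(min=1e-8))
    mask = (neg_target_ids != pad_id).float()
    return (loss * mask).sum() / mask.sum().clamp(min=1.0)

Minimizing a term like this lowers the probability of the negative persona tokens during ul_training, which is why, as I understand it, no hyp needs to be decoded at that stage.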
For the following code: because encoder_hidden_states=None, only self.attention can be executed, and the other attention-related code cannot run. So the per_input_ids=persona_input_ids above seems to go unused. Then what is hyp generated from? Please explain.