netagl opened this issue 4 months ago
I think it might be related to this:

```python
encoder_attention_mask = _prepare_4d_attention_mask(
    encoder_attention_mask, inputs_embeds.dtype, tgt_len=input_shape[-1]
)
```
The comment above this call says the mask is expanded to `[bsz, 1, tgt_seq_len, src_seq_len]`, but `_prepare_4d_attention_mask` returns `src_seq_len` as 1. Is this what you meant? Because it does not work well with the cross-attention shape check:
```python
if attention_mask is not None:
    if attention_mask.size() != (bsz, 1, tgt_len, src_len):
        raise ValueError(
            f"Attention mask should be of size {(bsz, 1, tgt_len, src_len)}, but is {attention_mask.size()}"
        )
```
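For reference, here is a minimal sketch of the shapes involved, assuming `transformers`' `_prepare_4d_attention_mask` and made-up batch/sequence sizes:

```python
import torch
from transformers.modeling_attn_mask_utils import _prepare_4d_attention_mask

bsz, src_len = 2, 10   # hypothetical batch size and encoder (text) length
tgt_len = 1            # a single decoder step, as during generation

# 2D padding mask over the encoder tokens: 1 = attend, 0 = padding
encoder_attention_mask = torch.ones(bsz, src_len, dtype=torch.long)

# Expands to 4D (bsz, 1, tgt_len, src_len), with 0 for visible positions
# and dtype-min for masked ones
expanded = _prepare_4d_attention_mask(
    encoder_attention_mask, torch.float32, tgt_len=tgt_len
)
print(expanded.shape)  # torch.Size([2, 1, 1, 10])
```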
I would like some help, @ylacombe.
Hey @netagl, thanks for your message! I'm not sure I understand your issue; could you send a code snippet that reproduces it?
The attention mask is needed in the cross-attention layer if you have a batch of samples; otherwise you don't need to pass it to the model!
@netagl, is your `audio_encoder_per_device_batch_size` 1?
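For example, here is a minimal sketch of batched generation with padding (using MusicGen as a stand-in model, which is an assumption on my part; swap in whichever checkpoint you are training):

```python
from transformers import AutoProcessor, MusicgenForConditionalGeneration

processor = AutoProcessor.from_pretrained("facebook/musicgen-small")
model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-small")

# Two prompts of different lengths force padding, so the processor also
# returns an attention_mask marking real vs. padded encoder tokens
inputs = processor(
    text=["80s pop track", "a calm lo-fi beat with soft piano and rain sounds"],
    padding=True,
    return_tensors="pt",
)

# Passing the attention_mask lets cross-attention ignore the padding;
# with a batch size of 1 and no padding it can be omitted
audio = model.generate(**inputs, max_new_tokens=16)
```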
Hi, I have an attention_mask mismatch problem in the cross-attention.
Can you please explain this line: `requires_attention_mask = "encoder_outputs" not in model_kwargs`? Why does it come after this: `if "encoder_outputs" not in model_kwargs:`, where `encoder_outputs` are created and added to `model_kwargs`?
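For what it's worth, here is a toy sketch of how I currently read the relationship between those two pieces (my own reconstruction, not a verbatim excerpt from `transformers`):

```python
# Toy reconstruction of the ordering in generate(), as I understand it:
# if the caller did not pass precomputed encoder_outputs, generate() has
# to run the encoder itself and may need to build a default attention
# mask for it, hence the flag.
model_kwargs = {}  # pretend kwargs passed to generate()

requires_attention_mask = "encoder_outputs" not in model_kwargs
print(requires_attention_mask)  # True: encoder not run yet, a mask may be needed

if "encoder_outputs" not in model_kwargs:
    # encoder_outputs are created and added to model_kwargs here,
    # so decoding steps only need to run the decoder
    model_kwargs["encoder_outputs"] = object()  # stand-in for real outputs
```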
Is the attention mask needed for the cross-attention layer during generation? This mismatch problem occurs only in generation; train & eval are OK.
Thanks!