The mask M returned by the function _hide_other_subjects_from_subjects is all 0, which means the self-attention result is still the plain Q*K result, not the bounded attention result.
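To illustrate why an all-zero M matters, here is a minimal sketch in plain Python. It assumes M is applied as an additive mask on the attention logits before the softmax (0 to keep an entry, -inf to block it); that is an assumption about how the mask is consumed, not a copy of the repository's code, and the names softmax, masked_attention, logits, zero_mask and blocking_mask are hypothetical.

```python
import math

NEG_INF = float("-inf")

def softmax(row):
    # Numerically stable softmax over one row of attention logits.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def masked_attention(logits, mask):
    # Assumption: the mask is additive, i.e. softmax(logits + mask) per row.
    return [
        softmax([s + m for s, m in zip(s_row, m_row)])
        for s_row, m_row in zip(logits, mask)
    ]

# Toy Q*K logits for 2 queries and 3 keys (made-up values for illustration).
logits = [[1.0, 2.0, 0.5],
          [0.3, 0.1, 1.2]]

# An all-zero mask, like the M described in this report.
zero_mask = [[0.0, 0.0, 0.0],
             [0.0, 0.0, 0.0]]

# A mask that actually blocks some query/key pairs.
blocking_mask = [[0.0, NEG_INF, 0.0],
                 [0.0, 0.0, NEG_INF]]

plain = [softmax(row) for row in logits]
with_zero = masked_attention(logits, zero_mask)
with_block = masked_attention(logits, blocking_mask)

print(with_zero == plain)   # the all-zero mask changes nothing
print(with_block[0][1])     # the blocked entry receives zero attention weight
```

Under that assumption, an all-zero M is a no-op, so the attention map is identical to the unmasked one, which matches the behavior described here.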
The subject_masks and background_masks obtained inside the function _hide_other_subjects_from_subjects are as follows:
These look correct.
However, after the following code runs, the returned sim_masks are all 0.