Thank you for your excellent work and for open-sourcing the code!

During pre-training, the mask weight needs to be updated every 40 epochs, and `attn_map` is obtained via `model.forward_encoder_test`. At the end of the Transformer blocks in `forward_encoder_test` there is `return blk(x, return_attention=True)`, but the original ViT block does not accept a `return_attention` argument. Could you share the specific implementation details of this part?
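For reference, here is a minimal sketch of what I imagine the modification might look like, following the pattern used in DINO's ViT implementation (which exposes a similar `return_attention` flag). The simplified `Attention` and `Block` classes below are my own timm-style stand-ins, not your actual code, so please correct me if your implementation differs:

```python
import torch.nn as nn


class Attention(nn.Module):
    def __init__(self, dim, num_heads=8, qkv_bias=False):
        super().__init__()
        self.num_heads = num_heads
        head_dim = dim // num_heads
        self.scale = head_dim ** -0.5
        self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        B, N, C = x.shape
        qkv = (
            self.qkv(x)
            .reshape(B, N, 3, self.num_heads, C // self.num_heads)
            .permute(2, 0, 3, 1, 4)
        )
        q, k, v = qkv[0], qkv[1], qkv[2]
        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)
        x = (attn @ v).transpose(1, 2).reshape(B, N, C)
        x = self.proj(x)
        # Return the attention map alongside the output so the Block can expose it.
        return x, attn


class Block(nn.Module):
    def __init__(self, dim, num_heads):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = Attention(dim, num_heads=num_heads)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x, return_attention=False):
        y, attn = self.attn(self.norm1(x))
        if return_attention:
            # Short-circuit: return only this block's attention map
            # (the DINO-style behavior I assume forward_encoder_test relies on).
            return attn
        x = x + y
        x = x + self.mlp(self.norm2(x))
        return x
```

Is this roughly how the `attn_map` used for updating the mask weight is extracted, or does your modified block differ from this sketch?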
Looking forward to your answer, and thank you!