Thank you for your excellent work and for open-sourcing the code!

During pre-training, the mask weight needs to be updated every 40 epochs, and `attn_map` is obtained via `model.forward_encoder_test`. At the end of the Transformer blocks in `forward_encoder_test` there is `return blk(x, return_attention=True)`, but the original ViT block does not accept a `return_attention` argument. Could you share the specific implementation details of this part?
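For reference, here is a minimal sketch of what I imagine the modification might look like, following the pattern used in DINO's ViT implementation (which exposes a similar `return_attention` flag). The simplified `Attention` and `Block` classes below are my own timm-style stand-ins, not your actual code, so please correct me if your implementation differs:

```python
import torch.nn as nn


class Attention(nn.Module):
    def __init__(self, dim, num_heads=8, qkv_bias=False):
        super().__init__()
        self.num_heads = num_heads
        head_dim = dim // num_heads
        self.scale = head_dim ** -0.5
        self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        B, N, C = x.shape
        qkv = (
            self.qkv(x)
            .reshape(B, N, 3, self.num_heads, C // self.num_heads)
            .permute(2, 0, 3, 1, 4)
        )
        q, k, v = qkv[0], qkv[1], qkv[2]
        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)
        x = (attn @ v).transpose(1, 2).reshape(B, N, C)
        x = self.proj(x)
        # Return the attention map alongside the output so the Block can expose it.
        return x, attn


class Block(nn.Module):
    def __init__(self, dim, num_heads):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = Attention(dim, num_heads=num_heads)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x, return_attention=False):
        y, attn = self.attn(self.norm1(x))
        if return_attention:
            # Short-circuit: return only this block's attention map
            # (the DINO-style behavior I assume forward_encoder_test relies on).
            return attn
        x = x + y
        x = x + self.mlp(self.norm2(x))
        return x
```

Is this roughly how the `attn_map` used for updating the mask weight is extracted, or does your modified block differ from this sketch?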
Looking forward to your answer, and thank you!