guijiejie / AMT


The details about "blk(x, return_attention=True)" #2

Closed: andytianph closed this 1 year ago

andytianph commented 1 year ago

Thank you for your excellent work and for open-sourcing the code! During pre-training, the mask weights are updated every 40 epochs: the `attn_map` is obtained from `model.forward_encoder_test`, and in the Transformer blocks of `forward_encoder_test` the last call is `return blk(x, return_attention=True)`. However, the original ViT block has no `return_attention` parameter. What are the specific implementation details of this part? Looking forward to your answer, and thank you!
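
For reference, my guess is that it is a DINO-style modification of the block, where the attention module also returns its softmax map and the block short-circuits on the flag. The sketch below is only my assumption of how it might look, not the repository's actual code; the class names, layer shapes, and `mlp_ratio` default are illustrative:

```python
import torch
import torch.nn as nn

class Attention(nn.Module):
    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)  # each: (B, heads, N, head_dim)
        attn = (q @ k.transpose(-2, -1) * self.scale).softmax(dim=-1)
        x = (attn @ v).transpose(1, 2).reshape(B, N, C)
        # Return the attention map alongside the projected output
        return self.proj(x), attn

class Block(nn.Module):
    def __init__(self, dim, num_heads=8, mlp_ratio=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = Attention(dim, num_heads)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio), nn.GELU(),
            nn.Linear(dim * mlp_ratio, dim),
        )

    def forward(self, x, return_attention=False):
        y, attn = self.attn(self.norm1(x))
        if return_attention:
            # Skip the residual/MLP path and hand back the (B, heads, N, N) map
            return attn
        x = x + y
        x = x + self.mlp(self.norm2(x))
        return x
```

With such a block, `blk(x, return_attention=True)` on the last layer would yield the per-head attention matrix directly, which could then be reduced (e.g. averaged over heads) to build the `attn_map` used for the mask-weight update. Is that what the code does?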