mit-han-lab / lite-transformer

[ICLR 2020] Lite Transformer with Long-Short Range Attention
https://arxiv.org/abs/2004.11886

TransformerEncoderLayer #36

Closed · sanwei111 closed this issue 2 months ago

sanwei111 commented 3 years ago

Hello, in the file transformer_multibranch_v2.py, the TransformerEncoderLayer class contains the following code:

```python
if args.encoder_branch_type is None:  # default=None????
    self.self_attn = MultiheadAttention(
        self.embed_dim, args.encoder_attention_heads,
        dropout=args.attention_dropout, self_attention=True,
    )
else:
    layers = []
    embed_dims = []
    heads = []
    num_types = len(args.encoder_branch_type)
```

I just wonder: does args.encoder_branch_type equal True?

realzza commented 3 years ago

> Hello, in the file transformer_multibranch_v2.py, the TransformerEncoderLayer class contains the code shown above. I just wonder: does args.encoder_branch_type equal True?

Hi, args.encoder_branch_type is a list containing the encoder branch types defined in your training yml file. In my case, I set encoder-branch-type in the training yml to [attn:1:32:4, dynamic:default:32:4], where 32 is the embedding dimension and 4 is the number of attention heads. Hope this helps!
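
To make the format concrete, here is a minimal, self-contained sketch of how each entry is split. The field meanings follow the indexing in TransformerEncoderLayer, except the second field, which I am assuming is the kernel/convolution spec:

```python
# Minimal illustration of how entries like "attn:1:32:4" are parsed.
# Field meanings are inferred from TransformerEncoderLayer's indexing;
# the meaning of the second field (kernel spec) is an assumption here.
branch_types = ["attn:1:32:4", "dynamic:default:32:4"]

for layer_type in branch_types:
    fields = layer_type.split(':')
    branch = fields[0]            # "attn" -> MultiheadAttention, "dynamic" -> DynamicconvLayer
    kernel_spec = fields[1]       # "default" or a number (assumed kernel spec)
    embed_dim = int(fields[2])    # per-branch embedding dimension, e.g. 32
    num_heads = int(fields[3])    # number of attention heads, e.g. 4
    print(branch, kernel_spec, embed_dim, num_heads)
```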

sanwei111 commented 3 years ago

> > Hello, in the file transformer_multibranch_v2.py, the TransformerEncoderLayer class contains the code shown above. I just wonder: does args.encoder_branch_type equal True?
>
> Hi, args.encoder_branch_type is a list containing the encoder branch types defined in your training yml file. In my case, I set encoder-branch-type in the training yml to [attn:1:32:4, dynamic:default:32:4], where 32 is the embedding dimension and 4 is the number of attention heads. Hope this helps!

Thanks! What is the meaning of [attn:1:32:4, dynamic:default:32:4]? Could you show some details about the list?

realzza commented 3 years ago

> Thanks! What is the meaning of [attn:1:32:4, dynamic:default:32:4]? Could you show some details about the list?

As I mentioned in my last reply, args.encoder_branch_type should not be a boolean value; instead, it should be a list recording the branch types of your encoder. As for 32 and 4, they are the embed_dim and num_heads parameters used when initializing the MultiheadAttention and DynamicconvLayer modules. https://github.com/mit-han-lab/lite-transformer/blob/de9631cbbbb9c42dce3616a1e95fb59a89ab696e/configs/cnndm/attention/multibranch_v2/embed496.yml#L36

You can find more details on these two parameters in the get_layer method of the TransformerEncoderLayer module: https://github.com/mit-han-lab/lite-transformer/blob/de9631cbbbb9c42dce3616a1e95fb59a89ab696e/fairseq/models/transformer_multibranch_v2.py#L617-L645

You can find more details about the MultiheadAttention module at https://github.com/mit-han-lab/lite-transformer/blob/de9631cbbbb9c42dce3616a1e95fb59a89ab696e/fairseq/modules/multihead_attention.py#L15-L27
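
For intuition only, here is a toy sketch of the general multi-branch idea as described in the paper; this is not the repository's MultiBranch class. The assumption is that each branch processes its own slice of the embedding dimension and the branch outputs are concatenated back together:

```python
# Toy sketch of the multi-branch idea, NOT the repo's MultiBranch implementation.
# Assumption: each branch gets its own slice of the embedding dimension and the
# outputs are concatenated, as in the paper's Long-Short Range Attention.
import torch
import torch.nn as nn

class ToyMultiBranch(nn.Module):
    def __init__(self, branches, embed_dims):
        super().__init__()
        # branches: modules mapping (seq_len, batch, dim_i) -> (seq_len, batch, dim_i)
        self.branches = nn.ModuleList(branches)
        self.embed_dims = embed_dims  # e.g. [32, 32] for an attn + dynamic-conv pair

    def forward(self, x):
        # x: (seq_len, batch, sum(embed_dims))
        chunks = torch.split(x, self.embed_dims, dim=-1)
        outs = [branch(chunk) for branch, chunk in zip(self.branches, chunks)]
        return torch.cat(outs, dim=-1)

# Shape bookkeeping with two identity "branches", just for illustration:
mb = ToyMultiBranch([nn.Identity(), nn.Identity()], [32, 32])
x = torch.randn(10, 2, 64)   # (seq_len, batch, 32 + 32)
print(mb(x).shape)           # torch.Size([10, 2, 64])
```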

sanwei111 commented 3 years ago

> As I mentioned in my last reply, args.encoder_branch_type should not be a boolean value; instead, it should be a list recording the branch types of your encoder. As for 32 and 4, they are the embed_dim and num_heads parameters used when initializing the MultiheadAttention and DynamicconvLayer modules. https://github.com/mit-han-lab/lite-transformer/blob/de9631cbbbb9c42dce3616a1e95fb59a89ab696e/configs/cnndm/attention/multibranch_v2/embed496.yml#L36
>
> You can find more details on these two parameters in the get_layer method of the TransformerEncoderLayer module: https://github.com/mit-han-lab/lite-transformer/blob/de9631cbbbb9c42dce3616a1e95fb59a89ab696e/fairseq/models/transformer_multibranch_v2.py#L617-L645
>
> You can find more details about the MultiheadAttention module at https://github.com/mit-han-lab/lite-transformer/blob/de9631cbbbb9c42dce3616a1e95fb59a89ab696e/fairseq/modules/multihead_attention.py#L15-L27

Thanks a lot! One more question. As shown below:

```python
for layer_type in args.encoder_branch_type:
    embed_dims.append(int(layer_type.split(':')[2]))
    heads.append(int(layer_type.split(':')[3]))
    layers.append(self.get_layer(args, index, embed_dims[-1], heads[-1], layer_type))
self.self_attn = MultiBranch(layers, embed_dims)
```

This code appears in the encoder layer class. As you said, I set args.encoder_branch_type == [attn:1:160:4, lightweight:default:160:4], but it leads to some errors. How should I understand this?
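
For reference, here is a rerun of just the string-parsing part of that loop with this setting; args, index, and get_layer from the repository are left out, so this only illustrates what gets parsed:

```python
# Standalone rerun of only the parsing part of the loop above, using the
# branch-type setting from this comment; the repository's args/index/get_layer
# are intentionally omitted, so this only shows the parsed values.
encoder_branch_type = ["attn:1:160:4", "lightweight:default:160:4"]

embed_dims, heads = [], []
for layer_type in encoder_branch_type:
    embed_dims.append(int(layer_type.split(':')[2]))
    heads.append(int(layer_type.split(':')[3]))

print(embed_dims)  # [160, 160]
print(heads)       # [4, 4]
```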

zhijian-liu commented 2 months ago

Thank you for your interest in our project. Unfortunately, this repository is no longer actively maintained, so we will be closing this issue. If you have any further questions, please feel free to email us. Thank you again!