Closed yilonghe closed 2 years ago
In general, the mask variables in our code are used to mask out the invalid parts of the input.
For example, the network takes input feature sequences with a fixed length of 2304. Sequences shorter than 2304 are padded to that length, and the mask marks which positions are valid and which are just padding. The mask is passed through the whole network so that each layer gets the correct mask.
Specifically, out_mask is the mask for the output of each layer, and qx_mask is the mask for Q in self-attention.
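To make the padding idea concrete, here is a minimal sketch in plain Python. It is not the repository's code: the helper names (pad_with_mask, downsample_mask, masked_mean) are hypothetical, scalars stand in for per-step feature vectors, and the strided downsampling is only an assumed example of how a mask might be updated from one layer to the next.

```python
MAX_LEN = 2304  # fixed input length mentioned in the reply above


def pad_with_mask(features, max_len=MAX_LEN, pad_value=0.0):
    """Pad a sequence to max_len and return (padded, mask).

    mask[i] is True for real positions and False for padding.
    """
    if len(features) > max_len:
        raise ValueError("sequence is longer than max_len")
    n_pad = max_len - len(features)
    padded = list(features) + [pad_value] * n_pad
    mask = [True] * len(features) + [False] * n_pad
    return padded, mask


def downsample_mask(mask, stride):
    """Assumed example: mask after a layer that keeps every stride-th position."""
    return mask[::stride]


def masked_mean(values, mask):
    """Average only the valid positions so padding does not affect the result."""
    valid = [v for v, m in zip(values, mask) if m]
    return sum(valid) / len(valid)


padded, mask = pad_with_mask([1.0, 2.0, 3.0])
layer_mask = downsample_mask(mask, stride=2)
```

Here mask plays the role of the input mask, and layer_mask illustrates the kind of per-layer mask that out_mask is described as carrying.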
@yilonghe Is this issue addressed by Chenlin's reply?
Closed due to inactivity.
Hi! Thanks for your great work. What are out_mask and qx_mask used for?