XiaoyuShi97 opened this issue 3 years ago
See Annex C:

> In the cross-attention module, inputs are first processed with layer norm (Ba et al., 2016) before being passed through linear layers to produce each of the query, key, and value inputs to the QKV cross-attention operation.
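In other words, the context (the key/value input) gets its own layer norm before the K and V projections, just as the latent queries do before the Q projection. A minimal PyTorch sketch of this (assumed shapes and names for illustration, not the repo's exact code) looks like:

```python
import torch
import torch.nn as nn

class CrossAttention(nn.Module):
    """Sketch of cross-attention with an optional norm on the context.

    When norm_context=True, the context tensor is layer-normed before
    the key/value projections, matching the Annex C description above.
    """

    def __init__(self, query_dim, context_dim, norm_context=True):
        super().__init__()
        self.norm_q = nn.LayerNorm(query_dim)
        # norm_context toggles the layer norm on the key/value input
        self.norm_ctx = nn.LayerNorm(context_dim) if norm_context else nn.Identity()
        self.to_q = nn.Linear(query_dim, query_dim, bias=False)
        self.to_k = nn.Linear(context_dim, query_dim, bias=False)
        self.to_v = nn.Linear(context_dim, query_dim, bias=False)

    def forward(self, x, context):
        x = self.norm_q(x)                # layer norm on the latent queries
        context = self.norm_ctx(context)  # layer norm on the context (norm_context)
        q = self.to_q(x)
        k = self.to_k(context)
        v = self.to_v(context)
        attn = (q @ k.transpose(-2, -1)) / q.shape[-1] ** 0.5
        return attn.softmax(dim=-1) @ v

# usage: latents of dim 8 attend over a context of dim 16
layer = CrossAttention(query_dim=8, context_dim=16, norm_context=True)
out = layer(torch.randn(2, 4, 8), torch.randn(2, 10, 16))
```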
Hi, thanks for sharing the code! I wonder what `norm_context` refers to in the paper? https://github.com/lucidrains/perceiver-pytorch/blob/3b70ebee00c66f15b38c5980f4275f744a433895/perceiver_pytorch/perceiver_io.py#L125