mit-han-lab / lite-transformer

[ICLR 2020] Lite Transformer with Long-Short Range Attention
https://arxiv.org/abs/2004.11886

about kernel size #37

Closed sanwei111 closed 2 years ago

sanwei111 commented 3 years ago

parser.add_argument('--decoder-kernel-size-list', nargs='*', default=[3, 7, 15, 31, 31, 31, 31], type=int)
parser.add_argument('--encoder-kernel-size-list', nargs='*', default=[3, 7, 15, 31, 31, 31, 31], type=int)

As you can see, the code above sets the kernel size for each encoder and decoder layer. I just wonder why the kernel sizes are not kept the same across all layers?
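For context, here is a minimal, self-contained sketch of how such `nargs='*'` options are parsed; the flag names match the ones above, while the override shown on the last lines is hypothetical and only illustrates how one could keep the kernel size identical across layers:

```python
import argparse

# One kernel size per encoder/decoder layer; nargs='*' collects all
# following integers into a single list.
parser = argparse.ArgumentParser()
parser.add_argument('--encoder-kernel-size-list', nargs='*', type=int,
                    default=[3, 7, 15, 31, 31, 31, 31])
parser.add_argument('--decoder-kernel-size-list', nargs='*', type=int,
                    default=[3, 7, 15, 31, 31, 31, 31])

# Hypothetical override: use the same kernel size for every layer.
args = parser.parse_args('--encoder-kernel-size-list 31 31 31 31 31 31 31'.split())
print(args.encoder_kernel_size_list)  # [31, 31, 31, 31, 31, 31, 31]
```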

Michaelvll commented 2 years ago

Thank you for asking! We follow the setup from Pay Less Attention with Lightweight and Dynamic Convolutions.
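For intuition (not stated in this thread): in that setup the kernel size grows with layer depth, so deeper convolution layers cover a wider context. A minimal sketch of stacked depthwise 1-D convolutions driven by such a per-layer list; the module, channel count, and tensor shapes are illustrative and not the repository's actual code:

```python
import torch
import torch.nn as nn

class GrowingKernelConvStack(nn.Module):
    """Illustrative stack of depthwise 1-D convolutions whose kernel
    size follows a per-layer list such as [3, 7, 15, 31, 31, 31, 31]."""

    def __init__(self, channels, kernel_size_list):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Conv1d(channels, channels, kernel_size=k,
                      padding=k // 2, groups=channels)  # depthwise, length-preserving
            for k in kernel_size_list
        )

    def forward(self, x):  # x: (batch, channels, time)
        for conv in self.layers:
            x = conv(x)
        return x

# Receptive field grows layer by layer: 3 -> 9 -> 23 -> 53 -> ...
model = GrowingKernelConvStack(channels=8,
                               kernel_size_list=[3, 7, 15, 31, 31, 31, 31])
out = model(torch.randn(2, 8, 50))
print(out.shape)  # torch.Size([2, 8, 50])
```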