Open airkid opened 5 years ago
It is the same: a Conv1d with kernel_size=1 is just a position-wise linear transformation applied along the sequence dimension, so the two formulations are equivalent.
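For anyone else wondering, here is a minimal sketch (plain PyTorch, dimensions chosen arbitrarily) showing that a Conv1d with kernel_size=1 and a Linear layer compute the same position-wise map once their weights are shared:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d_model, d_ff, seq_len, batch = 8, 16, 5, 2

conv = nn.Conv1d(d_model, d_ff, kernel_size=1)  # expects (batch, d_model, seq_len)
linear = nn.Linear(d_model, d_ff)               # expects (batch, seq_len, d_model)

# Copy the conv weights into the linear layer so both layers share parameters.
# Conv1d weight shape: (d_ff, d_model, 1); Linear weight shape: (d_ff, d_model).
with torch.no_grad():
    linear.weight.copy_(conv.weight.squeeze(-1))
    linear.bias.copy_(conv.bias)

x = torch.randn(batch, seq_len, d_model)
out_conv = conv(x.transpose(1, 2)).transpose(1, 2)  # back to (batch, seq_len, d_ff)
out_linear = linear(x)

print(torch.allclose(out_conv, out_linear, atol=1e-6))  # True
```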
airkid notifications@github.com wrote on Sat, May 18, 2019 at 3:20 PM:
Hi, I'm reading this code for study and it helps me a lot. I'm confused by this line: https://github.com/ne7ermore/torch-light/blob/254c1333eef5ee35a1b5e036f267b81ddad17f96/BERT/model.py#L74
In the original BERT paper I haven't found any description of BERT using a Conv1d layer in the transformer instead of a linear transformation. And from http://nlp.seas.harvard.edu/2018/04/03/attention.html#position-wise-feed-forward-networks, this is implemented with an MLP. Can anyone kindly help me with this problem?