Open airkid opened 5 years ago
It is the same: a Conv1d with kernel_size=1 is just a position-wise linear transformation applied along the sequence dimension, so the two formulations are equivalent.
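For anyone else wondering, here is a minimal sketch (plain PyTorch, dimensions chosen arbitrarily) showing that a Conv1d with kernel_size=1 and a Linear layer compute the same position-wise map once their weights are shared:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d_model, d_ff, seq_len, batch = 8, 16, 5, 2

conv = nn.Conv1d(d_model, d_ff, kernel_size=1)  # expects (batch, d_model, seq_len)
linear = nn.Linear(d_model, d_ff)               # expects (batch, seq_len, d_model)

# Copy the conv weights into the linear layer so both layers share parameters.
# Conv1d weight shape: (d_ff, d_model, 1); Linear weight shape: (d_ff, d_model).
with torch.no_grad():
    linear.weight.copy_(conv.weight.squeeze(-1))
    linear.bias.copy_(conv.bias)

x = torch.randn(batch, seq_len, d_model)
out_conv = conv(x.transpose(1, 2)).transpose(1, 2)  # back to (batch, seq_len, d_ff)
out_linear = linear(x)

print(torch.allclose(out_conv, out_linear, atol=1e-6))  # True
```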
airkid notifications@github.com wrote on Sat, May 18, 2019 at 3:20 PM:
Hi, I'm reading this code for study and it helps me a lot. I'm confused by this line: https://github.com/ne7ermore/torch-light/blob/254c1333eef5ee35a1b5e036f267b81ddad17f96/BERT/model.py#L74
In the original BERT paper I haven't found any description of BERT using a Conv1d layer in the transformer instead of a linear transformation. And from http://nlp.seas.harvard.edu/2018/04/03/attention.html#position-wise-feed-forward-networks, this is implemented with an MLP. Can anyone kindly help me with this problem?