ne7ermore / torch-light

Deep learning with PyTorch: basic networks such as logistic regression, CNN, RNN, and LSTM, plus examples built with more complex models.
MIT License

Confused by this conv1d operation #9

Open · airkid opened this issue 5 years ago

airkid commented 5 years ago

Hi, I'm reading this code for study and it has helped me a lot. I'm confused by this line: https://github.com/ne7ermore/torch-light/blob/254c1333eef5ee35a1b5e036f267b81ddad17f96/BERT/model.py#L74

In the original BERT paper, I haven't found any mention that BERT uses a conv1d layer in the transformer instead of a linear transformation.

And in http://nlp.seas.harvard.edu/2018/04/03/attention.html#position-wise-feed-forward-networks, this is implemented as an MLP.

Can anyone kindly help me with this problem?

ne7ermore commented 5 years ago

It is the same
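For anyone else landing here: a nn.Conv1d with kernel_size=1 applies the same affine transformation independently at every sequence position, which is exactly what the position-wise nn.Linear in the Annotated Transformer does. The two only differ in the expected memory layout (Conv1d wants channels before the sequence dimension). A minimal sketch checking the equivalence numerically; the dimensions below are illustrative, not taken from the repo:

```python
import torch
import torch.nn as nn

d_model, d_ff, seq_len = 8, 32, 5

# Position-wise feed-forward as a kernel-size-1 convolution
conv = nn.Conv1d(d_model, d_ff, kernel_size=1)

# Equivalent linear layer, copying over the same weights
linear = nn.Linear(d_model, d_ff)
with torch.no_grad():
    # Conv1d weight is (d_ff, d_model, 1); drop the kernel dim for Linear
    linear.weight.copy_(conv.weight.squeeze(-1))
    linear.bias.copy_(conv.bias)

x = torch.randn(1, seq_len, d_model)                # (batch, seq, features)
out_conv = conv(x.transpose(1, 2)).transpose(1, 2)  # Conv1d expects (batch, features, seq)
out_linear = linear(x)

print(torch.allclose(out_conv, out_linear, atol=1e-6))  # True
```

So the conv1d at BERT/model.py#L74 computes the same function as the MLP layers in the Harvard write-up; it is a stylistic choice, not an architectural difference.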
