openai / finetune-transformer-lm

Code and model for the paper "Improving Language Understanding by Generative Pre-Training"
https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf
MIT License

Using conv1d with kernel size 1 #19

Closed: ollmer closed this issue 6 years ago

ollmer commented 6 years ago

Hi! I've noticed that the training code uses a 1D convolution with kernel size 1 in all invocations. Do we need a convolution here at all? Why not replace it with a fully_connected layer?

chaitjo commented 6 years ago

If my understanding is correct, a 1D convolution with kernel size 1 is the same as taking a matrix product between the original input (say of dimensions n_timesteps x d) and a weight matrix of dimension d x n_filters.
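For concreteness, here is a minimal numpy check of that equivalence; the shapes and the names `n_timesteps`, `d`, and `n_filters` are illustrative, not from the repo:

```python
import numpy as np

n_timesteps, d, n_filters = 5, 8, 16
x = np.random.randn(n_timesteps, d)   # input sequence
w = np.random.randn(d, n_filters)     # one kernel-size-1 filter bank

# Kernel-size-1 "convolution": apply the same d x n_filters map at each timestep.
conv_out = np.stack([x[t] @ w for t in range(n_timesteps)])

# Plain matrix product over the whole sequence at once.
matmul_out = x @ w

assert np.allclose(conv_out, matmul_out)  # identical results
```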

Newmu commented 6 years ago

The codebase uses matmul when the receptive field size is 1. I originally thought conv1d would do this automatically "under the hood," but that does not appear to be the case.
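Roughly, the pattern looks like the condensed sketch below. This is an illustration assuming TF 1.x (the version this repo targets), not the repo's exact code: variable names and initializer values are simplified.

```python
import tensorflow as tf  # TF 1.x

def conv1d(x, scope, nf, rf):
    """x: [batch, n_timesteps, nx]; nf: number of filters; rf: receptive field."""
    with tf.variable_scope(scope):
        nx = x.get_shape().as_list()[-1]
        w = tf.get_variable('w', [rf, nx, nf],
                            initializer=tf.random_normal_initializer(stddev=0.02))
        b = tf.get_variable('b', [nf], initializer=tf.constant_initializer(0.0))
        if rf == 1:
            # Receptive field 1: flatten time into the batch dimension and use a
            # plain matmul, which computes the same result as the conv below.
            c = tf.matmul(tf.reshape(x, [-1, nx]), tf.reshape(w, [nx, nf])) + b
            c = tf.reshape(c, tf.concat([tf.shape(x)[:-1], [nf]], axis=0))
        else:
            c = tf.nn.conv1d(x, w, stride=1, padding='VALID') + b
    return c
```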