openai / finetune-transformer-lm

Code and model for the paper "Improving Language Understanding by Generative Pre-Training"
https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf
MIT License
2.15k stars · 503 forks

Why Conv1d over Linear? #49

Open maciejbalawejder opened 2 years ago

maciejbalawejder commented 2 years ago

https://github.com/openai/finetune-transformer-lm/blob/a69b5c43b0452462890bca8ff92fb75dee9290cf/train.py#L106

I see that you use 1-D convolutions throughout the code, which should technically perform the same as Dense layers. The main difference I found online is that the Dense layer has a shorter computing time. So I'm wondering: why do you use 1-D convolutions here?
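To make the claimed equivalence concrete, here is a minimal sketch (not the repo's code, just an illustration with numpy) showing that a 1-D convolution with filter width 1 computes exactly the same thing as a Dense layer applied independently at every sequence position:

```python
import numpy as np

# Hypothetical shapes for illustration: (batch, sequence, features).
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 10, 16))  # input activations
w = rng.standard_normal((16, 32))     # shared weight matrix
b = rng.standard_normal(32)           # shared bias

# Dense layer applied position-wise: y[n, t] = x[n, t] @ w + b
dense_out = x @ w + b

# Width-1 convolution: the same affine map is slid over the sequence,
# one position at a time -- identical arithmetic, just framed as a conv.
conv_out = np.einsum("btf,fo->bto", x, w) + b

print(np.allclose(dense_out, conv_out))  # True
```

So with filter width 1 the two are interchangeable mathematically; any difference comes down to implementation details like kernel dispatch overhead.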

Cheers