I have been using an implementation of GPT-2 from your repository and noticed that the size of the smallest GPT-2 model available in the repository differs from the smallest model mentioned in the original paper of GPT-2.
Specifically, the smallest model in the repository has about 124M parameters, but the smallest model in the original paper is reported as 117M.
I am curious to know where this difference comes from.
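For context, the 124M figure can be reproduced with a back-of-the-envelope parameter count. The sketch below assumes the standard GPT-2 small hyperparameters (vocab size 50257, context length 1024, embedding width 768, 12 layers, 12 heads) and a weight-tied output head, which are assumptions on my part rather than something stated above:

```python
# Rough parameter count for GPT-2 small, assuming the standard
# hyperparameters: n_vocab=50257, n_ctx=1024, n_embd=768, n_layer=12,
# and an output head tied to the token embedding (so it adds no params).
n_vocab, n_ctx, n_embd, n_layer = 50257, 1024, 768, 12

wte = n_vocab * n_embd          # token embedding
wpe = n_ctx * n_embd            # learned positional embedding

# One transformer block:
ln = 2 * n_embd                             # LayerNorm weight + bias
attn = (n_embd * 3 * n_embd + 3 * n_embd    # qkv projection (w + b)
        + n_embd * n_embd + n_embd)         # attention output proj (w + b)
mlp = (n_embd * 4 * n_embd + 4 * n_embd     # feed-forward up proj (w + b)
       + 4 * n_embd * n_embd + n_embd)      # feed-forward down proj (w + b)
block = 2 * ln + attn + mlp                 # two LayerNorms per block

total = wte + wpe + n_layer * block + ln    # + final LayerNorm
print(total)                                # 124439808, i.e. ~124M
```

Counting every weight and bias this way, including both embedding matrices, lands on roughly 124M, which matches the repository's number; the 117M figure in the paper appears to count differently.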