openai / gpt-2

Code for the paper "Language Models are Unsupervised Multitask Learners"
https://openai.com/blog/better-language-models/

src/model.py gelu uses numpy functions #228

Open EsbernTK opened 4 years ago

EsbernTK commented 4 years ago

The gelu function in src/model.py uses numpy.sqrt and numpy.pi. How does this affect GPU performance, and does it even work on the GPU? If not, it should be changed to the equivalent functions in tf.

mikolasan commented 4 years ago

Do you mean this part: np.sqrt(2/np.pi)? I think it can be replaced with a constant value calculated once (0.7978845608028654). The only thing to decide is how much precision is enough.
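
To illustrate the point about the constant, here is a minimal sketch in NumPy only, so it runs without TensorFlow. It assumes gelu is the usual tanh approximation, with np.tanh standing in for tf.tanh; the name SQRT_2_OVER_PI is hypothetical and not taken from the repo. The expression np.sqrt(2/np.pi) involves no tensors, so it is evaluated once on the host and produces a plain scalar. A framework that accepts NumPy scalars would most likely embed it as a constant rather than recompute it on the GPU each step.

```python
import math
import numpy as np

# Evaluated once, on the host, when this line runs. The result is a NumPy
# scalar, not a tensor operation. (Hypothetical name, for illustration only.)
SQRT_2_OVER_PI = np.sqrt(2 / np.pi)


def gelu(x):
    """tanh approximation of GELU, written with NumPy as a stand-in for tf ops."""
    return 0.5 * x * (1 + np.tanh(SQRT_2_OVER_PI * (x + 0.044715 * np.power(x, 3))))


if __name__ == "__main__":
    # The precomputed value agrees with the literal suggested above.
    print(math.isclose(float(SQRT_2_OVER_PI), 0.7978845608028654, rel_tol=1e-12))

    # In float32 the constant is rounded, so extra digits in the literal
    # make no difference at that precision.
    print(np.float32(SQRT_2_OVER_PI))

    print(gelu(np.array([-1.0, 0.0, 1.0])))
```

Either spelling, the NumPy expression or the hard-coded literal, ends up as the same scalar. The choice is mostly about readability, not GPU performance.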