The gelu function in src/model.py uses numpy.sqrt and numpy.pi. How does this affect GPU performance, and does it work on the GPU at all? If not, it should be changed to the equivalent functions in tf.
You mean this part: np.sqrt(2/np.pi)? I think it can be replaced with a constant value calculated once (0.7978845608028654). Just decide what precision is going to be enough.
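To illustrate the suggestion, here is a minimal sketch using only the Python standard library. It assumes gelu uses the common tanh approximation with the 0.044715 coefficient, which may not match src/model.py exactly. The names SQRT_2_OVER_PI, HARDCODED and gelu_scalar are made up for this example. The point it shows: an expression like np.sqrt(2/np.pi) takes only plain Python numbers, so it is evaluated on the host and yields an ordinary float, and a hardcoded literal should give the same result.

```python
import math

# Computed once on the host; this is an ordinary Python float, not a tensor.
SQRT_2_OVER_PI = math.sqrt(2.0 / math.pi)

# The literal proposed above as a replacement for the computed value.
HARDCODED = 0.7978845608028654


def gelu_scalar(x: float, c: float = SQRT_2_OVER_PI) -> float:
    """Scalar GELU via the tanh approximation (assumed form, for illustration)."""
    return 0.5 * x * (1.0 + math.tanh(c * (x + 0.044715 * x ** 3)))


# Difference between the computed constant and the hardcoded literal.
constant_gap = abs(SQRT_2_OVER_PI - HARDCODED)

# Difference in the activation output when switching between the two.
output_gap = abs(gelu_scalar(1.0, SQRT_2_OVER_PI) - gelu_scalar(1.0, HARDCODED))
```

If the model runs in float32, which keeps roughly 7 significant decimal digits, a shorter literal would likely be enough as well, but the full double-precision value costs nothing extra.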