Describe the Feature
Add the GeLU tanh approximation (supported in Torch, MLX, and JAX), which eliminates the use of the computationally expensive erf.
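For context, a minimal NumPy sketch of the two forms (the function names here are illustrative, not any library's API):

```python
import numpy as np
from scipy.special import erf  # the exact form needs erf

def gelu_exact(x):
    # GELU(x) = x * Phi(x) = 0.5 * x * (1 + erf(x / sqrt(2)))
    return 0.5 * x * (1.0 + erf(x / np.sqrt(2.0)))

def gelu_tanh(x):
    # Tanh approximation:
    # 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))
```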
Motivation
The current GeLU implementation relies on erf, which is computationally costly. Many deep learning models use the tanh-based approximation for efficiency.
Proposed Solution
Add an approximate: bool = False parameter to toggle between the exact and approximate implementations. Torch and JAX accept this as a function argument, while MLX requires switching which function is called; see the sketch below.
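A minimal sketch of how the toggle could dispatch per backend, assuming the proposed boolean parameter; the backend selector is hypothetical, but the per-framework calls reflect how each library exposes the approximation today:

```python
def gelu(x, approximate: bool = False, backend: str = "torch"):
    """Dispatch GeLU per backend; `backend` is a hypothetical selector."""
    if backend == "torch":
        import torch.nn.functional as F
        # Torch takes the choice as a string keyword on F.gelu.
        return F.gelu(x, approximate="tanh" if approximate else "none")
    if backend == "jax":
        import jax
        # JAX takes a boolean keyword directly.
        return jax.nn.gelu(x, approximate=approximate)
    if backend == "mlx":
        import mlx.nn as nn
        # MLX exposes no flag, so the called function changes instead.
        return nn.gelu_approx(x) if approximate else nn.gelu(x)
    raise ValueError(f"unknown backend: {backend}")
```

Defaulting approximate to False keeps the exact erf-based behavior for existing callers, so opting into the tanh approximation is an explicit choice.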