synnada-ai / mithril

Mithril: A Modular Machine Learning Library for Model Composability
Apache License 2.0

[FEATURE] GeLU tanh approximation #35

Open aturker-synnada opened 1 week ago


Describe the Feature

Add the tanh approximation of GeLU (supported in Torch, MLX, and JAX), which avoids the computationally expensive erf function.
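For reference, here are the exact GeLU and the commonly used tanh approximation. The second line is the form Torch documents for approximate='tanh'. Φ is the standard normal CDF.

```latex
\begin{aligned}
\mathrm{GELU}(x) &= x\,\Phi(x) = \tfrac{x}{2}\left(1 + \operatorname{erf}\left(\tfrac{x}{\sqrt{2}}\right)\right) \\
\mathrm{GELU}_{\tanh}(x) &= \tfrac{x}{2}\left(1 + \tanh\left(\sqrt{\tfrac{2}{\pi}}\left(x + 0.044715\,x^{3}\right)\right)\right)
\end{aligned}
```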

Motivation

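The current GeLU implementation relies on erf, which is costly to compute. Many deep learning models use the tanh-based approximation instead for efficiency. Models trained with that approximation also need it at inference to reproduce their outputs exactly.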
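The following is a minimal scalar sketch of the exact (erf) and tanh-approximate GeLU. It uses plain Python with the standard library only and is not Mithril code. It could serve as a reference when testing backend outputs. The function names are hypothetical.

The sketch also measures the largest deviation between the two versions on [-6, 6]. It illustrates the numerical difference only, not the runtime cost, since the erf cost concern applies to backend kernels.

```python
import math


def gelu_exact(x: float) -> float:
    """Exact GeLU: x * Phi(x), where Phi is the standard normal CDF (uses erf)."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """tanh approximation of GeLU: replaces erf with tanh of a cubic polynomial."""
    c = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + math.tanh(c * (x + 0.044715 * x ** 3)))


def max_abs_error(lo: float = -6.0, hi: float = 6.0, steps: int = 12001) -> float:
    """Largest |exact - approx| over an evenly spaced grid on [lo, hi]."""
    step = (hi - lo) / (steps - 1)
    return max(
        abs(gelu_exact(lo + i * step) - gelu_tanh(lo + i * step))
        for i in range(steps)
    )


if __name__ == "__main__":
    for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(f"x={x:+.1f}  exact={gelu_exact(x):+.6f}  tanh={gelu_tanh(x):+.6f}")
    print(f"max |error| on [-6, 6]: {max_abs_error():.2e}")
```

The two versions agree closely but not exactly. This is why a model trained with one variant should be run with the same variant.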

Proposed Solution

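Add an approximate: bool = False parameter to toggle between the exact (erf-based) and tanh-approximate implementations. Torch and JAX expose this as a function argument, but with different conventions:

- Torch's gelu takes approximate='none' or 'tanh'.
- JAX's gelu takes a boolean that defaults to True.

MLX instead provides a separate function (mlx.nn.gelu_approx rather than mlx.nn.gelu), so its backend must call a different function.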
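One way the proposed flag could be translated for each backend is sketched below. The helper name gelu_backend_call, and the choice to return a function name plus keyword arguments, are hypothetical and not part of Mithril's API. The target functions and argument conventions are those of the public Torch, JAX, and MLX APIs. The helper returns strings so the sketch runs without importing any of the frameworks.

```python
from typing import Any, Dict, Tuple


def gelu_backend_call(backend: str, approximate: bool = False) -> Tuple[str, Dict[str, Any]]:
    """Hypothetical helper: map the proposed approximate flag to a backend call.

    Returns the fully qualified GeLU function name and the keyword arguments
    that would be passed alongside the input tensor.
    """
    if backend == "torch":
        # torch.nn.functional.gelu takes a string: 'none' (exact) or 'tanh'.
        return "torch.nn.functional.gelu", {"approximate": "tanh" if approximate else "none"}
    if backend == "jax":
        # jax.nn.gelu takes a bool that defaults to True, so always pass it
        # explicitly to get the exact (erf) version when approximate=False.
        return "jax.nn.gelu", {"approximate": approximate}
    if backend == "mlx":
        # MLX has no flag here: the tanh approximation is a separate function.
        return ("mlx.nn.gelu_approx" if approximate else "mlx.nn.gelu"), {}
    raise ValueError(f"unsupported backend: {backend!r}")
```

For example, gelu_backend_call("torch", True) returns ("torch.nn.functional.gelu", {"approximate": "tanh"}). The JAX branch matters most: relying on its default would silently select the approximation even when approximate=False is requested.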