webmachinelearning / webnn

🧠 Web Neural Network API
https://www.w3.org/TR/webnn/

Need Gelu operation #626

Closed: mingmingtasd closed this issue 4 months ago

mingmingtasd commented 4 months ago

Use Case: Gelu performs the Gaussian error linear unit (GELU) activation function: f(x) = 0.5 * x * (1.0 + erf(x / sqrt(2))). Some transformer models, such as Whisper base, contain chains of decomposed small ops like Add, Div, Erf, and Mul following the MatMul and Conv operations. These small ops can be composed into a single higher-level Gelu operation. Furthermore, Gelu can also be fused into operations like MatMul and Conv; it has been confirmed that this fusion significantly improves graph execution performance, at least on the DirectML backend.

Sample models: Whisper base

Cross-framework support: ONNX, TensorFlow, PyTorch

Cross-platform implementability: CoreML, DirectML, OpenVINO

So I propose adding this new operation to the WebNN spec; my PR is WIP (a usage sketch follows below). @huningxin @fdwr
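
For illustration only (not part of the proposal text above), a minimal sketch of how the new op could be used, assuming it lands as an MLGraphBuilder.gelu() method; the method name, input shape, and descriptor fields here are assumptions:

// Assuming an async context; gelu() is the proposed (hypothetical) method name.
const context = await navigator.ml.createContext();
const builder = new MLGraphBuilder(context);
const x = builder.input('x', { dataType: 'float32', dimensions: [2, 4] });
// A single fused op instead of the decomposed Div -> Erf -> Add -> Mul -> Mul chain.
const y = builder.gelu(x);
const graph = await builder.build({ y });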

zolkis commented 4 months ago

Paper: https://arxiv.org/abs/1606.08415
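
For reference, the linked paper also gives a tanh-based approximation of GELU, 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3))), which some frameworks use as a faster variant. A scalar sketch, illustrative only and not the decomposition proposed for the spec:

// Scalar tanh approximation of GELU from the paper above (illustrative only).
function geluTanhApprox(x) {
  return 0.5 * x * (1 + Math.tanh(Math.sqrt(2 / Math.PI) * (x + 0.044715 * x ** 3)));
}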

huningxin commented 4 months ago

Other sample transformer models using Gelu include the Stable Diffusion U-Net and the Segment Anything decoder.

fdwr commented 4 months ago

This complements the existing elu, relu, and prelu ops.

We'll need the decomposition for the spec...

return builder.mul(
    builder.mul(x, builder.constant(0.5)),
    builder.add(
        builder.constant(1.0),
        builder.erf(
            builder.div(
                x,
                builder.sqrt(builder.constant(2))
            )
        )
    )
);

...and a tolerance, at least 18 ULP depending on the erf implementation.
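
As a side note (not from the thread), one way a conformance test could check such an "N ULP" tolerance for float32 outputs is to reinterpret the bit patterns and measure the distance on a monotonically ordered integer scale; the helper name and threshold below are illustrative:

function ulpDistance32(a, b) {
  const f = new Float32Array(2);
  const i = new Int32Array(f.buffer);
  f[0] = a;
  f[1] = b;
  // Map sign-magnitude float32 bits onto a monotonic integer scale so that
  // adjacent representable values differ by exactly 1.
  const ordered = (bits) => (bits < 0 ? -2147483648 - bits : bits);
  return Math.abs(ordered(i[0]) - ordered(i[1]));
}

// Example: adjacent float32 values are 1 ULP apart.
console.log(ulpDistance32(1, 1 + 2 ** -23)); // 1
// A test could then require ulpDistance32(actual, reference) <= 18.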

mingmingtasd commented 4 months ago

> This complements the existing elu, relu, prelu ... and a tolerance, at least 18 ULP depending on the erf implementation.

Yes! And also cc/ @BruceDai for the tolerance information.