The CPU implementation of LayerNormalization occasionally produces NaN when computing the square root of the variance. This occurs when the input variance is extremely small: floating-point rounding can make the computed variance slightly negative, so the sqrt receives a negative argument.
To reproduce
```python
import io

import onnxruntime as ort
import torch
from torch import nn

x = torch.full((256,), 1234.0)


class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer_norm = nn.LayerNorm(256)

    def forward(self, x):
        return self.layer_norm(x)


model = Model()
onnx_program = torch.onnx.dynamo_export(model, x)
b = io.BytesIO()
onnx_program.save(b)
sess = ort.InferenceSession(b.getvalue())
print(sess.run(None, {sess.get_inputs()[0].name: x.numpy()})[0])  # nan
```
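The failure mode described above can be illustrated in plain NumPy. This is a minimal sketch, not the actual ONNX Runtime kernel: the function names (`layer_norm_naive`, `layer_norm_safe`) are illustrative, and it assumes the variance is computed via the E[x²] − E[x]² formulation, whose float32 rounding error can push the result slightly below zero for near-constant inputs. Clamping the variance at zero before the sqrt keeps the output finite.

```python
import numpy as np


def layer_norm_naive(x, gamma, beta, eps=1e-5):
    # Variance via E[x^2] - E[x]^2: in float32 this difference can come
    # out slightly negative for near-constant inputs, and then
    # sqrt(var + eps) yields nan once |var| exceeds eps.
    mean = np.mean(x)
    var = np.mean(x * x) - mean * mean  # may be < 0 due to rounding
    return gamma * (x - mean) / np.sqrt(var + eps) + beta


def layer_norm_safe(x, gamma, beta, eps=1e-5):
    # Clamp the variance at zero before adding eps so the sqrt argument
    # is always strictly positive.
    mean = np.mean(x)
    var = np.maximum(np.mean(x * x) - mean * mean, 0.0)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta


# Same shape and value as the repro: a constant float32 vector.
x = np.full((256,), 1234.0, dtype=np.float32)
gamma = np.ones_like(x)
beta = np.zeros_like(x)

print(np.isfinite(layer_norm_safe(x, gamma, beta)).all())  # True
```

Other fixes with the same effect are computing the variance as E[(x − mean)²], which cannot go negative, or accumulating the moments in float64.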
Urgency: No response
Platform: Windows
OS Version: Win11
ONNX Runtime Installation: Released Package
ONNX Runtime Version or Commit ID: 1.19.0
ONNX Runtime API: Python
Architecture: X64
Execution Provider: Default CPU
Execution Provider Library Version: No response