lisa0314 opened this issue 1 year ago
Thanks for opening this issue, @lisa0314 !
A quick survey of native ML API support:
- xnn_define_elu() requires the alpha value to be positive.
- The DML_ACTIVATION_ELU_OPERATOR_DESC doc doesn't mention this restriction.
- The MPSCNNNeuronELUNode doc doesn't restrict the alpha value either.

/cc @wacky6 @fdwr
Hmm, ML APIs typically don't restrict floating-point inputs (more often they reject invalid integer quantities like bad axes/sizes), and for the sake of broader compat, we probably shouldn't unnecessarily reject values that work in other libraries. e.g.:
import torch
x = torch.tensor([-3, 3], dtype = torch.float32)
s = torch.nn.ELU(alpha = -1) # ✅ works, and even float('nan') is allowed.
y = s(x)
print("value:", y)
print("shape:", y.shape)
print("dtype:", y.dtype)
# value: tensor([0.9502, 3.0000])
# shape: torch.Size([2])
# dtype: torch.float32
The plot has a kink, but otherwise looks non-degenerate:

[plot image omitted]

UPDATE: Fixed graph after Ningxin's comment.
If we rejected negative values, and someone was somehow using it for compat reasons, what would be the decomposition? I suppose you could fall back to elementwiseIf(greater(x, 0), x, scale * (exp(x) - 1)), now that elementwiseIf and greater are pending operators.
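For illustration, here is a NumPy sketch (not the WebNN API) of that piecewise decomposition; for alpha = -1 it reproduces the PyTorch output shown above:

```python
import numpy as np

# Piecewise form of the proposed decomposition:
# elementwiseIf(greater(x, 0), x, alpha * (exp(x) - 1))
def elu_decomposed(x, alpha):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))

x = np.array([-3.0, 3.0], dtype=np.float32)
print(elu_decomposed(x, alpha=-1.0))  # matches the PyTorch result: [0.9502, 3.0]
```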
@fdwr When alpha is -1, the calculation becomes -1 * (exp(x) - 1), and the plot would look like:

[plot image omitted]
> if someone did need a negative ~alpha~ scale coefficient for negative inputs for whatever unusual compat reason, what would be the decomposition?
I think the decomposition sample in the current spec still works for negative alpha, e.g. -1:
return builder.add(
builder.max(builder.constant(0), x),
builder.mul(
builder.constant(-1), // alpha = -1
builder.sub(
builder.exp(builder.min(builder.constant(0), x)),
builder.constant(1))));
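As a quick numerical check (a NumPy sketch standing in for the WebNN builder ops, not the actual API), the spec decomposition above agrees with the piecewise definition regardless of alpha's sign:

```python
import numpy as np

# Spec decomposition: max(0, x) + alpha * (exp(min(0, x)) - 1).
# The alpha multiply happens after the min, so on the positive branch
# min(0, x) == 0 and exp(0) - 1 == 0, leaving x unaffected by alpha's sign.
def elu_spec(x, alpha):
    return np.maximum(0.0, x) + alpha * (np.exp(np.minimum(0.0, x)) - 1)

# Piecewise reference: x if x > 0, else alpha * (exp(x) - 1).
def elu_piecewise(x, alpha):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))

x = np.linspace(-5.0, 5.0, 21, dtype=np.float32)
for alpha in (1.0, 0.5, -1.0):
    assert np.allclose(elu_spec(x, alpha), elu_piecewise(x, alpha))
```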
(BTW, there is a typo in the elu sample code, I'll fix it.)
@huningxin: Doh, fixed graph. 👍 Yep, you're right, because the scale multiply occurs after the min, the existing decomposition works fine.
I think elu should just return a result equivalent to its decomposition elementwiseIf(greater(x, 0), x, scale * (exp(x) - 1)), whether the alpha scale is positive or negative. Similarly for NaNs, I'd just follow standard IEEE behavior and propagate them through. We don't have special checks for NaNs with the other floating-point activation/elementwise operators, and consider that if scale were actually a tensor instead of a single scalar, we wouldn't bother to scan every value inside it.
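To illustrate the IEEE propagation argument, a NumPy sketch (the NaN alpha here is a hypothetical input, not something the spec recommends):

```python
import numpy as np

# A NaN alpha simply flows through exp/multiply per IEEE 754;
# no special validation is needed to get propagating behavior.
def elu(x, alpha):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))

x = np.array([-1.0, 2.0], dtype=np.float32)
y = elu(x, alpha=float("nan"))
print(y)  # NaN on the non-positive branch; the positive branch is untouched
```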
According to the ELU definition on Wikipedia and in the original paper, alpha should be positive.
This issue was raised by @huningxin in WebNN Chromium CL review. Thanks Ningxin!