lisa0314 opened this issue 1 year ago
Thanks for opening this issue, @lisa0314 !
A quick survey of native ML API support:
- xnn_define_elu() requires the alpha value to be positive.
- The DML_ACTIVATION_ELU_OPERATOR_DESC doc doesn't mention this restriction.
- The MPSCNNNeuronELUNode doc doesn't restrict the alpha value either.

/cc @wacky6 @fdwr
Hmm, ML APIs typically don't restrict floating-point inputs (more often they reject invalid integer quantities like bad axes/sizes), and for the sake of broader compat, we probably shouldn't unnecessarily reject values that work in other libraries. e.g.:
import torch
x = torch.tensor([-3, 3], dtype = torch.float32)
s = torch.nn.ELU(alpha = -1) # ✅ works, and even float('nan') is allowed.
y = s(x)
print("value:", y)
print("shape:", y.shape)
print("dtype:", y.dtype)
# value: tensor([0.9502, 3.0000])
# shape: torch.Size([2])
# dtype: torch.float32
The plot has a kink, but otherwise looks non-degenerate:

[plot image omitted]

UPDATE: Fixed graph after Ningxin's comment.
If we rejected negative values, and someone was somehow using it for compat reasons, what would be the decomposition? I suppose you could fall back to elementwiseIf(greater(x, 0), x, scale * (exp(x) - 1)), now that elementwiseIf and greater are pending operators.
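For illustration, here is a NumPy sketch (not the WebNN API) of that piecewise decomposition; for alpha = -1 it reproduces the PyTorch output shown above:

```python
import numpy as np

# Piecewise form of the proposed decomposition:
# elementwiseIf(greater(x, 0), x, alpha * (exp(x) - 1))
def elu_decomposed(x, alpha):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))

x = np.array([-3.0, 3.0], dtype=np.float32)
print(elu_decomposed(x, alpha=-1.0))  # matches the PyTorch result: [0.9502, 3.0]
```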
@fdwr When alpha is -1, the calculation becomes -1 * (exp(x) - 1), and the plot would look like:

[plot image omitted]
> if someone did need a negative ~alpha~ scale coefficient for negative inputs for whatever unusual compat reason, what would be the decomposition?
I think the decomposition sample in the current spec still works for negative alpha, e.g. -1:
return builder.add(
builder.max(builder.constant(0), x),
builder.mul(
builder.constant(-1), // alpha = -1
builder.sub(
builder.exp(builder.min(builder.constant(0), x)),
builder.constant(1))));
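As a quick numerical check (a NumPy sketch standing in for the WebNN builder ops, not the actual API), the spec decomposition above agrees with the piecewise definition regardless of alpha's sign:

```python
import numpy as np

# Spec decomposition: max(0, x) + alpha * (exp(min(0, x)) - 1).
# The alpha multiply happens after the min, so on the positive branch
# min(0, x) == 0 and exp(0) - 1 == 0, leaving x unaffected by alpha's sign.
def elu_spec(x, alpha):
    return np.maximum(0.0, x) + alpha * (np.exp(np.minimum(0.0, x)) - 1)

# Piecewise reference: x if x > 0, else alpha * (exp(x) - 1).
def elu_piecewise(x, alpha):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))

x = np.linspace(-5.0, 5.0, 21, dtype=np.float32)
for alpha in (1.0, 0.5, -1.0):
    assert np.allclose(elu_spec(x, alpha), elu_piecewise(x, alpha))
```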
(BTW, there is a typo in the elu sample code, I'll fix it.)
@huningxin: Doh, fixed graph. 👍 Yep, you're right, because the scale multiply occurs after the min, the existing decomposition works fine.
I think elu should just return a result equivalent to its decomposition elementwiseIf(greater(x, 0), x, scale * (exp(x) - 1)), whether the alpha scale is positive or negative. Similarly for NaNs, I'd just follow standard IEEE behavior and propagate them through. We don't have special checks for NaNs with the other floating-point activation/elementwise operators, and consider that if scale were actually a tensor instead of a single scalar, we wouldn't bother to scan every value inside it.
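To illustrate the IEEE propagation argument, a NumPy sketch (the NaN alpha here is a hypothetical input, not something the spec recommends):

```python
import numpy as np

# A NaN alpha simply flows through exp/multiply per IEEE 754;
# no special validation is needed to get propagating behavior.
def elu(x, alpha):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))

x = np.array([-1.0, 2.0], dtype=np.float32)
y = elu(x, alpha=float("nan"))
print(y)  # NaN on the non-positive branch; the positive branch is untouched
```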
According to the ELU definition on Wikipedia and in the original paper, alpha should be positive.
This issue was raised by @huningxin in WebNN Chromium CL review. Thanks Ningxin!