webmachinelearning / webnn

🧠 Web Neural Network API
https://www.w3.org/TR/webnn/

Support for LocalResponseNormalization (LRN) operation #228

Open MarkGHX opened 2 years ago

MarkGHX commented 2 years ago

Hi all,

I'm a student intern working on GSoC 2021 project "OpenCV.js: Accelerate OpenCV.js DNN via WebNN" (opencv/opencv#20406). Here is the proposal. In this project, I will improve the performance of OpenCV.js DNN Module using WebNN.

Here is a brief result of the improvements:

| Model | OpenCV.js wasm | OpenCV.js wasm+simd+threads | OpenCV native default | OpenCV OpenVINO | OpenCV WebNN | OpenCV.js WebNN-polyfill | OpenCV.js WebNN-Electron |
| --- | --- | --- | --- | --- | --- | --- | --- |
| GoogleNet | 825.07ms | 51.55ms | 29.32ms | 10.35ms | 24.8ms | 69.15ms | 24.90ms |
| SqueezeNet | 462.12ms | 31.69ms | 17.4ms | 4.29ms | 4.56ms | 21.27ms | 4.07ms |

However, I found that there is a performance gap for GoogleNet between OpenCV OpenVINO and OpenCV WebNN, while no such gap exists for SqueezeNet. This is mainly because the LRN layer is not supported by WebNN, so GoogleNet has to be partitioned into four subgraphs, which slows down inference. After further investigation, I found that both ONNX (link) and TFLite (link) support LRN, and both GoogleNet and AlexNet require an LRN layer. Thus, I think it would be useful for WebNN to support this frequently used LRN op.

fdwr commented 1 month ago

I'll amend this comment with further justification and more comprehensive API comparisons later...

I prototyped localResponseNormalization here, and it worked for my limited use. However, between the various implementations (TensorFlow, CoreML, DirectML, Caffe, PyTorch) there are enough small differences (kernel symmetry, which axes are used, which dimensions are windowed...) that the final WebNN form probably ought to be more generic, taking axes rather than just an axis and a windowSize rather than a radius; and it warrants a table with a clear mapping to each backend.

Known Models

Behavior

Local response normalization produces an output the same size as the input, using a sliding window where each output element equals the corresponding input element divided by a scaled, biased average of the squared elements in a window around it. The shape of that sliding window can vary in size and rank, along a single axis or more. Although not obvious at first, the operator is really a variation of pooling, with the general form:

function localResponseNormalization(input, axes, windowLength, scale, bias, exponent)
{
    let leadingPadding = floor((windowLength - 1) / 2); // Center the window, rounding down on the leading side
    let trailingPadding = ceil((windowLength - 1) / 2); // Center the window, rounding up on the trailing side
    let padding = new Array(axes.length).fill([leadingPadding, trailingPadding]).flat();
        // 1D padding = [leadingPadding, trailingPadding]
        // 2D padding = [leadingPadding, trailingPadding, leadingPadding, trailingPadding]
        // 3D padding = [leadingPadding, trailingPadding, leadingPadding, trailingPadding, ...]
    let windowDimensions = new Array(axes.length).fill(windowLength);
        // 1D windowDimensions = [windowLength]
        // 2D windowDimensions = [windowLength, windowLength]
        // 3D windowDimensions = [windowLength, windowLength, windowLength]

    let regionAverages = averagePoolND(pow(input, 2), axes, windowDimensions, padding);
    return input / pow((regionAverages * scale + bias), exponent);
}
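With axes = [1] (the channel axis of an NCHW tensor), this should reduce to the familiar across-channel LRN of ONNX and Caffe, with scale playing the role of alpha, exponent of beta, and bias of k (the averaging above absorbs ONNX's division by the window size).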

Here averagePoolND is a more general pooling form that takes axes directly (like the related reduction functions) rather than implying the rightmost dimensions, as averagePool2D does. averagePool2D is thus a subset: averagePool2D(input, ...) = averagePoolND(input, axes = [input.rank - 2, input.rank - 1], ...). Conversely, averagePoolND over one or two axes can be emulated on top of an existing implementation's averagePool2D by transposing the pooled axes to the rightmost positions and back:

function averagePoolND(input, axes, windowDimensions, padding)
{
    // e.g. Given input rank=4 and axes=[1], returns [0,2,3,1].
    //      Given input rank=3 and axes=[0,1], returns [2,0,1].
    let permutation = GetPermutationToRightmostAxes(input.rank, axes);
    let inversePermutation = GetInversePermutation(permutation);
    let poolingOperator;
    switch (axes.length)
    {
    case 1: poolingOperator = averagePool1D; break;
    case 2: poolingOperator = averagePool2D; break;
    default: throw ... // Unsupported axis count
    }
    return transpose(poolingOperator(transpose(input, permutation), windowDimensions, padding), inversePermutation);
}
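For completeness, a minimal sketch of the two permutation helpers assumed above (the names are placeholders from this pseudocode, not WebNN API):

function GetPermutationToRightmostAxes(rank, axes)
{
    // Keep the non-pooled dimensions first, in their original order, then append the pooled axes.
    let remaining = [];
    for (let i = 0; i < rank; ++i)
    {
        if (!axes.includes(i)) remaining.push(i);
    }
    return remaining.concat(axes);
}

function GetInversePermutation(permutation)
{
    // inverse[permutation[i]] = i, so transposing by permutation and then by inverse
    // restores the original dimension order.
    let inverse = new Array(permutation.length);
    permutation.forEach((axis, i) => { inverse[axis] = i; });
    return inverse;
}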

Note that if you only have averagePool2D to work with (WebNN lacks an averagePool1D), you can emulate 1D pooling by setting the padding for the first spatial dimension to zero (padding = [0, 0, *, *]) and using windowDimensions = [1, *].
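To make that concrete, here is a rough sketch of how across-channel (axes = [1]) LRN over an NCHW input could be emulated today with existing MLGraphBuilder operations. The helper buildLRN and its parameters are illustrative only, and the exact builder.constant() signature has varied between spec revisions:

// Sketch only: emulate across-channel LRN (axes = [1]) for an NCHW input using existing WebNN ops.
function buildLRN(builder, input, windowLength, scale, bias, exponent)
{
    const leading = Math.floor((windowLength - 1) / 2);
    const trailing = Math.ceil((windowLength - 1) / 2);
    const scalar = (value) => builder.constant({dataType: 'float32', dimensions: []}, new Float32Array([value]));

    // Square the input, then move the channel dimension into a pooled position:
    // [N, C, H, W] -> [N, H, C, W], so that averagePool2d windows over (C, W).
    const squared = builder.mul(input, input);
    const swapped = builder.transpose(squared, {permutation: [0, 2, 1, 3]});
    const pooled = builder.averagePool2d(swapped, {
        windowDimensions: [windowLength, 1], // The window spans channels only.
        padding: [leading, trailing, 0, 0],  // Keep the output the same size as the input.
        layout: 'nchw',
    });
    const averages = builder.transpose(pooled, {permutation: [0, 2, 1, 3]});

    // output = input / pow(averages * scale + bias, exponent)
    const denominator = builder.pow(
        builder.add(builder.mul(averages, scalar(scale)), scalar(bias)),
        scalar(exponent));
    return builder.div(input, denominator);
}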

Implementations

Implementations consistently:

Implementations differ in supported input rank, which axes are normalized, edge padding behavior, how the kernel size is expressed, and default parameter values:

| API/Library | Input rank | Axes | Padding | Kernel size | Defaults |
| --- | --- | --- | --- | --- | --- |
| TensorFlow | ? | 1D [rank-1] | edge repeat | radius * 2 + 1 | s=1 e=0.5 b=1 |
| PyTorch | >=2D | 1D [1] | zeros | square length | s=.0001 e=0.75 b=1 |
| CoreML | >=3D | 1D [rank-3] | zeros | square length | s=.0001 e=0.75 b=1 |
| Caffe | >=3D | 1D [1] / 2D [rank-2, rank-1] | zeros | square length | s=1 e=0.75 b=NA |
| NCNN | ? | 1D [1] / 2D [rank-2, rank-1] | zeros? | square length | s=1 e=0.75 b=1 |
| ONNX | >=2D | 1D [1] | zeros | square length | s=.0001 e=0.75 b=1 |
| DirectML | 4D | 1D [1] / 2D [2,3] | zeros | square length | s=.0001 e=0.75 b=1 |

CoreML (1D normalization)

TensorFlow (1D normalization)

PyTorch or Caffe or NCNN or ONNX (1D normalization)

DirectML (1D normalization)

Caffe and NCNN (2D normalization)

DirectML (2D normalization)

Possible IDL

partial interface MLGraphBuilder {
  ...
  MLOperand batchNormalization(MLOperand input, MLOperand mean, MLOperand variance, optional MLBatchNormalizationOptions options = {});
  MLOperand instanceNormalization(MLOperand input, optional MLInstanceNormalizationOptions options = {});
  MLOperand layerNormalization(MLOperand input, optional MLLayerNormalizationOptions options = {});
+ MLOperand localResponseNormalization(MLOperand input, optional MLLocalResponseNormalizationOptions options = {});
  ...
};
+dictionary MLLocalResponseNormalizationOptions {
+  sequence<unsigned long> axes; // Axes to normalize over.
+  unsigned long windowLength;   // Window size along each axis, from 1 up to the input size or more.
+  float scale = 1.0;            // Sometimes labeled alpha.
+  float bias = 1.0;             // Sometimes labeled k.
+  float exponent = 0.5;         // Sometimes labeled beta.
+};
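A hypothetical usage sketch of this proposed shape (not in the spec), covering the common across-channel case with ONNX-like parameter values from the table above:

// Hypothetical usage of the proposed operator, assuming the dictionary above.
// Across-channel LRN over an NCHW tensor with ONNX-like defaults.
const output = builder.localResponseNormalization(input, {
    axes: [1],        // Normalize across the channel dimension.
    windowLength: 5,
    scale: 0.0001,
    bias: 1.0,
    exponent: 0.75,
});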

Data Types

float16, float32