webmachinelearning / webnn

🧠 Web Neural Network API
https://www.w3.org/TR/webnn/

Support for LocalResponseNormalization (LRN) operation #228

Open MarkGHX opened 2 years ago

MarkGHX commented 2 years ago

Hi all,

I'm a student intern working on GSoC 2021 project "OpenCV.js: Accelerate OpenCV.js DNN via WebNN" (opencv/opencv#20406). Here is the proposal. In this project, I will improve the performance of OpenCV.js DNN Module using WebNN.

Here is a brief result of the improvements:

| Model | OpenCV.js wasm | OpenCV.js wasm+simd+threads | OpenCV native default | OpenCV OpenVINO | OpenCV WebNN | OpenCV.js WebNN-polyfill | OpenCV.js WebNN-Electron |
| --- | --- | --- | --- | --- | --- | --- | --- |
| GoogleNet | 825.07ms | 51.55ms | 29.32ms | 10.35ms | 24.8ms | 69.15ms | 24.90ms |
| SqueezeNet | 462.12ms | 31.69ms | 17.4ms | 4.29ms | 4.56ms | 21.27ms | 4.07ms |

However, I found that there is a performance gap for GoogleNet between OpenCV OpenVINO and OpenCV WebNN, while no such gap exists for SqueezeNet. This is mainly because the LRN layer is not supported by WebNN, so GoogleNet has to be partitioned into four subgraphs, which slows down inference. After further investigation, I found that both ONNX (link) and TFLite (link) support LRN, and both GoogleNet and AlexNet require an LRN layer. Thus, I think it would be useful for WebNN to support this frequently used LRN op.

fdwr commented 1 month ago

I'll amend this comment with further justification and more comprehensive API comparisons later...

I prototyped localResponseNormalization here, and it worked for my limited use. However, between the various implementations (TensorFlow, CoreML, DirectML, Caffe, PyTorch) there are enough small differences (kernel symmetry, which axes are used, which dimensions are windowed...) that the final WebNN form probably ought to be more generic, taking axes rather than just an axis and a windowSize rather than a radius; and it warrants a table with a clear mapping to each backend.

Known Models

Behavior

Local response normalization produces an output the same size as the input, using a sliding window where each output element equals the corresponding input element divided by a scaled, biased average of the squared elements in a window around it. The shape of that sliding window can vary in size and rank, along a single axis or more. Although not obvious at first, the operator is really a variation of pooling, with the general form:

function localResponseNormalization(input, axes, windowLength, scale, bias, exponent)
{
    let leadingPadding = floor((windowLength - 1) / 2); // Center the window, rounding down on the leading side
    let trailingPadding = ceil((windowLength - 1) / 2); // Center the window, rounding up on the trailing side
    let padding = new Array(axes.length).fill([leadingPadding, trailingPadding]).flat();
        // 1D padding = [leadingPadding, trailingPadding]
        // 2D padding = [leadingPadding, trailingPadding, leadingPadding, trailingPadding]
        // 3D padding = [leadingPadding, trailingPadding, leadingPadding, trailingPadding, ...]
    let windowDimensions = new Array(axes.length).fill(windowLength);
        // 1D windowDimensions = [windowLength]
        // 2D windowDimensions = [windowLength, windowLength]
        // 3D windowDimensions = [windowLength, windowLength, windowLength]

    let regionAverages = averagePoolND(pow(input, 2), axes, windowDimensions, padding);
    return input / pow((regionAverages * scale + bias), exponent);
}
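With axes = [1] (the channel axis of an NCHW tensor), this should reduce to the familiar across-channel LRN of ONNX and Caffe, with scale playing the role of alpha, exponent of beta, and bias of k (the averaging above absorbs ONNX's division by the window size).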

Here averagePoolND is a more general pooling form that takes axes directly (like the related reduction functions) rather than implying the rightmost dimensions, as averagePool2D does. averagePool2D is thus a subset: averagePool2D(input, ...) = averagePoolND(input, axes = [input.rank - 2, input.rank - 1], ...). Conversely, averagePoolND over one or two axes can be emulated on top of an existing implementation's averagePool2D by transposing the pooled axes to the rightmost positions and back:

function averagePoolND(input, axes, windowDimensions, padding)
{
    // e.g. Given input rank=4 and axes=[1], returns [0,2,3,1].
    //      Given input rank=3 and axes=[0,1], returns [2,0,1].
    let permutation = GetPermutationToRightmostAxes(input.rank, axes);
    let inversePermutation = GetInversePermutation(permutation);
    let poolingOperator;
    switch (axes.length)
    {
    case 1: poolingOperator = averagePool1D; break;
    case 2: poolingOperator = averagePool2D; break;
    default: throw ... // Unsupported axis count
    }
    return transpose(poolingOperator(transpose(input, permutation), windowDimensions, padding), inversePermutation);
}
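For completeness, a minimal sketch of the two permutation helpers assumed above (the names are placeholders from this pseudocode, not WebNN API):

function GetPermutationToRightmostAxes(rank, axes)
{
    // Keep the non-pooled dimensions first, in their original order, then append the pooled axes.
    let remaining = [];
    for (let i = 0; i < rank; ++i)
    {
        if (!axes.includes(i)) remaining.push(i);
    }
    return remaining.concat(axes);
}

function GetInversePermutation(permutation)
{
    // inverse[permutation[i]] = i, so transposing by permutation and then by inverse
    // restores the original dimension order.
    let inverse = new Array(permutation.length);
    permutation.forEach((axis, i) => { inverse[axis] = i; });
    return inverse;
}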

Note that if you only have averagePool2D to work with (WebNN lacks an averagePool1D), you can emulate 1D pooling by setting the padding for the first spatial dimension to zero (padding = [0, 0, *, *]) and using windowDimensions = [1, *].
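To make that concrete, here is a rough sketch of how across-channel (axes = [1]) LRN over an NCHW input could be emulated today with existing MLGraphBuilder operations. The helper buildLRN and its parameters are illustrative only, and the exact builder.constant() signature has varied between spec revisions:

// Sketch only: emulate across-channel LRN (axes = [1]) for an NCHW input using existing WebNN ops.
function buildLRN(builder, input, windowLength, scale, bias, exponent)
{
    const leading = Math.floor((windowLength - 1) / 2);
    const trailing = Math.ceil((windowLength - 1) / 2);
    const scalar = (value) => builder.constant({dataType: 'float32', dimensions: []}, new Float32Array([value]));

    // Square the input, then move the channel dimension into a pooled position:
    // [N, C, H, W] -> [N, H, C, W], so that averagePool2d windows over (C, W).
    const squared = builder.mul(input, input);
    const swapped = builder.transpose(squared, {permutation: [0, 2, 1, 3]});
    const pooled = builder.averagePool2d(swapped, {
        windowDimensions: [windowLength, 1], // The window spans channels only.
        padding: [leading, trailing, 0, 0],  // Keep the output the same size as the input.
        layout: 'nchw',
    });
    const averages = builder.transpose(pooled, {permutation: [0, 2, 1, 3]});

    // output = input / pow(averages * scale + bias, exponent)
    const denominator = builder.pow(
        builder.add(builder.mul(averages, scalar(scale)), scalar(bias)),
        scalar(exponent));
    return builder.div(input, denominator);
}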

Implementations

Implementations consistently:

Implementations differ in supported input rank, which axes are normalized, edge padding behavior, how the kernel size is expressed, and default parameter values:

| API/Library | Input rank | Axes | Padding | Kernel size | Defaults |
| --- | --- | --- | --- | --- | --- |
| TensorFlow | ? | 1D [rank-1] | edge repeat | radius * 2 + 1 | s=1 e=0.5 b=1 |
| PyTorch | >=2D | 1D [1] | zeros | square length | s=.0001 e=0.75 b=1 |
| CoreML | >=3D | 1D [rank-3] | zeros | square length | s=.0001 e=0.75 b=1 |
| Caffe | >=3D | 1D [1] / 2D [rank-2, rank-1] | zeros | square length | s=1 e=0.75 b=NA |
| NCNN | ? | 1D [1] / 2D [rank-2, rank-1] | zeros? | square length | s=1 e=0.75 b=1 |
| ONNX | >=2D | 1D [1] | zeros | square length | s=.0001 e=0.75 b=1 |
| DirectML | 4D | 1D [1] / 2D [2,3] | zeros | square length | s=.0001 e=0.75 b=1 |

CoreML (1D normalization)

TensorFlow (1D normalization)

PyTorch or Caffe or NCNN or ONNX (1D normalization)

DirectML (1D normalization)

Caffe and NCNN (2D normalization)

DirectML (2D normalization)

Possible IDL

partial interface MLGraphBuilder {
  ...
  MLOperand batchNormalization(MLOperand input, MLOperand mean, MLOperand variance, optional MLBatchNormalizationOptions options = {});
  MLOperand instanceNormalization(MLOperand input, optional MLInstanceNormalizationOptions options = {});
  MLOperand layerNormalization(MLOperand input, optional MLLayerNormalizationOptions options = {});
+ MLOperand localResponseNormalization(MLOperand input, optional MLLocalResponseNormalizationOptions options = {});
  ...
};
+dictionary MLLocalResponseNormalizationOptions {
+  sequence<unsigned long> axes; // Axes to normalize over.
+  unsigned long windowLength;   // Window size along each axis, from 1 up to the input size or more.
+  float scale = 1.0;            // Sometimes labeled alpha.
+  float bias = 1.0;             // Sometimes labeled k.
+  float exponent = 0.5;         // Sometimes labeled beta.
+};
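A hypothetical usage sketch of this proposed shape (not in the spec), covering the common across-channel case with ONNX-like parameter values from the table above:

// Hypothetical usage of the proposed operator, assuming the dictionary above.
// Across-channel LRN over an NCHW tensor with ONNX-like defaults.
const output = builder.localResponseNormalization(input, {
    axes: [1],        // Normalize across the channel dimension.
    windowLength: 5,
    scale: 0.0001,
    bias: 1.0,
    exponent: 0.75,
});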

Data Types

float16, float32