xboot / libonnx

A lightweight, portable pure C99 onnx inference engine for embedded devices with hardware acceleration support.
MIT License

Maxpool + dilation #24

Open folkertdev opened 2 years ago

folkertdev commented 2 years ago

This is really a question. I don't think there is a bug here, just something I'm not understanding.

I'm looking at the code for maxpool and how it handles dilations. The spec has this example:

"""
input_shape: [1, 1, 4, 4]
output_shape: [1, 1, 2, 2]
"""
node = onnx.helper.make_node(
    'MaxPool',
    inputs=['x'],
    outputs=['y'],
    kernel_shape=[2, 2],
    strides=[1, 1],
    dilations=[2, 2]
)
x = np.array([[[
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]]]).astype(np.float32)
y = np.array([[[
    [11, 12],
    [15, 16]]]]).astype(np.float32)

expect(node, inputs=[x], outputs=[y], name='test_maxpool_2d_dilations')
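To make the expected values concrete, here is a minimal sketch of which input cells each output cell reads when the window is dilated. It assumes zero padding and a 2D input; `dilated_window` is a hypothetical helper written for illustration, not part of onnx or libonnx.

```python
def dilated_window(oh, ow, kernel, strides, dilations):
    """Input (row, col) pairs read by output cell (oh, ow), assuming zero padding."""
    return [
        (oh * strides[0] + kh * dilations[0], ow * strides[1] + kw * dilations[1])
        for kh in range(kernel[0])
        for kw in range(kernel[1])
    ]


# Same 4x4 input as the spec example, without the batch and channel axes.
x = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]

for oh in range(2):
    for ow in range(2):
        coords = dilated_window(oh, ow, kernel=(2, 2), strides=(1, 1), dilations=(2, 2))
        values = [x[r][c] for r, c in coords]
        print((oh, ow), coords, values, "max =", max(values))
```

With dilations of 2, output cell (0, 0) reads rows and columns 0 and 2, i.e. the values 1, 3, 9, 11, so its maximum is 11. The other three cells give 12, 15 and 16 in the same way.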

This should implicitly use AUTO_PAD_NOTSET. I tried to get MaxPool_float32 to produce the [ 11, 12, 15, 16 ] result by hardcoding the inputs. For the full code and output, see this godbolt:

int strides[] = { 1, 1 };
int kernels[] = { 2, 2 };
int cpads[] = { 0, 0, 0, 0 };

int x_ndim = 4;
int x_dims[] = { 1, 1, 4, 4 };
int y_dims[] = { 1, 1, 2, 2 };

From my reading of the code, the dilation is only used to determine the output dimensions, which I've hardcoded here.
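For reference, the output-size rule for a single spatial axis can be sketched as below. It assumes explicit pads and the default ceil_mode of 0; `pooled_size` is a hypothetical name used only for this illustration.

```python
def pooled_size(in_size, kernel, stride=1, dilation=1, pad_begin=0, pad_end=0):
    """Output length along one spatial axis, assuming ceil_mode = 0."""
    # A dilated kernel spans dilation * (kernel - 1) + 1 input cells.
    effective_kernel = dilation * (kernel - 1) + 1
    return (in_size + pad_begin + pad_end - effective_kernel) // stride + 1


print(pooled_size(4, kernel=2, stride=1, dilation=2))  # 2, matching y_dims above
print(pooled_size(4, kernel=2, stride=1, dilation=1))  # 3, the size without dilation
```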

But with these inputs I get the incorrect output:

6.000000 7.000000 10.000000 11.000000
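These values are what a plain, non-dilated 2x2 window yields on this input, which would be consistent with the window indexing not applying the dilations. The sketch below compares both cases under the same hardcoded 2x2 output shape. `maxpool_flat` is a hypothetical helper, not libonnx code, and it assumes zero padding.

```python
def maxpool_flat(x, out_dims, kernel, strides, dilations):
    """2D max pool over a nested list, returned as a flat list (zero padding assumed)."""
    out = []
    for oh in range(out_dims[0]):
        for ow in range(out_dims[1]):
            out.append(max(
                x[oh * strides[0] + kh * dilations[0]][ow * strides[1] + kw * dilations[1]]
                for kh in range(kernel[0])
                for kw in range(kernel[1])
            ))
    return out


x = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]

# Dilation step of 1: the window covers adjacent cells only.
print(maxpool_flat(x, (2, 2), (2, 2), (1, 1), (1, 1)))  # [6, 7, 10, 11]
# Dilation step of 2: the window skips every other row and column.
print(maxpool_flat(x, (2, 2), (2, 2), (1, 1), (2, 2)))  # [11, 12, 15, 16]
```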

So how do dilations influence the end result? What am I missing?