openvinotoolkit / openvino

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
https://docs.openvino.ai
Apache License 2.0

[Good First Issue]: Support aten::avg_poolNd and aten::max_poolNd with no batch #20927

Closed mvafin closed 3 weeks ago

mvafin commented 1 year ago

Context

The OpenVINO component responsible for support of PyTorch models is called the PyTorch Frontend (PT FE). PT FE converts a model represented in TorchScript into a model in the OpenVINO opset.

What needs to be done?

Example Pull Requests

No response

Resources

Contact points

@openvinotoolkit/openvino-pytorch-frontend-maintainers

Ticket

No response

ahmadchalhoub commented 11 months ago

.take

github-actions[bot] commented 11 months ago

Thank you for looking into this issue! Please let us know if you have any questions or require any help.

p-wysocki commented 11 months ago

Hello @ahmadchalhoub! Thank you for taking a look, please let us know if you have any questions. Just yesterday our CONTRIBUTING.md has been updated with a technical guide - I highly recommend checking it out. :)

ahmadchalhoub commented 11 months ago

@p-wysocki Thanks for your input! I'll definitely take a look at the updates :). I already started working on the PR for this but it might take me a few days.

p-wysocki commented 10 months ago

Hello @ahmadchalhoub, do you need any help? Are you still working on that issue?

I am happy to announce that we have created a channel dedicated to Good First Issues support on our Intel DevHub Discord server! Join it to receive support, engage in discussions, ask questions and talk to OpenVINO developers.

ahmadchalhoub commented 10 months ago

Hi @p-wysocki! Very sorry for the delay; I got caught up with some things and will not be able to work on this issue. I will un-assign myself so someone else can pick it up :)

p-wysocki commented 10 months ago

No worries, you're welcome to pick up another issue anytime. :)

Sar2580P commented 10 months ago

@p-wysocki I am new to open source, could you please provide some guidance relevant to the above issue? I want to get started. I think the task is just to reshape the input twice, but I need some understanding of the code...

mlukasze commented 10 months ago

@mmikolajcz could you help here?

p-wysocki commented 10 months ago

Hello @Sar2580P, you can start with our technical guide in CONTRIBUTING.md :)

You're also welcome to join our Intel DevHub Discord server, where you can receive chat support from OpenVINO developers.

mvafin commented 10 months ago

> @p-wysocki I am new to open source, could you please provide some guidance relevant to the above issue? I want to get started. I think the task is just to reshape the input twice, but I need some understanding of the code...

That is exactly right, you need to reshape the input 2 times, but the main problem is calculating the shape to reshape to when the input shape is only known at runtime.

SpaceDigi commented 10 months ago

I am very sorry, but do you know the deadline for this bug?

krish1209 commented 9 months ago

.take

github-actions[bot] commented 9 months ago

Thank you for looking into this issue! Please let us know if you have any questions or require any help.

krish1209 commented 9 months ago

Hi. I am new to open-source contributing, but I am excited to start. I am going to participate in GSoC '24 as well and am choosing the OpenVINO organization. I'm currently pursuing a Bachelor's degree in CSE (AI and Machine Learning) and also have some experience with deep learning and neural networks.

The problem here is that we need to initialize a batch dimension. The channel dimension refers to the components of an image, e.g. RGB, CMY, grayscale, etc.; in a CNN it typically means the depth dimension of the input tensor.

OpenVINO's MaxPool and AvgPool operations assume that the batch and channel dimensions exist. For inputs that don't have a batch dimension, we need to assign a batch of 1 and then restore the original shape after the operation.

In the code linked below, we can try adding a block that checks whether the image has a batch dimension or not, and if not, we assign a batch of 1.

I am writing the code here since I don't know if there is a separate section for providing code as solutions. And if any mentor could help me with how to use pull requests and how to know whether an issue is solved, it would be really helpful :)

In this repo, we can try the following change (this is my first open-source contribution, please be kind :) ): https://github.com/openvinotoolkit/openvino/blob/master/src/frontends/pytorch/src/op/avg_poolnd.cpp

OutputVector translate_avg_poolnd(const NodeContext& context) {
    // Same code as in the file linked above, only with the check below added
    auto input = context.get_input(0);

    // Check if the input tensor has a batch dimension: if the rank (number of dimensions)
    // is known at compile time and is less than 3, add a batch of size 1.
    if (input.get_partial_shape().rank().is_static() && input.get_partial_shape().rank().get_length() < 3) {
        // now we add a batch size of 1
        auto batch_size = context.mark_node(v0::Constant::create(element::i32, Shape{}, {1}));
        input = context.mark_node(std::make_shared<v1::Broadcast>(input, batch_size));
    }

    // The rest of the code is the same as in the linked file.

    return {context.mark_node(
        std::make_shared<v1::AvgPool>(input, strides, pads, pads, kernel, !count_include_pad, rounding_type))};
}

Please feel free to guide me more and notify me if anything comes up.

mvafin commented 9 months ago

@krish1209 Please check the contribution guide for tips on how to contribute and create Pull Requests.

The code you provided will not actually work as you expect, because Broadcast accepts a shape to broadcast to; when you provide 1 you ask to broadcast the tensor to a scalar, which should fail. I would recommend you use the Unsqueeze operation, but the main difficulty in this task is the fact that pooling can be done for any shape: NCH, CH, NCHW, CHW, NCHWD, CHWD. You may want to use information about which torch operation is used, e.g. aten::max_pool2d or aten::max_pool1d, which will help you understand which dimensions need to be pooled.
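For illustration only, a very rough sketch of the Unsqueeze idea for a single case (aten::max_pool2d, with a statically known rank) could look like the snippet below; build_max_pool is a hypothetical stand-in for the code that actually creates the MaxPool node from the remaining inputs, not an existing helper:

// Sketch only: static-rank case for aten::max_pool2d, which expects NCHW (rank 4).
// If the rank is statically known to be 3 (CHW), add the batch dim before pooling
// and remove it afterwards.
OutputVector translate_max_pool2d(const NodeContext& context) {
    auto input = context.get_input(0);
    const auto rank = input.get_partial_shape().rank();
    const bool no_batch = rank.is_static() && rank.get_length() == 3;

    auto axis_0 = context.mark_node(v0::Constant::create(element::i32, Shape{}, {0}));
    if (no_batch) {
        input = context.mark_node(std::make_shared<v0::Unsqueeze>(input, axis_0));
    }

    auto pooled = build_max_pool(context, input);  // hypothetical: builds the MaxPool node

    if (no_batch) {
        pooled[0] = context.mark_node(std::make_shared<v0::Squeeze>(pooled[0], axis_0));
    }
    return pooled;
}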

Also please write a test for all these cases first and make sure your solution passes all tests.

If you have any questions please feel free to ask.

krish1209 commented 9 months ago

Okay, I'll spin my head around it. Meanwhile, could you let me know how you are thinking of approaching this?

krish1209 commented 9 months ago

As of now, I am not aware of how to write a test for these cases, but I'll learn how.

Is it possible we could handle the cases with if-else statements for the two operations, like aten::max_pool2d and aten::max_pool1d?

For example:

if (context.get_op_type() == "aten::max_pool2d") {
    unsqueezed_input = context.mark_node(v1::Unsqueeze(unsqueezed_input, Shape{3}));
} else if (context.get_op_type() == "aten::max_pool1d") {
    unsqueezed_input = context.mark_node(v1::Unsqueeze(unsqueezed_input, Shape{2}));
}

Similarly, for 3d we get Shape{4}, and for 4d we get Shape{5}.

Also, could you please provide the link to Discord? The previous link is not working.

krish1209 commented 9 months ago

Please let me know the status of the problem and whether there are suggested or recommended issues I can get my hands on. Thank you so much.

p-wysocki commented 9 months ago

Thank you for your patience, and sorry for the late reply.

> Is it possible we could handle the cases with if-else statements for the two operations, like aten::max_pool2d and aten::max_pool1d?

@mvafin could you take a look?

> Also, could you please provide the link to Discord?

The link has been updated in the CONTRIBUTING.md - Intel DevHub Discord server.

p-wysocki commented 8 months ago

Hello @krish1209, are you still willing to contribute?

@mvafin could you please take a look at the latest question if @krish1209 is still here?

krish1209 commented 8 months ago

Yes yes I am willing to contribute. Please let me know

mvafin commented 8 months ago

@krish1209 Yes, but instead of if-else it is better to create a common function plus functions that call it with the number of dimensions. For example, this is done for adaptive_pool: https://github.com/openvinotoolkit/openvino/blob/master/src/frontends/pytorch/src/op/adaptive_poolnd.cpp

krish1209 commented 8 months ago

I'll look into it. Got a little busy with my college projects, sorry. Will get back to you!

krish1209 commented 8 months ago

Based on the code written for adaptive pooling, I have written similar common functions for the max_pool and avg_pool operations. I will provide the code with a little explanation and, if it is seen fit, please allow me to create a pull request based on it, as it would also help me towards GSoC :). So this is the code; please check it and guide me if it is wrong or can be done better.

// This is the common function for defining the max pool operation
OutputVector translate_max_pool_base(const NodeContext& context,
                                     const Output<Node>& input,
                                     const Shape& kernel,
                                     const Strides& strides = Strides{},
                                     const Shape& pads = Shape{},
                                     const Strides& dilations = Strides{},
                                     RoundingType rounding_type = RoundingType::FLOOR);

// For 1D Max Pooling
OutputVector translate_max_pool1d(const NodeContext& context) {
    const auto kernel_shape = Shape{context.get_single_value(context.get_input_shape(1))};
    return translate_max_pool_base(context, context.get_input(0), kernel_shape);
}

// For 2D Max Pooling
OutputVector translate_max_pool2d(const NodeContext& context) {
    const auto kernel_h = context.get_single_value(context.get_input_shape(1));
    const auto kernel_w = context.get_single_value(context.get_input_shape(2));
    const auto kernel_shape = Shape{kernel_h, kernel_w};
    // Optional arguments (strides, pads, dilations) can be handled here
    const Strides strides = context.get_input_size() > 5 ? context.get_single_value<std::vector>(context.get_input_shape(5)) : Strides{};
    const Shape pads = context.get_input_size() > 3 ? context.get_single_value<std::vector>(context.get_input_shape(3)) : Shape{};
    const Strides dilations = context.get_input_size() > 4 ? context.get_single_value<std::vector>(context.get_input_shape(4)) : Strides{};
    return translate_max_pool_base(context, context.get_input(0), kernel_shape, strides, pads, dilations);
}

// For 3D Max Pooling (similar to 2D but with an extra dimension)
OutputVector translate_max_pool3d(const NodeContext& context) {
    const auto kernel_d = context.get_single_value(context.get_input_shape(1));
    const auto kernel_h = context.get_single_value(context.get_input_shape(2));
    const auto kernel_w = context.get_single_value(context.get_input_shape(3));
    const auto kernel_shape = Shape{kernel_d, kernel_h, kernel_w};
    return translate_max_pool_base(context, context.get_input(0), kernel_shape);
}

mvafin commented 8 months ago

@krish1209 You seem on the right track, but there are some issues you need to pay attention to:

krish1209 commented 7 months ago

So I tried this, considering the scripting case as well as the need to reshape the input first and then remove the extra dimension we inserted. I extended the translate_max_pool_base function to include an additional parameter, 'pooling_dims'. This parameter indicates the number of pooling dimensions, allowing the caller to specify whether the pooling operation is 1D, 2D, or 3D. Then, to address the problem in the previous code (the input tensor's dimensions might not correspond to the pooling dimensions, e.g. when a batch dimension is added), the pooling dimension is explicitly passed as an argument to translate_max_pool_base. This is a glimpse of how it would look:

OutputVector translate_max_pool_base(const NodeContext& context, const Output& input,
                                     const Shape& kernel, const Strides& strides = Strides{},
                                     const Shape& pads = Shape{}, const Strides& dilations = Strides{},
                                     RoundingType rounding_type = RoundingType::FLOOR,
                                     int pooling_dims = 2);  // Added parameter for pooling dimensions

OutputVector translate_max_pool1d(const NodeContext& context) {
    const auto kernel_shape = Shape{context.const_input(1).to_vector<int64_t>()};
    return translate_max_pool_base(context, context.const_input(0), kernel_shape, {}, {}, {}, RoundingType::FLOOR, 1); // Pooling dimension set for 1D
}

OutputVector translate_max_pool2d(const NodeContext& context) {
    const auto kernel_shape = Shape{context.const_input(1).to_vector<int64_t>()};
    const auto input_size = context.const_input(0).sizes();
    const int64_t num_dims = input_size.size();

    Strides strides, pads, dilations;
    if (num_dims > 3) {
        strides = Strides{context.const_input(5).to_vector<int64_t>()};
        pads = Strides{context.const_input(3).to_vector<int64_t>()};
        dilations = Strides{context.const_input(4).to_vector<int64_t>()};
    }

    return translate_max_pool_base(context, context.const_input(0), kernel_shape, strides, pads, dilations, RoundingType::FLOOR, 2); // Pooling dimension set for 2D
}

OutputVector translate_max_pool3d(const NodeContext& context) {
    const auto kernel_shape = Shape{context.const_input(1).to_vector<int64_t>()};
    return translate_max_pool_base(context, context.const_input(0), kernel_shape, {}, {}, {}, RoundingType::FLOOR, 3); // Pooling dimension set for 3D
}

One thing to note is that it assumes the kernel shape is provided as a vector of int64_t values.

Please guide me further as I am learning more and more slowly :)

mvafin commented 7 months ago

@krish1209 That looks okay; the only thing I don't understand is why you do the extra step for the 2D case when num_dims > 3? And the main difficulty will be implementing the base function.

krish1209 commented 7 months ago

It is to cover the case where the pooling operation is 2D (channels, height, width) but additional dimensions, such as a batch dimension, might be present. The extra step handles the optional arguments for strides, pads, and dilations.

To correctly handle the optional arguments for the pooling operation, the above code checks whether num_dims is greater than 3 and, if so, extracts the values for strides, pads, and dilations from the context's input tensors accordingly. This makes sure the pooling operation is performed correctly even if the input tensor has additional dimensions beyond the expected 2D shape.

I will then proceed to implement the base function, if this is fine.

cannguyen275 commented 6 months ago

Hello guys, I'm facing this issue and looking forward to it being fixed. My MaxPool3d layer is causing the issue:

nn.MaxPool3d(kernel_size=(1, 3, 3), stride=(1, 1, 1))

Are there any ways to work around this issue instead of waiting for new updates?

mvafin commented 6 months ago

@cannguyen275 you could modify the model code to unsqueeze before the pool and squeeze after it, so pooling will run in NCHWD mode instead of CHWD.

tianyiSKY1 commented 2 months ago

.take

github-actions[bot] commented 2 months ago

Thank you for looking into this issue! Please let us know if you have any questions or require any help.

tianyiSKY1 commented 2 months ago

@mlukasze Hello, I’m new to open source. Could you help me understand how avg_poolnd.cpp and max_poolnd.cpp handle inputs of varying dimensions?

p-wysocki commented 2 months ago

Hello @tianyiSKY1! Thank you for taking a look at the task.

In the files you mentioned, the inputs are retrieved using auto input = context.get_input(0); and passed directly into the operator constructor: https://github.com/openvinotoolkit/openvino/blob/41394bb0e6a0fd4a5e77e855c529d1ff0d83e61d/src/frontends/pytorch/src/op/max_poolnd.cpp#L56

As for inputs of varying dimensions you're most likely talking about PartialShapes. PartialShape is a shape which can have a dynamic rank (we know pretty much nothing about the data) or static rank. In case of static rank, the shape is a collection of Dimension objects.

They can be static (for example {5}), dynamic ({?}) or an interval ({3, 6}). In this task I suppose the case you mostly care about is whether the input is of static rank, and what the rank is. In practice when the graph has a dynamic input shape, we make it static during runtime, once the input data is given and its shape is known. Since this task regards a Frontend it's not something that should concern you.
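For illustration, here is a tiny standalone snippet (just an example, not taken from the OpenVINO sources) showing these kinds of Dimension and the rank check:

#include <iostream>
#include "openvino/core/partial_shape.hpp"

int main() {
    // Rank-3 PartialShape: a dynamic dimension, a static dimension 5, and an interval dimension [3, 6]
    ov::PartialShape shape{ov::Dimension::dynamic(), 5, ov::Dimension(3, 6)};

    if (shape.rank().is_static()) {
        std::cout << "rank: " << shape.rank().get_length() << std::endl;       // 3
        std::cout << "dim 0 dynamic: " << shape[0].is_dynamic() << std::endl;  // 1 (true)
        std::cout << "shape: " << shape << std::endl;                          // [?,5,3..6]
    }
    return 0;
}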

In the snippet in https://github.com/openvinotoolkit/openvino/issues/20927#issuecomment-1893512776 you can see how to access the input's PartialShape and rank.

Additionally, it's worth checking out @mvafin's very informative comment about handling different data layouts: https://github.com/openvinotoolkit/openvino/issues/20927#issuecomment-1893529281.

Please let us know if you have any other questions.

tianyiSKY1 commented 2 months ago

@p-wysocki Hi, I ran a preliminary test and wrote some test code. However, every time I run a case with a CHWD input, the test fails with an 'is_conversion_successful' error. Is there a part of the code that automatically checks the input shape, which might be preventing me from proceeding with further testing?

class TestAvgPool3D(PytorchLayerTest):

    def _prepare_input(self):
        return (self.input_tensor, )

    def create_model(self, kernel_size, stride, padding, ceil_mode=True, count_include_pad=True):
        class aten_avg_pool3d(torch.nn.Module):

            def __init__(self, kernel_size, stride, padding, ceil_mode, count_include_pad) -> None:
                super().__init__()
                self.kernel_size = kernel_size
                print(kernel_size)
                self.stride = stride
                print(stride)
                self.padding = padding
                self.ceil_mode = ceil_mode
                self.count_include_pad = count_include_pad

            def forward(self, input_tensor):
                return torch.nn.functional.avg_pool3d(input_tensor, self.kernel_size, self.stride, self.padding, self.ceil_mode,
                                                      self.count_include_pad)
        ref_net = None

        return aten_avg_pool3d(kernel_size, stride, padding, ceil_mode=True, count_include_pad=True), ref_net, "aten::avg_pool3d"

    @pytest.mark.parametrize('input_shape', [[1, 3, 15, 15, 15], [3, 15, 15, 15]])
    @pytest.mark.parametrize("params", d3_params)
    @pytest.mark.parametrize("ceil_mode", [True, False])
    @pytest.mark.parametrize("count_include_pad", [True, False])
    @pytest.mark.nightly
    @pytest.mark.precommit
    @pytest.mark.precommit_torch_export
    @pytest.mark.precommit_fx_backend
    def test_avg_pool3d(self, params, ceil_mode, count_include_pad, ie_device, precision, ir_version, input_shape):
        self.input_tensor = np.random.randn(*input_shape).astype(np.float32)
        self._test(*self.create_model(**params, ceil_mode=ceil_mode, count_include_pad=count_include_pad),
                   ie_device, precision, ir_version, trace_model=True, dynamic_shapes=False)
FAILED test_avg_pool.py::TestAvgPool3D::test_avg_pool3d[ ie_device:CPU - precision:FP16 - count_include_pad:False - ceil_mode:False - params:{'kernel_size': [3, 2, 1], 'stride': None, 'padding': [0, 0, 0]} - input_shape:[3, 15, 15, 15] ] - openvino._pyopenvino.OpConversionFailure: Check 'is_conversion_successful' failed at src/frontends/pytorch/src/frontend.cpp:166:
p-wysocki commented 2 months ago

Judging by the check it seems the operator for some reason hasn't been converted: https://github.com/openvinotoolkit/openvino/blob/69180e2fc9ee68ae6705b68c934d75f2a6182398/src/frontends/pytorch/src/frontend.cpp#L172-L173

I'd use a debugger to see exactly what's going on in the function during the test, with a breakpoint at the beginning of the conversion function. Perhaps there's some assert the operator doesn't pass, so the conversion function fails?

tianyiSKY1 commented 2 months ago

Hi @p-wysocki, I found the place that prevents me from converting the model: https://github.com/openvinotoolkit/openvino/blob/91b57448952799245aca9824e1f1c7c3eb3466bc/src/core/shape_inference/include/pooling_shape_inference_util.hpp#L56-L59 I'm trying to force the model conversion by commenting out this code. I recompiled after making the changes, but it still runs as the original code when I test it. I've tried deleting the build folder and recompiling, but it doesn't work.

E       Check 'data_rank.is_dynamic() || num_spatial == (data_shape.size() - spatial_dim_offset)' failed at src/core/shape_inference/include/pooling_shape_inference_util.hpp:56:
E       While validating node 'opset1::AvgPool AvgPool_156823 (opset1::Pad aten::avg_pool3d/Pad[0]:f32[?,?,?,?]) -> (dynamic[...])' with friendly_name 'AvgPool_156823':
E       Expected kernel size to be equal to input size - 2. Got: 3
mvafin commented 2 months ago

@tianyiSKY1 You shouldn't modify the shape inference code. The problem that needs to be solved is reshaping a CHW input to 1CHW and reshaping the output of the operation back by removing the 1.

tianyiSKY1 commented 2 months ago

@mvafin I have tried reshaping the CHW input to 1CHW and reshaping the output of the operation back by removing the 1. But in the testing phase, the input shape check prevents the model conversion.

OutputVector translate_avg_pool3d(const NodeContext& context) {
    auto input = context.get_input(0);
    auto input_shape = input.get_partial_shape();

    auto kernel = context.const_input<Shape>(1);

    // Check if the input has a batch dimension
    bool has_batch_dim = input_shape.rank().is_static() && input_shape.rank().get_length() == 5;

    if (!has_batch_dim) {
        // If there is no batch dimension, add one by unsqueezing the input tensor
        auto zero = v0::Constant::create(element::i32, Shape{}, {0});
        auto unsqueezed_input = context.mark_node(std::make_shared<v0::Unsqueeze>(input, zero));
        input = unsqueezed_input;
    }

    auto pooled_output = translate_avg_pool_base(context, input);

    auto const_0 = context.mark_node(v0::Constant::create(element::i32, Shape{1}, {0}));
    auto const_1 = context.mark_node(v0::Constant::create(element::i32, Shape{1}, {1}));
    auto slice_end = context.mark_node(v0::Constant::create(element::i32, Shape{1}, {-4}));

    if (!has_batch_dim) {
        // If there was no batch dimension, remove the added dimension by slicing the output tensor
        auto sliced_output = context.mark_node(std::make_shared<v8::Slice>(pooled_output[0], const_0, slice_end, const_1, const_0));
        Output<Node> pooled_output = sliced_output;
    }

    return {pooled_output}; 
};
mvafin commented 2 months ago

Did you add dynamic_shapes=False to the test class?

Your solution uses get_partial_shape, which is not recommended because it will not work for dynamic shapes.

Also, you use Slice on the pool output, which is wrong; you need Squeeze(0) or Reshape.

To make your solution work on dynamic shapes, you need to do something like this, assuming the 3D case:

Before pool: Reshape(input, [-1,0,0,0,0], true), or calculate the shape similarly as for the output.

After pool: ShapeOf(input) -> Slice(0, -3, 1), ShapeOf(pooled_output) -> Slice(-3, -1, 1), then Concat -> Reshape.

This should work for dynamic shapes, but it will create additional nodes, so the ideal would be to combine the two solutions using if partial_shape.is_dynamic().

tianyiSKY1 commented 2 months ago

@mvafin Yes, I added dynamic_shapes=False to the test class. Thank you very much for your guidance, I will keep trying.

tianyiSKY1 commented 2 months ago

@mvafin Hi, I'm sorry to bother you. Can you please guide me a little further? I tried to use Reshape to add the batch dimension while keeping the dynamic shape of the input, but this doesn't seem to work.

OutputVector translate_avg_pool3d(const NodeContext& context) {
    auto input = context.get_input(0);

    auto kernel = context.const_input<Shape>(1);

    // Check if the input has a batch dimension
    auto input_shape_of = context.mark_node(std::make_shared<v3::ShapeOf>(input));

    auto has_batch_dim = context.mark_node(std::make_shared<v1::Equal>(
        input_shape_of,  
        context.mark_node(v0::Constant::create(element::i64, Shape{1}, {5}))
    ));

    if (!has_batch_dim) {
        // If there is no batch dimension, add one by reshaping the input tensor
        auto reshape_pattern = context.mark_node(v0::Constant::create(element::i32, Shape{5}, {-1, 0, 0, 0, 0}));
        input = context.mark_node(std::make_shared<v1::Reshape>(input, reshape_pattern, true));
    }

    auto pooled_output = translate_avg_pool_base(context, input);

    if (!has_batch_dim) {
        // If there was no batch dimension, remove the added dimension by slicing the output tensor
        auto input_shape_of = context.mark_node(std::make_shared<v3::ShapeOf>(input));
        auto pooled_output_shape_of = context.mark_node(std::make_shared<v3::ShapeOf>(pooled_output[0]));

        auto slice_input_shape = context.mark_node(std::make_shared<v8::Slice>(input_shape_of, 
            context.mark_node(v0::Constant::create(element::i64, Shape{1}, {0})), 
            context.mark_node(v0::Constant::create(element::i64, Shape{1}, {-3})), 
            context.mark_node(v0::Constant::create(element::i64, Shape{1}, {1})), 
            context.mark_node(v0::Constant::create(element::i64, Shape{1}, {0}))));

        auto slice_pooled_output_shape = context.mark_node(std::make_shared<v8::Slice>(pooled_output_shape_of, 
            context.mark_node(v0::Constant::create(element::i64, Shape{1}, {-3})), 
            context.mark_node(v0::Constant::create(element::i64, Shape{1}, {-1})), 
            context.mark_node(v0::Constant::create(element::i64, Shape{1}, {1})), 
            context.mark_node(v0::Constant::create(element::i64, Shape{1}, {0}))));

        auto concat_shape = context.mark_node(std::make_shared<v0::Concat>(OutputVector{slice_input_shape, slice_pooled_output_shape}, 0));
        for (auto& node : pooled_output) {
            node = context.mark_node(std::make_shared<v1::Reshape>(node, concat_shape, true));
        }

    }

    return {pooled_output}; 
};
mvafin commented 2 months ago

@tianyiSKY1 No worries, thank you for working on this issue.

You seem to misunderstand how operations work. std::make_shared<v1::Equal> returns an operation node, not the result of the comparison, so !has_batch_dim will always be false. For the dynamic case you therefore have to use the same code for both the NCHW and CHW cases; that is why I wrote about the inefficiency of such an approach, it will create more nodes than we have today for operations that didn't need them before. That is why I recommend having different if branches for the partial_shape.is_static() cases (true - static, false - dynamic), to optimize the static case, where we know for sure whether we need to reshape or not.

As for the code under the if it seems right in concept, but it needs to be tested.

tianyiSKY1 commented 2 months ago

@mvafin Hi, I followed your instructions and tried again. This version even expands every input tensor unconditionally, adding the batch dimension without any check, but it still doesn't work. Did I fail to expand the dimension?

OutputVector translate_avg_pool3d(const NodeContext& context) {
    auto input = context.get_input(0);

    auto kernel = context.const_input<Shape>(1);

    // Check if the input has a batch dimension
    auto input_shape = input.get_partial_shape();

    if (input_shape.is_static()) {
        bool has_batch_dim = input_shape.rank().get_length() == 5;
        if (!has_batch_dim) {
            // Reshape input to add a batch dimension
            auto reshape_pattern = context.mark_node(v0::Constant::create(element::i64, Shape{5}, {-1, 0, 0, 0, 0}));
            input = context.mark_node(std::make_shared<v1::Reshape>(input, reshape_pattern, true));
        }
    } else {
        // If the input shape is dynamic, add a batch dimension by default
        auto reshape_pattern = context.mark_node(v0::Constant::create(element::i64, Shape{5}, {-1, 0, 0, 0, 0}));
        input = context.mark_node(std::make_shared<v1::Reshape>(input, reshape_pattern, true));
    }

    auto pooled_output = translate_avg_pool_base(context, input);

    // If there was no batch dimension, remove the added dimension by slicing the output tensor
    auto input_shape_of = context.mark_node(std::make_shared<v3::ShapeOf>(input));
    auto pooled_output_shape_of = context.mark_node(std::make_shared<v3::ShapeOf>(pooled_output[0]));

    auto slice_input_shape = context.mark_node(std::make_shared<v8::Slice>(input_shape_of, 
        context.mark_node(v0::Constant::create(element::i64, Shape{1}, {0})), 
        context.mark_node(v0::Constant::create(element::i64, Shape{1}, {-3})), 
        context.mark_node(v0::Constant::create(element::i64, Shape{1}, {1})), 
        context.mark_node(v0::Constant::create(element::i64, Shape{1}, {0}))));

    auto slice_pooled_output_shape = context.mark_node(std::make_shared<v8::Slice>(pooled_output_shape_of, 
        context.mark_node(v0::Constant::create(element::i64, Shape{1}, {-3})), 
        context.mark_node(v0::Constant::create(element::i64, Shape{1}, {-1})), 
        context.mark_node(v0::Constant::create(element::i64, Shape{1}, {1})), 
        context.mark_node(v0::Constant::create(element::i64, Shape{1}, {0}))));

    auto concat_shape = context.mark_node(std::make_shared<v0::Concat>(OutputVector{slice_input_shape, slice_pooled_output_shape}, 0));
    for (auto& node : pooled_output) {
        node = context.mark_node(std::make_shared<v1::Reshape>(node, concat_shape, true));
    }

    return pooled_output; 
};
self = <test_avg_pool.TestAvgPool3D object at 0x7f800871b730>
om = <Model: 'Model58'
inputs[
<ConstOutput: names[input_tensor] shape[?,?,?,?] type: f32>
]
outputs[
<ConstOutput: names[] shape[?,?,?,?,?] type: dynamic>
]>
ov_inputs = [array([[[[-0.44533446,  1.8727436 , -0.47645602, ..., -0.37609327,
           0.87295103, -1.6756757 ],
         [-0....      [ 1.0659894 , -0.909589  , -0.8698702 , ...,  1.6324621 ,
          -2.3695803 , -0.50536054]]]], dtype=float32)]
dynamic_shapes = True

    def _resolve_input_shape_dtype(self, om, ov_inputs, dynamic_shapes):
        params = list(om.inputs)
        for i in range(len(ov_inputs)):
            inp = ov_inputs[i]
            if isinstance(inp, list):
                ov_inputs[i] = np.array(inp)
                if ov_inputs[i].dtype == np.int64:
                    ov_inputs[i] = ov_inputs[i].astype(np.int32)
                inp = ov_inputs[i]
            assert inp.dtype.name in self._type_map, f"Unknown type {inp.dtype}."
            if params[i].get_node().get_element_type().is_dynamic():
                params[i].get_node().set_element_type(
                    self._type_map[inp.dtype.name])
            shape = [-1] * len(inp.shape) if dynamic_shapes else inp.shape
            params[i].get_node().set_partial_shape(PartialShape(shape))
>       om.validate_nodes_and_infer_types()
E       RuntimeError: Check 'data_rank.is_dynamic() || num_spatial == (data_shape.size() - spatial_dim_offset)' failed at src/core/shape_inference/include/pooling_shape_inference_util.hpp:56:
E       While validating node 'opset1::AvgPool aten::avg_pool3d/AvgPool (opset1::Pad aten::avg_pool3d/Pad[0]:f32[?,2..,2..,2..]) -> (dynamic[?,?,?,?,?])' with friendly_name 'aten::avg_pool3d/AvgPool':
E       Expected kernel size to be equal to input size - 2. Got: 3

pytorch_layer_test_class.py:262: RuntimeError
tianyiSKY1 commented 1 month ago

@mvafin Hello, I was wondering if there is a way to view the variables in avg_poolnd.cpp using std::cout during pytest testing.

mlukasze commented 1 month ago

@mvafin gently ping

mvafin commented 1 month ago

@tianyiSKY1 From what I can see your code should work. You can use std::cout in the code to debug; if you don't see the debug prints you inserted, it may mean that your code is not being used. You could prepare a PR; that would make it easier to understand the issues that you have.

tianyiSKY1 commented 1 month ago

Hi @mvafin, I've solved the problem of the code not being used. After successfully running the code, I realized that I can't expand the dimension of the input with v1::Reshape:

auto reshape_pattern = context.mark_node(v0::Constant::create(element::i64, Shape{5}, {-1, 0, 0, 0, 0}));
input = context.mark_node(std::make_shared<v1::Reshape>(input, reshape_pattern, true));

If I expand the dimension in this way, it reports an error:

E       RuntimeError: Check 'i < input_shape.size()' failed at src/core/shape_inference/include/reshape_shape_inference.hpp:306:
E       While validating node 'opset1::Reshape aten::avg_pool3d/Reshape (opset1::Parameter input_tensor[0]:f32[?,?,?,?], opset1::Constant aten::avg_pool3d/Constant[0]:i64[5]) -> (dynamic[?,?,?,?,?])' with friendly_name 'aten::avg_pool3d/Reshape':
E       Shape inference input shapes {[?,?,?,?],[5]}
E       '0' dimension is out of range

I then tried expanding the dimensions with v0::Unsqueeze, which succeeded and passed the test for inputs without a batch dimension.

auto unsqueeze_axis = context.mark_node(v0::Constant::create(element::i64, Shape{}, {0}));
input = context.mark_node(std::make_shared<v0::Unsqueeze>(input, unsqueeze_axis));

However, since the dynamic shape cannot be inspected, it would also expand inputs that already have a batch dimension. So I would like some guidance on whether I'm using v1::Reshape incorrectly, or whether there's a way to determine the length of a dynamic shape.

mvafin commented 1 month ago

@tianyiSKY1 It is my bad, I suggested using Reshape with zeros, but that will not work in this case. Using Unsqueeze is a good idea if you can detect the existence of the batch dimension during conversion. In case you cannot, I would recommend using Reshape and calculating the required shape with ShapeOf: Slice the shape with a negative start and stop, and use ScatterElementsUpdate to insert the output of the Slice into the end of a tensor filled with -1, so it will create the target shape [-1,C,H,W...]. That is how to get the input shape. The output shape can be calculated similarly: use Slice on the input shape with start=0 and end=-N, which will output the batch dimension (or an empty tensor if it doesn't exist). After that, use Concat to concatenate this batch tensor with the rest of the output shape with the batch removed, which you can again get using Slice.
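A rough sketch of that recipe for the 3D case, in the same style as the snippets above (illustration only; translate_avg_pool_base is the shared helper used in your snippets, the usual op headers and <limits> are assumed, and the exact constants/opset versions would still need to be verified in an actual PR):

// Sketch only: reshape any CDHW/NCDHW input to [-1, C, D, H, W] before pooling
// (the -1 absorbs the batch dim, or becomes 1 if it is absent), then rebuild the
// original layout from the batch part of the input shape and the pooled tail.
OutputVector translate_avg_pool3d(const NodeContext& context) {
    auto input = context.get_input(0);

    auto const_0 = context.mark_node(v0::Constant::create(element::i32, Shape{1}, {0}));
    auto const_1 = context.mark_node(v0::Constant::create(element::i32, Shape{1}, {1}));
    auto const_neg_4 = context.mark_node(v0::Constant::create(element::i32, Shape{1}, {-4}));
    auto const_max = context.mark_node(
        v0::Constant::create(element::i32, Shape{1}, {std::numeric_limits<int32_t>::max()}));

    // Target shape [-1, C, D, H, W]: scatter the last 4 input dims into a tensor filled with -1.
    auto input_shape = context.mark_node(std::make_shared<v3::ShapeOf>(input, element::i32));
    auto tail_dims = context.mark_node(
        std::make_shared<v8::Slice>(input_shape, const_neg_4, const_max, const_1, const_0));
    auto minus_ones = context.mark_node(v0::Constant::create(element::i32, Shape{5}, {-1, -1, -1, -1, -1}));
    auto tail_indices = context.mark_node(v0::Constant::create(element::i32, Shape{4}, {1, 2, 3, 4}));
    auto target_shape = context.mark_node(
        std::make_shared<v3::ScatterElementsUpdate>(minus_ones, tail_indices, tail_dims, const_0));
    auto reshaped_input = context.mark_node(std::make_shared<v1::Reshape>(input, target_shape, false));

    auto pooled = translate_avg_pool_base(context, reshaped_input);

    // Output shape: batch part of the original input shape (empty if there was no batch)
    // concatenated with the last 4 dims of the pooled output.
    auto batch_part = context.mark_node(
        std::make_shared<v8::Slice>(input_shape, const_0, const_neg_4, const_1, const_0));
    auto pooled_shape = context.mark_node(std::make_shared<v3::ShapeOf>(pooled[0], element::i32));
    auto pooled_tail = context.mark_node(
        std::make_shared<v8::Slice>(pooled_shape, const_neg_4, const_max, const_1, const_0));
    auto out_shape = context.mark_node(
        std::make_shared<v0::Concat>(OutputVector{batch_part, pooled_tail}, 0));
    pooled[0] = context.mark_node(std::make_shared<v1::Reshape>(pooled[0], out_shape, false));
    return pooled;
}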