openvinotoolkit / training_extensions

Train, Evaluate, Optimize, Deploy Computer Vision Models via OpenVINO™
https://openvinotoolkit.github.io/training_extensions/
Apache License 2.0

Feature Request: Support custom and non-square input sizes #3581

Closed: j99ca closed this issue 1 month ago

j99ca commented 3 months ago

According to the docs, OTX supports only a fixed list of square input sizes. With most convolutional architectures it should be possible to use non-square input sizes while still reusing pre-trained weights, thanks to the global pooling layer at the head of the model (see the sketch below). This is possible with some classification models in TensorHub, it would be a great feature for OTX classification, and it would accelerate our adoption of this library at the edge. I have use cases with very tall images from certain sensors, where resizing them to any of the fixed square sizes skews the aspect ratio and can destroy the features needed for classification.
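
To sketch the idea (plain PyTorch/torchvision for illustration, not OTX code; the 640x192 shape is just an example): a pretrained convolutional backbone whose classifier sits behind an adaptive global pooling layer already accepts non-square inputs without any change to the weights.

import torch
from torchvision import models

# Pretrained MobileNetV3: the classifier follows an adaptive average-pooling layer,
# so the spatial size of the input does not have to match the square size used in pre-training.
model = models.mobilenet_v3_large(weights="IMAGENET1K_V1")
model.eval()

# A "very tall" batch, e.g. 640x192 instead of a skewed square resize.
tall_batch = torch.randn(1, 3, 640, 192)
with torch.no_grad():
    logits = model(tall_batch)
print(logits.shape)  # torch.Size([1, 1000])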

goodsong81 commented 3 months ago

@eunwoosh Let's consider non-square input size.

j99ca commented 3 months ago

@goodsong81 @eunwoosh do you folks have a timeline for custom input sizes (with or without non-square support) in this library? I am trying to schedule some integration with OTX 2.x, and the lack of this feature is blocking us.

Keep up the good work!

goodsong81 commented 3 months ago

> @goodsong81 @eunwoosh do you folks have a timeline for custom input sizes (with or without non-square support) in this library? I am trying to schedule some integration with OTX 2.x, and the lack of this feature is blocking us.
>
> Keep up the good work!

Not yet confirmed, but I expect it will be enabled in the next quarter (Q3) of this year.

j99ca commented 2 months ago

@goodsong81 I see that this PR got merged: https://github.com/openvinotoolkit/training_extensions/pull/3759

Could that input_size parameter be used instead of the fixed values in the model scripts? E.g. in MobileNetV3Base:

class MobileNetV3Base(ModelInterface):
    """Base model of MobileNetV3."""

    def __init__(
        self,
        num_classes: int = 1000,
        width_mult: float = 1.0,
        in_channels: int = 3,
        input_size: tuple[int, int] = (224, 224),
        dropout_cls: nn.Module | None = None,
        pooling_type: str = "avg",
        feature_dim: int = 1280,
        instance_norm_first: bool = False,
        self_challenging_cfg: bool = False,
        **kwargs,
    ):

as well as in the associated export code? E.g. in MobileNetV3ForMulticlassCls:

    @property
    def _exporter(self) -> OTXModelExporter:
        """Creates OTXModelExporter object that can export the model."""
        return OTXNativeModelExporter(
            task_level_export_parameters=self._export_parameters,
            input_size=(1, 3, 224, 224),
            mean=(123.675, 116.28, 103.53),
            std=(58.395, 57.12, 57.375),
            resize_mode="standard",
            pad_value=0,
            swap_rgb=False,
            via_onnx=False,
            onnx_export_configuration=None,
            output_names=["logits", "feature_vector", "saliency_map"] if self.explain_mode else None,
        )
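
For illustration, here is roughly what that could look like (a sketch only, not the merged implementation; self.input_size is an assumed attribute holding a (height, width) pair):

    @property
    def _exporter(self) -> OTXModelExporter:
        """Creates OTXModelExporter object that can export the model."""
        height, width = self.input_size  # assumed attribute, e.g. (224, 224) or (640, 192)
        return OTXNativeModelExporter(
            task_level_export_parameters=self._export_parameters,
            input_size=(1, 3, height, width),  # non-square shapes become possible here
            mean=(123.675, 116.28, 103.53),
            std=(58.395, 57.12, 57.375),
            resize_mode="standard",
            pad_value=0,
            swap_rgb=False,
            via_onnx=False,
            onnx_export_configuration=None,
            output_names=["logits", "feature_vector", "saliency_map"] if self.explain_mode else None,
        )
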
eunwoosh commented 2 months ago

Hi @j99ca, #3759 is a preparation step for configurable input size. That PR just enables transforms in a recipe to use $(input_size). I'm now implementing configurable input size on top of #3759. Currently there is no input size configuration interface that updates both the model and the dataset, so if you want to do that, you need to change the model class code yourself, including the init argument and the exporter part you mentioned.
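
To make the $(input_size) substitution concrete, here is a rough illustration (not OTX's actual resolver; the transform fields are made-up examples) of replacing the placeholder in a transform config with one configured size, so the data pipeline and the model agree on a single value:

from typing import Any

def resolve_input_size(cfg: Any, input_size: tuple[int, int]) -> Any:
    """Recursively replace the "$(input_size)" placeholder with a concrete (height, width)."""
    if cfg == "$(input_size)":
        return list(input_size)
    if isinstance(cfg, dict):
        return {key: resolve_input_size(value, input_size) for key, value in cfg.items()}
    if isinstance(cfg, list):
        return [resolve_input_size(value, input_size) for value in cfg]
    return cfg

# Hypothetical transform entry; the field names are examples, not the real recipe schema.
transform_cfg = {"type": "Resize", "scale": "$(input_size)"}
print(resolve_input_size(transform_cfg, (640, 192)))  # {'type': 'Resize', 'scale': [640, 192]}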

eunwoosh commented 1 month ago

#3788 is merged. OTX now supports non-square input sizes.
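
As a quick local sanity check that a non-square shape survives export (plain PyTorch/ONNX for illustration, not the OTX exporter; the file name and 640x192 shape are arbitrary):

import torch
from torchvision import models

# Export an ImageNet-style classifier traced with a tall, non-square dummy input.
model = models.mobilenet_v3_large(weights=None).eval()
dummy = torch.randn(1, 3, 640, 192)
torch.onnx.export(
    model,
    dummy,
    "mobilenet_v3_640x192.onnx",
    input_names=["image"],
    output_names=["logits"],
)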