guanfuchen / semseg

A survey of commonly used semantic segmentation architectures, with code reproductions.

Scale-adaptive Convolutions for Scene Parsing #45

Open guanfuchen opened 5 years ago

guanfuchen commented 5 years ago

related paper

Abstract
Many existing scene parsing methods adopt Convolutional Neural Networks with fixed-size receptive fields, which frequently result in inconsistent predictions of large objects and invisibility of small objects. To tackle this issue, we propose a scale-adaptive convolution to acquire flexible-size receptive fields during scene parsing. Through adding a new scale regression layer, we can dynamically infer the position-adaptive scale coefficients which are adopted to resize the convolutional patches. Consequently, the receptive fields can be adjusted automatically according to the various sizes of the objects in scene images. Thus, the problems of invisible small objects and inconsistent large-object predictions can be alleviated. Furthermore, our proposed scale-adaptive convolutions are not only differentiable to learn the convolutional parameters and scale coefficients in an end-to-end way, but also of high parallelizability for the convenience of GPU implementation. Additionally, since the new scale regression layers are learned implicitly, any extra training supervision of object sizes is unnecessary. Extensive experiments on Cityscapes and ADE20K datasets well demonstrate the effectiveness of the proposed scale-adaptive convolutions.


Overview

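Many existing scene parsing methods use CNNs with fixed-size receptive fields, which leads to inconsistent predictions on large objects and to small objects being missed. This paper proposes scale-adaptive convolutions to obtain receptive fields of flexible size.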
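
The mechanism described in the abstract (per-position scale coefficients that resize the convolutional patch) can be sketched roughly as below. This is a minimal single-channel illustration in plain Python, not the paper's implementation. The function names `bilinear` and `scale_adaptive_conv3x3` are hypothetical, the scale map is passed in directly instead of being predicted by a scale regression layer, and reading fractional sample positions with bilinear interpolation and zero padding is an assumption made here.

```python
import math


def bilinear(feat, y, x):
    """Sample a 2D feature map at fractional (y, x); outside the map counts as 0."""
    h, w = len(feat), len(feat[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            # Bilinear weight of this integer neighbour.
            wgt = (1.0 - abs(y - yy)) * (1.0 - abs(x - xx))
            if wgt > 0.0 and 0 <= yy < h and 0 <= xx < w:
                value += wgt * feat[yy][xx]
    return value


def scale_adaptive_conv3x3(feat, kernel, scales):
    """3x3 convolution whose patch is resized per position.

    feat   : H x W list of floats (single-channel feature map)
    kernel : 3 x 3 list of floats (shared convolution weights)
    scales : H x W list of floats; scales[i][j] stretches or shrinks the
             3x3 sampling grid centred at (i, j). In the paper these come
             from a scale regression layer; here they are given (assumption).
    """
    h, w = len(feat), len(feat[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = scales[i][j]
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    # Offsets are multiplied by the local scale, so the
                    # receptive field grows or shrinks at this position.
                    acc += kernel[dy + 1][dx + 1] * bilinear(
                        feat, i + s * dy, j + s * dx
                    )
            out[i][j] = acc
    return out


if __name__ == "__main__":
    feat = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
    kernel = [[1.0] * 3 for _ in range(3)]
    unit_scales = [[1.0] * 3 for _ in range(3)]
    print(scale_adaptive_conv3x3(feat, kernel, unit_scales))
```

In this sketch a scale of 1 everywhere reduces to an ordinary zero-padded 3x3 convolution, and an integer scale `d` behaves like a dilated convolution with dilation `d`. Because the sampled value is a piecewise-linear function of the scale, gradients can flow to the scale coefficients, which is consistent with the abstract's claim that the scale regression layer is learned end to end without size supervision.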