[CVPR 2015] Fully Convolutional Networks for Semantic Segmentation

uhhyunjoo commented 2 years ago

link
paper	Fully Convolutional Networks for Semantic Segmentation
code	papers with code

uhhyunjoo commented 2 years ago

Abstract

Convolutional networks : hierarchies of features 를 생성하는 powerful visual models

본 논문 : convolutional networks 를 end-to-end, pixels-to-pixel 로 학습한 것이 semantic segmentation 에서 sota 를 달성하는 것을 보여줌
key insight : 임의의 size 의 input 을 취하는, "fully convolutional" 한 networks 을 만듦 -> efficient 한 inference 와 learning 을 통해 correspondingly-sized output 을 생성한다.
fully convolutional networks (이하 FCN) 의 space 를 define 하고, detail 했다. (상세하게 설명했다)
또한 spatially dense prediction tesks 에 FCN 이 적용될 수 있음을 설명하고, 이전 모델들에 대한 connection 을 이끌었다.
contemporay classification networks (AlexNet, VGG net, GoogLeNet) 을 FCN 으로 adapt 했고, 그들의 학습된 representations 를 fine-tuning 함으로써 segmentation task 에 transfer 시켰다.
이후, 정확하고 자세한 segmentations 를 생성하기 위해, a deep, coarse layer 으로부터 나오는 semantic information 과, a shallow, fine layer 로 부터 나오는 appearance information 를 combine 시키는, skip architecture 를 정의하였다.
FCN 은 PASCAL VOC, NYUDv2, SIFT Flow 등에서 sota를 달성하였다, 이때 inference 는 a typical image 에 대해 0.2 초 보다 더 적게 걸렸다.

uhhyunjoo commented 2 years ago

Introduction

ConvNet 의 사용처

whole-image classification
local taksks : structured output 을 만들어 내는 것
- Bounding box object detection
- Part and key-point prediction
- local correspondence
- 이런 tasks 들이 있구나~
coarse inference 에서 fine inference 로 가는 일반적인 progression 은, every pixel level 에서 prediction 을 만드는 것이다
이전 접근 방식은, semeantic segmentation 을 위해 convnets 를 사용해왔다. 이는, 각 pixel 이 이를 enclosing 하는 object 나 region 의 class 로 라벨되어 있었다. 그렇지만 단점이 있었고, 그 단점은 본 논문이 해결한다.
본 논문에서는 semantic segmentation 에 대해 end-to-end, pixels-to-pixels 로 학습시킨 FCN 이 추가적인 machinery 없이도, sota 를 달성함을 보여준다.
본 연구는 FCNs 를 end-to-end 방식으로 (1) pixelwise prediction 을 위해, (2) supervised pre-training 을 이용하여 학습시킨 첫 연구이다.
현존하는 네트워크들의 Fully convolutional versions 은 arbitary-sized inputs 을 사용하여 dense outputs 를 predict 한다.
Learning 과 Inference 모두, whole-image-at-a-time 으로 수행된다. dense feedforward computation 과 backpropagation 에 의해서.
In-network upsampling layers 는 pixelwise prediction 과 learning in nets with subsampled pooling 을 가능하게 한다.
이 방법론은 asymptotically and absolutely 하게 efficient 하고, 다른 작업에서의 복잡성을 배제시켜준다.
Patchwise training 은 흔한 방법이긴한데, fully convolutional training 의 효과가 부족하다.

uhhyunjoo / paper-notes

[CVPR 2015] Fully Convolutional Networks for Semantic Segmentation #12

Abstract

Introduction