Dynamic Routing Between Capsules

Abstract

A capsule network (CapsNet) proposed by Sara Sabour, Nicholas Frosst, and Geoffrey Hinton, in which dynamic routing acts as a parallel attention mechanism between capsule layers
Motivated by attention in human vision, which ignores irrelevant details by using a sequence of fixation points
The activities of the neurons within an active capsule represent the various properties of a particular entity present in the image (position, thickness, size, orientation, deformation, etc.)

Routing Algorithm
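- A conventional CNN max-pools: it keeps only the largest scalar activation in each local window of a feature map, passing on the most salient feature and discarding the rest
- A capsule instead pools information from the previous layer's capsules via the dynamic routing algorithm
- Routing between layers L and L+1: a softmax over the routing logits (initialized to zero) gives the initial L-to-L+1 coupling coefficients -> the input to each capsule in L+1 is the weighted sum of the prediction vectors from L -> this input is squashed so that the output vector's length lies in the 0~1 range -> each routing logit is updated by the agreement (scalar product) between L's prediction and the squashed output (~attention mechanism)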
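
The routing loop above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the names `squash`, `softmax`, `route`, and `u_hat`, the toy shapes, and the choice of 3 iterations are assumptions made for the example, and the prediction vectors are random instead of being produced by learned transformation matrices.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Shrink each vector's length into [0, 1) while keeping its direction."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def softmax(x, axis):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def route(u_hat, num_iters=3):
    """Route predictions u_hat of shape (num_in, num_out, dim_out).

    Returns the output capsules v (num_out, dim_out) and the coupling
    coefficients c (num_in, num_out) used in the last iteration.
    """
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))                # routing logits start at zero
    for _ in range(num_iters):
        c = softmax(b, axis=1)                     # each input capsule splits its vote over outputs
        s = np.einsum('ij,ijd->jd', c, u_hat)      # weighted sum of predictions per output capsule
        v = squash(s)                              # output length now lies in [0, 1)
        b = b + np.einsum('ijd,jd->ij', u_hat, v)  # agreement = scalar product of prediction and output
    return v, c

# Toy example: 6 input capsules, 3 output capsules, 4-D outputs (hypothetical sizes).
rng = np.random.default_rng(0)
u_hat = rng.normal(size=(6, 3, 4))
v, c = route(u_hat)
print(v.shape, c.shape)  # (3, 4) (6, 3)
```

Because the logits start at zero, the first softmax gives every output capsule an equal share of each input; later iterations shift weight toward the outputs whose result agrees with the input's prediction.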

Architecture
- Simple 3-layer CapsNet (convolution, PrimaryCapsules, DigitCaps), with routing only between PrimaryCapsules and DigitCaps
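
To make the layer sizes concrete, the short calculation below walks through the shapes. The hyperparameters are taken from the paper's description of the architecture rather than from these notes (28x28 input, 9x9 kernels with stride 1 then stride 2, 32 channels of 8-D primary capsules, 10 DigitCaps of 16 dimensions), and the helper `conv_out` is a name made up for this sketch.

```python
def conv_out(size, kernel, stride):
    """Output width of a convolution without padding."""
    return (size - kernel) // stride + 1

conv1 = conv_out(28, kernel=9, stride=1)       # first convolutional layer
primary = conv_out(conv1, kernel=9, stride=2)  # PrimaryCapsules grid width
num_primary_caps = primary * primary * 32      # 32 capsule channels per grid cell

# One 8x16 transformation matrix per (primary capsule, digit capsule) pair.
num_routing_weights = num_primary_caps * 10 * 8 * 16

print(conv1, primary, num_primary_caps, num_routing_weights)
# 20 6 1152 1474560
```

Routing therefore connects 1152 primary capsules to 10 digit capsules, which is the only place in the network where the coupling coefficients are computed.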

Result on MNIST
- Lower test error than the baseline CNN (the paper reports 0.25% for CapsNet vs. 0.39% for the baseline)

What CapsNet learns
- Each dimension of a DigitCaps vector learns to encode some property of the digit
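
One way to probe what a single dimension encodes is to vary it while holding the others fixed and then decode each variant. The sketch below only builds the perturbed vectors. The function `perturb_dimension` is hypothetical, the random 16-D vector stands in for a real DigitCaps output, and the range of -0.25 to 0.25 in steps of 0.05 follows the paper's experiment; the decoder that would turn each row into an image is not shown.

```python
import numpy as np

def perturb_dimension(v, dim, deltas):
    """Return one copy of v per delta, with only v[dim] shifted by that delta."""
    out = np.tile(v, (len(deltas), 1))
    out[:, dim] += deltas
    return out

rng = np.random.default_rng(0)
digit_caps = rng.normal(size=16) * 0.1  # stand-in for one 16-D DigitCaps output
deltas = np.linspace(-0.25, 0.25, 11)   # 11 steps of 0.05

variants = perturb_dimension(digit_caps, dim=3, deltas=deltas)
print(variants.shape)  # (11, 16)
```

Feeding each row of `variants` to a trained reconstruction decoder would show which visual property, if any, that dimension controls.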

MultiMNIST
- Learning to recognize overlapping digits
- Comparable to SOTA: ~5% error rate on the test set