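Abstract
A capsule network proposed by Sabour, Frosst and Hinton, in which capsules (groups of neurons whose output is a vector) in one layer are routed to the next layer by an iterative, attention-like routing-by-agreement procedure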
Inspired by human vision, which ignores irrelevant details by using a carefully chosen sequence of fixation points, so that only a small fraction of the visual input is processed at the highest resolution
The activities of the neurons in an active capsule represent the various properties of a particular entity present in the image (e.g. position, size, orientation, thickness, deformation); the length of the capsule's output vector represents the probability that the entity exists
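In the paper, the length of a capsule's output vector is read as the probability that its entity is present. To keep that length below 1 while preserving the vector's direction, the capsule's total input s_j is passed through a "squashing" nonlinearity:

```latex
\mathbf{v}_j = \frac{\lVert \mathbf{s}_j \rVert^2}{1 + \lVert \mathbf{s}_j \rVert^2}\,
               \frac{\mathbf{s}_j}{\lVert \mathbf{s}_j \rVert},
\qquad 0 \le \lVert \mathbf{v}_j \rVert < 1
```

Short vectors shrink to almost zero length and long vectors to just under 1: an input of length 3 gives an output of length 9/10 = 0.9, while an input of length 0.1 gives 0.01/1.01 ≈ 0.0099.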
Details
Routing Algorithm
Conventional CNNs use max-pooling, which keeps only the single largest scalar in each pooling window and discards the rest, including where in the window the feature was detected
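A tiny NumPy illustration of that information loss, assuming non-overlapping 2x2 pooling on a toy 4x4 feature map (the helper name `max_pool_2x2` is ours): two maps whose strongest activation sits at different positions inside the same window pool to identical outputs.

```python
import numpy as np

def max_pool_2x2(x: np.ndarray) -> np.ndarray:
    """Non-overlapping 2x2 max-pooling of a 2-D array with even height and width."""
    h, w = x.shape
    # View the array as (h/2) x (w/2) blocks of 2x2 and take each block's maximum.
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# Same activation value, different positions inside the top-left 2x2 window.
a = np.zeros((4, 4))
a[0, 0] = 1.0
b = np.zeros((4, 4))
b[1, 1] = 1.0

print(max_pool_2x2(a))
print(np.array_equal(max_pool_2x2(a), max_pool_2x2(b)))  # True: the exact position is lost
```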
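Capsules instead pool information from the previous layer's capsules with a dynamic routing algorithm, which decides how much each lower-level capsule contributes to each higher-level capsule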
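Each capsule i in layer L predicts the output of each capsule j in layer L+1 as û_j|i = W_ij u_i, with a learned matrix W_ij. The routing logits b_ij start at 0 and are refined for a fixed number of iterations:
a routing softmax over j turns b_ij into coupling coefficients c_ij -> the input of capsule j in L+1 is the weighted sum s_j = Σ_i c_ij û_j|i -> s_j is squashed so that its length (not each element) lies in [0, 1), giving v_j -> b_ij is increased by the agreement û_j|i · v_j between the prediction and the squashed output (~attention mechanism)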
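The routing loop above can be sketched in NumPy as follows. This is a minimal, non-authoritative sketch: it assumes the predictions `u_hat` (û_j|i = W_ij u_i) are already computed, it omits training, and all function names are ours. The random input only stands in for real predictions, with 1152 input capsules, 10 output capsules and 16 dimensions chosen to match the MNIST architecture.

```python
import numpy as np

def squash(s: np.ndarray, axis: int = -1, eps: float = 1e-9) -> np.ndarray:
    """Keep the direction of s, map its length into [0, 1)."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def softmax(x: np.ndarray, axis: int) -> np.ndarray:
    z = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_routing(u_hat: np.ndarray, num_iters: int = 3) -> np.ndarray:
    """Routing-by-agreement.

    u_hat: predictions u_hat[i, j] = W_ij @ u_i, shape (num_in, num_out, dim_out).
    Returns v: outputs of the layer-(L+1) capsules, shape (num_out, dim_out).
    """
    if num_iters < 1:
        raise ValueError("num_iters must be at least 1")
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))                # routing logits b_ij start at 0
    for _ in range(num_iters):
        c = softmax(b, axis=1)                     # coupling coefficients, sum to 1 over j
        s = np.einsum("ij,ijd->jd", c, u_hat)      # s_j = sum_i c_ij * u_hat_j|i
        v = squash(s)                              # v_j = squash(s_j)
        b = b + np.einsum("ijd,jd->ij", u_hat, v)  # add agreement u_hat_j|i . v_j
    return v

rng = np.random.default_rng(0)
u_hat = rng.normal(size=(1152, 10, 16)) * 0.01    # random stand-in predictions
v = dynamic_routing(u_hat)
print(v.shape)                                     # (10, 16)
print(np.all(np.linalg.norm(v, axis=-1) < 1.0))    # True
```

Because the logits grow fastest for the output capsules whose output agrees with a given prediction, each lower capsule progressively sends more of its prediction to those capsules. This reweighting is the attention-like behaviour the note refers to.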
Architecture
A shallow 3-layer CapsNet (Conv1 -> PrimaryCaps -> DigitCaps); dynamic routing is used only between PrimaryCaps and DigitCaps
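The layer sizes follow from simple shape arithmetic. The sketch below uses the hyperparameters the paper reports for 28x28 MNIST inputs: Conv1 with 256 9x9 kernels at stride 1, and PrimaryCaps with 32 channels of 8-D capsules from 9x9 kernels at stride 2. It also uses 10 DigitCaps of 16 dimensions. The assumption that there is one 8-D to 16-D matrix W_ij per (primary capsule, digit capsule) pair is our reading of the paper. The variable names are ours.

```python
def conv_out(size: int, kernel: int, stride: int) -> int:
    """Spatial output size of an unpadded ('valid') convolution."""
    return (size - kernel) // stride + 1

INPUT = 28
CONV1_KERNEL, CONV1_STRIDE = 9, 1
PRIMARY_TYPES, PRIMARY_DIM, PRIMARY_KERNEL, PRIMARY_STRIDE = 32, 8, 9, 2
NUM_CLASSES, DIGIT_DIM = 10, 16

conv1 = conv_out(INPUT, CONV1_KERNEL, CONV1_STRIDE)        # Conv1 output: 20x20x256
primary = conv_out(conv1, PRIMARY_KERNEL, PRIMARY_STRIDE)  # 6x6 grid per capsule type
num_primary_caps = primary * primary * PRIMARY_TYPES       # number of 8-D primary capsules

# Routing only happens between PrimaryCaps and DigitCaps: assume one
# PRIMARY_DIM -> DIGIT_DIM matrix W_ij for every pair (i, j).
num_w = num_primary_caps * NUM_CLASSES
routing_weights = num_w * PRIMARY_DIM * DIGIT_DIM

print(conv1, primary, num_primary_caps)  # 20 6 1152
print(num_w, routing_weights)            # 11520 1474560
```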
Result on MNIST
The 3-layer CapsNet reaches a 0.25% test error on MNIST, lower than the paper's convolutional baseline
What CapsNet learns
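Individual dimensions of a DigitCaps vector learn interpretable properties of the digit: perturbing one dimension and reconstructing the image shows changes such as stroke thickness, skew, width and translation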
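The paper probes this by feeding the reconstruction decoder the correct DigitCaps vector with one dimension tweaked at a time, by intervals of 0.05 in the range [-0.25, 0.25]. Below is a minimal sketch of building those perturbed inputs only. The function name and the random stand-in vector are ours, and the decoder itself is not shown.

```python
import numpy as np

def perturbation_grid(v: np.ndarray, low: float = -0.25, high: float = 0.25,
                      steps: int = 11) -> np.ndarray:
    """Copies of a capsule vector with one dimension shifted at a time.

    Returns shape (dim, steps, dim): entry [d, k] is v with dimension d shifted
    by the k-th offset of an evenly spaced grid from `low` to `high`.
    """
    dim = v.shape[0]
    offsets = np.linspace(low, high, steps)  # -0.25, -0.20, ..., 0.25 (step 0.05)
    grid = np.tile(v, (dim, steps, 1))       # (dim, steps, dim) copies of v
    for d in range(dim):
        grid[d, :, d] += offsets             # shift only dimension d in row d
    return grid

rng = np.random.default_rng(0)
v = rng.normal(size=16)   # stand-in for the 16-D output of the correct digit capsule
grid = perturbation_grid(v)
print(grid.shape)         # (16, 11, 16)
# Each grid[d, k] would then go through the decoder (a hypothetical `decoder(vec)`),
# and the 11 reconstructions for dimension d would be compared side by side.
```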
Link: https://arxiv.org/pdf/1710.09829.pdf
Authors: Sabour et al. 2017