Present a simple module that can learn arithmetic functions such as add, subtract, multiply, and divide, and that generalizes well to unseen data and unseen inference schemes.
DNNs with Non-linearities Struggle to Learn Identity Function
Train an autoencoder to reconstruct inputs in the range [-5, 5].
All autoencoders share the same parameterization (3 hidden layers of size 8) and differ only in their non-linearities.
Trained with MSE loss.
Tested on [-20, 20], the error increases severely both below and above the range of numbers seen during training.
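This failure can be sketched with a toy model. Below, a one-hidden-unit tanh "autoencoder" y = w2 * tanh(w1 * x) (an assumption for illustration, not the paper's 3x8 architecture) is fit to the identity on [-5, 5] by plain gradient descent; because tanh saturates, the reconstruction error explodes at ±20:

```python
import numpy as np

x_train = np.linspace(-5.0, 5.0, 101)

# Toy model y = w2 * tanh(w1 * x); parameters trained with MSE loss.
w1, w2 = 0.1, 1.0
lr = 0.001
for _ in range(20000):
    t = np.tanh(w1 * x_train)
    err = w2 * t - x_train
    grad_w2 = np.mean(2 * err * t)
    grad_w1 = np.mean(2 * err * w2 * (1 - t**2) * x_train)
    w2 -= lr * grad_w2
    w1 -= lr * grad_w1

# In-range reconstruction is decent; outside the range, tanh saturates,
# so the output is bounded by |w2| and cannot track x = ±20.
mse_in = np.mean((w2 * np.tanh(w1 * x_train) - x_train) ** 2)
x_out = np.array([-20.0, 20.0])
mse_out = np.mean((w2 * np.tanh(w1 * x_out) - x_out) ** 2)
```

The underlying reason is structural: any network whose output passes through a bounded non-linearity is itself bounded, so identity can only be approximated within the training range.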
The Neural Accumulator (NAC) & Neural Arithmetic Logic Unit (NALU)
NAC: A special case of a linear layer, whose weight matrix W is biased towards values in {-1, 0, 1}, defined as:
W = tanh(\hat{W}) ⊙ σ(\hat{M}), where ⊙ is elementwise multiplication
The elements of W are guaranteed to lie in [-1, 1] and are biased towards {-1, 0, 1} during learning, since {-1, 0, 1} correspond to the saturation points of tanh(·) and σ(·).
Its outputs are therefore additions and subtractions of elements of the input vector, with no arbitrary rescaling.
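A minimal NAC forward pass can be sketched as follows. The saturated parameter values are hand-set for illustration (in practice \hat{W} and \hat{M} are learned):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nac(x, W_hat, M_hat):
    # Effective weights are squashed towards {-1, 0, 1}.
    W = np.tanh(W_hat) * sigmoid(M_hat)   # elementwise product
    return x @ W.T

# Hand-set saturated parameters: row 0 computes a + b, row 1 computes a - b.
W_hat = np.array([[20.0, 20.0],
                  [20.0, -20.0]])
M_hat = np.full((2, 2), 20.0)             # sigmoid(20) ≈ 1: both inputs "on"
x = np.array([3.0, 2.0])
out = nac(x, W_hat, M_hat)                # ≈ [5.0, 1.0]
```

With weights pinned near {-1, 0, 1}, each output is an exact-looking sum or difference of inputs, which is what lets the NAC extrapolate beyond the training range.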
NALU: Learns a weighted sum between two sub-cells:
One is the original NAC, capable of learning to add and subtract.
The other one operates in log space, capable of multiplication and division, e.g., log(XY) = log X + log Y; log(X/Y) = log X - log Y; exp(log(X)) = X.
Altogether, NALU can learn to perform general arithmetic operations.
Can handle either add/subtract or mult/div operations but not a combination of both.
For mult/div operations, it cannot handle negative targets, as the mult/div sub-cell's output is the result of an exponentiation operation, which always yields positive results.
Power operations are only possible when the exponent is in the range of [0, 1].
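Both limitations follow directly from the form of the log-space path, m = exp(w · log(|x| + eps)). A small illustration (eps omitted for clarity; values are hypothetical):

```python
import math

def log_space_cell(x, w):
    # One unit of the multiplicative sub-cell: exp(w * log|x|).
    return math.exp(w * math.log(abs(x)))

# A fractional weight acts as a power: w = 0.5 behaves like a square root.
sqrt_9 = log_space_cell(9.0, 0.5)            # ≈ 3.0

# The sign of the input is discarded by |x|, and exp(...) is always positive,
# so a negative target like -12 can never be produced.
positive_only = log_space_cell(-12.0, 1.0)   # ≈ 12.0, not -12.0
```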
Metadata