kostaleonard / great-model-theory

A deep learning library for Scala.
MIT License

Consider ways to improve the numerical stability of DifferentiableFunctions like Reciprocal #76

Open kostaleonard opened 1 year ago

kostaleonard commented 1 year ago

As a machine learning engineer, I want my neural network training to be numerically stable so that I'm not surprised by errors and other unexpected behavior at prediction time.

A cursory look at autograd suggests that the automatic differentiation mechanism itself is not responsible for numerical stability. I'm not even sure TensorFlow does anything to guarantee it; stability may simply be the user's responsibility when writing custom layers. In practice, this isn't too big of a problem here, because the only layer that uses Reciprocal is Sigmoid, whose denominator 1 + e^(-x) is always greater than 1, so the reciprocal never operates near zero and the function has no discontinuities.
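For illustration, here is a minimal Scala sketch of the two kinds of user-side fixes alluded to above: an epsilon-stabilized reciprocal, and a sigmoid rewritten so its exponential never overflows. Note that `StableOps`, `stableReciprocal`, `stableSigmoid`, and the epsilon value are all hypothetical names for this sketch, not part of great-model-theory's actual API.

```scala
// Hypothetical sketch of common stabilization tricks; none of these names
// come from great-model-theory's actual API.
object StableOps {
  // A small constant that bounds how close a denominator may get to zero.
  val Epsilon: Float = 1e-7f

  // Epsilon-stabilized reciprocal: shift x away from zero in the direction
  // of its sign, so 1 / x stays finite at the cost of a small bias.
  def stableReciprocal(x: Float): Float = {
    val shifted = if (x >= 0.0f) x + Epsilon else x - Epsilon
    1.0f / shifted
  }

  // Sigmoid written so exp never overflows: for negative x, use the
  // algebraically equivalent form exp(x) / (1 + exp(x)), whose exponent
  // is never positive.
  def stableSigmoid(x: Float): Float =
    if (x >= 0.0f) 1.0f / (1.0f + math.exp(-x).toFloat)
    else {
      val e = math.exp(x).toFloat
      e / (1.0f + e)
    }
}
```

The first function is the kind of per-layer guard a user would add around Reciprocal; the second shows that the better fix is sometimes an algebraic rewrite of the layer rather than a guard inside the differentiation machinery.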