matsengrp / torchdms

Analyze deep mutational scanning data with PyTorch
https://matsengrp.github.io/torchdms/

soften monotonicity constraint #70

Open · wsdewitt opened this issue 4 years ago

wsdewitt commented 4 years ago

In 1D GE models, a monotonic I-spline is used to map from the latent space to the output space. In a GGE model, the moral equivalent is to stipulate that each output dimension of y is monotonic in its corresponding latent dimension of z.
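
For concreteness, a minimal sketch of that kind of model for two phenotypes (the class and parameter names are illustrative, not torchdms's actual API):

```python
import torch
import torch.nn as nn


class SketchGGE(nn.Module):
    """Illustrative two-phenotype GGE-style model (not torchdms's actual API).

    A linear map sends the one-hot mutation encoding x to a 2D latent space z,
    and a small nonlinear network g maps z to the outputs y = (binding, folding).
    """

    def __init__(self, n_features: int, hidden: int = 10):
        super().__init__()
        self.to_latent = nn.Linear(n_features, 2, bias=False)  # x -> z
        self.g = nn.Sequential(                                 # z -> y
            nn.Linear(2, hidden),
            nn.Hardtanh(),
            nn.Linear(hidden, 2),
        )

    def forward(self, x):
        return self.g(self.to_latent(x))
```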

The current monotonicity implementation projects all the weights in g(z) onto the non-negative orthant after each gradient step*, which is sufficient for monotonicity but not necessary, and probably a stronger condition than we want. The downside is that it cannot accommodate trade-offs: every directional derivative is non-negative, so there are no directions in feature space that increase binding at the expense of folding (or vice versa).

A weaker condition is to stipulate that the diagonal elements of the Jacobian are non-negative: ∂y_i/∂z_i ≥ 0. This keeps the biophysical interpretation of the latent space dimensions intact, but allows phenotype trade-offs.

It's not immediately obvious to me how to implement this; it's not a simple box constraint. Maybe it could be done as a soft penalty: relu(-∂y_1/∂z_1) + relu(-∂y_2/∂z_2).
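
One way to get that penalty out of autograd (a sketch; `g` is any z → y module like the one sketched above, and the penalty would be added to the data loss with some weight λ):

```python
import torch


def diagonal_monotonicity_penalty(g, z):
    """Soft penalty sum_i relu(-dy_i/dz_i), averaged over the batch.

    g: module mapping latent coordinates z of shape (batch, K) to phenotypes
       y of shape (batch, K).
    z: latent coordinates at which to evaluate the penalty.
    """
    z = z.detach().requires_grad_(True)  # constrain g only, not the x -> z map
    y = g(z)
    penalty = z.new_zeros(())
    for i in range(y.shape[1]):
        # gradient of y_i (summed over the batch) w.r.t. all latent dims;
        # create_graph=True so the penalty itself can be backpropagated
        grad_i = torch.autograd.grad(y[:, i].sum(), z, create_graph=True)[0]
        # charge only for negative diagonal entries dy_i/dz_i
        penalty = penalty + torch.relu(-grad_i[:, i]).mean()
    return penalty
```

The total loss would then be something like `data_loss + lam * diagonal_monotonicity_penalty(model.g, model.to_latent(x))`.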

*As a side issue, Erick noticed that the projection happens before the gradient step, which is probably a bug: https://github.com/matsengrp/torchdms/blob/e0bdb4ccd03fed7b73e7b501e20ce8046555b5ad/torchdms/analysis.py#L153-L154
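
For reference, the usual projected-gradient pattern takes the step first and then projects. A toy sketch of that ordering (not the torchdms code itself):

```python
import torch
import torch.nn.functional as F

# toy setup purely to illustrate the ordering
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 4), torch.randn(8, 1)

optimizer.zero_grad()
F.mse_loss(model(x), y).backward()
optimizer.step()               # take the gradient step first...
with torch.no_grad():          # ...then project onto the non-negative orthant
    for param in model.parameters():
        param.clamp_(min=0)
```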

matsen commented 4 years ago

My brain's full right now, but I note that we do have access to gradients during training, so we could muck with them before doing a gradient step.
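
For instance, the gradients sit in `param.grad` between `backward()` and `step()`, so they can be edited in place there (or via `Tensor.register_hook`). A generic toy sketch of that kind of mucking, not tied to torchdms internals:

```python
import torch
import torch.nn.functional as F

model = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 4), torch.randn(8, 2)

optimizer.zero_grad()
F.mse_loss(model(x), y).backward()

# gradients are available here, before the step, and can be modified in place;
# e.g. zero any component whose descent step would push a constrained weight
# below zero (just one of many possible ways to muck with them)
with torch.no_grad():
    for param in model.parameters():
        blocked = (param <= 0) & (param.grad > 0)
        param.grad[blocked] = 0.0

optimizer.step()
```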

wsdewitt commented 4 years ago

It might be easy to implement this in an architecture where there are distinct g_bind and g_fold networks, as proposed in #53. The intra-g weights could be clamped >= 0, but the sparse inter-g weights could be unconstrained.
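
A rough sketch of that wiring (the names and the single cross-weight coupling are just illustrative; the actual #53 design may differ):

```python
import torch
import torch.nn as nn


class SplitG(nn.Module):
    """Illustrative split-g architecture with separate g_bind and g_fold heads.

    Each head is monotone in its own latent coordinate (its weights get clamped
    >= 0 after every step), while a single unconstrained cross weight per head
    lets the other latent coordinate contribute with either sign.
    """

    def __init__(self, hidden: int = 10):
        super().__init__()
        self.g_bind = nn.Sequential(nn.Linear(1, hidden), nn.Hardtanh(), nn.Linear(hidden, 1))
        self.g_fold = nn.Sequential(nn.Linear(1, hidden), nn.Hardtanh(), nn.Linear(hidden, 1))
        self.fold_to_bind = nn.Linear(1, 1, bias=False)  # unconstrained inter-g weight
        self.bind_to_fold = nn.Linear(1, 1, bias=False)  # unconstrained inter-g weight

    def forward(self, z):  # z: (batch, 2) latent coordinates
        z_bind, z_fold = z[:, :1], z[:, 1:]
        y_bind = self.g_bind(z_bind) + self.fold_to_bind(z_fold)
        y_fold = self.g_fold(z_fold) + self.bind_to_fold(z_bind)
        return torch.cat([y_bind, y_fold], dim=1)

    def clamp_intra_g(self):
        """Project only the intra-g weights onto the non-negative orthant."""
        with torch.no_grad():
            for head in (self.g_bind, self.g_fold):
                for param in head.parameters():
                    param.clamp_(min=0)
```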

Another thing to note is that, even for 1D monotonicity, clamping to non-negative weights is sufficient but not necessary, and limits expressiveness even within the space of monotonic functions. This paper proposes modeling the derivative of a monotonic function with a non-negative neural network, and using numerical quadrature to evaluate the monotonic function itself (defined as the network's antiderivative). This seems quite analogous to how monotonic I-splines are defined as integrals of a non-negative M-spline basis.

[figure from the paper: the monotone function defined as the antiderivative of a non-negative network, F(x) = F(0) + ∫₀ˣ f(t) dt]

(As a cute technical detail, you get to use Feynman's trick, i.e. differentiation under the integral sign, for backprop.)
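
A tiny generic sketch of the construction (not the paper's exact architecture): make the derivative network non-negative with a final softplus, and evaluate the monotone map as its numerically integrated antiderivative.

```python
import torch
import torch.nn as nn


class MonotoneByQuadrature(nn.Module):
    """Monotone scalar map F(z) = F(0) + integral_0^z f(t) dt with f >= 0.

    f is an unconstrained network made non-negative by a final Softplus; the
    integral is approximated with the trapezoid rule on n_quad nodes.
    Generic sketch of the idea, not the cited paper's architecture.
    """

    def __init__(self, hidden: int = 16, n_quad: int = 32):
        super().__init__()
        self.f = nn.Sequential(
            nn.Linear(1, hidden), nn.Tanh(), nn.Linear(hidden, 1), nn.Softplus()
        )
        self.offset = nn.Parameter(torch.zeros(1))  # F(0)
        self.n_quad = n_quad

    def forward(self, z):  # z: (batch, 1)
        # quadrature nodes from 0 to z for each batch element
        t = torch.linspace(0.0, 1.0, self.n_quad, device=z.device)
        nodes = z * t                                        # (batch, n_quad)
        integrand = self.f(nodes.reshape(-1, 1)).reshape_as(nodes)
        integral = torch.trapezoid(integrand, nodes, dim=1)  # (batch,)
        return self.offset + integral.unsqueeze(1)
```

By construction the exact map has dF/dz = f(z) ≥ 0, so it's monotone without clamping any weights; the quadrature only introduces approximation error in evaluating F.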

matsen commented 4 years ago

Yes, you're totally right. That paper looked cool, but more than we need. Remember that we're getting quite nice performance in one dimension with 25 monotonic hardtanh's!

But I like the idea of combining this with #53 the best.

wsdewitt commented 4 years ago

Another approach for monotonicity is to simply penalize violations of it, as in nearly isotonic regression. From Hastie, Tibs, and Wainwright:

[book equations: the nearly-isotonic regression problem, minimize_β ½ Σ_i (y_i − β_i)² + λ Σ_i (β_i − β_{i+1})_+]

Note: the subscript "+" notation denotes the positive part, (x)_+ = max(x, 0) (essentially a relu).
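
In PyTorch terms the penalty is just a relu on adjacent differences. A toy sketch for a sequence that should be non-decreasing:

```python
import torch


def nearly_isotonic_penalty(beta: torch.Tensor, lam: float = 1.0) -> torch.Tensor:
    """Nearly-isotonic penalty lam * sum_i (beta_i - beta_{i+1})_+.

    beta is a 1D tensor whose entries should be roughly non-decreasing; the
    positive part (a relu) charges only for monotonicity violations.
    """
    return lam * torch.relu(beta[:-1] - beta[1:]).sum()


# toy usage: a mostly increasing sequence with one violation (0.5 -> 0.4)
beta = torch.tensor([0.0, 0.5, 0.4, 1.0], requires_grad=True)
nearly_isotonic_penalty(beta, lam=0.1).backward()  # grads only at the violating pair
```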