harrispopgen / mushi

[mu]tation [s]pectrum [h]istory [i]nference
https://harrispopgen.github.io/mushi/
MIT License
24 stars 5 forks source link

replace total variation / fused lasso with trend filtering #57

Closed wsdewitt closed 3 years ago

wsdewitt commented 4 years ago

Trend filtering may be a more flexible and general way to regularize on finite differences (smoothness in time).

The current implementation does total variation (TV) / fused lasso regularization, which is a 0-th order trend filter (1st order finite difference operator). We tack on a spline term to get more smoothness.

A better approach is probably to replace these two terms with a single trend filter term (i.e. L1 penalty on second-order finite differences). Trend filters are provable better than smoothing splines at adapting to local smoothness: https://projecteuclid.org/euclid.aos/1395234979

The hard part is that we don't have a proximal operator for general trend filters, as we do for the TV penalty with the prox-tv python package. Existing trend filter code all assumes least-squares loss.

Another apparently similar method known as total generalized variation (TGV) in the signal processing literature: https://epubs.siam.org/doi/abs/10.1137/090769521?casa_token=DNBx457lt_AAAAAA:q1B4-Tt6c5zkmR5mfgN0xJDqfK-0K9KVzsdNzIVC44tlS3XSX3zmi2w6SGNdY3b9tXF6ODIvSsTb

kamdh commented 4 years ago

I dunno, I'd leave this out for now, unless you really really want it. Trend filtering is l1 penalty on second differences, leading it to prefer piecewise linear trajectories rather than piecewise constant. If you think that's a better prior, then maybe so. But it does become a zoo of various penalties to work with, so you can go down the rabbit hole of different penalty functions for a long time. I like the ones we've used here for our first work.

I think there's got to be some way of computing the prox of the trend filtering penalty fairly efficiently.

wsdewitt commented 4 years ago

Yeah this is marked as an enhancement, not critically needed. I think it’ll behave better with histories that have varying smoothness, and require one less param (well another param is the order of finite diff, but just an integer)