kgori / sigfit

Flexible Bayesian inference of mutational signatures
GNU General Public License v3.0

Remove integer counts restriction #33

Closed baezortega closed 5 years ago

baezortega commented 6 years ago

At least in the NMF models, the counts matrix should be allowed to contain float values, so that the models can be fitted to general (non-count) data. It might be better, or even necessary, to keep counts as integers for the Poisson models, though.

baezortega commented 6 years ago

Temporary fix (in dev branch, already tested): I have added a new argument to to_matrix so that the counts matrix is automatically rounded, with a warning, if any values are non-integer. That allows the models to be used on real-valued matrices, assuming that the rounding error is acceptable.
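For illustration only, the round-with-a-warning behaviour could look roughly like the sketch below. This is a Python stand-in, not the actual to_matrix code: the helper name round_counts and its tol parameter are hypothetical, and the real argument name is not given in this thread.

```python
import warnings


def round_counts(counts, tol=1e-8):
    """Hypothetical helper: round a matrix (list of rows) to integer counts.

    Emits a single warning if any entry is further than `tol` from an integer.
    """
    has_fraction = any(
        abs(value - round(value)) > tol for row in counts for value in row
    )
    if has_fraction:
        warnings.warn(
            "Non-integer counts were rounded to the nearest integer",
            stacklevel=2,
        )
    # Python's round() rounds exact halves to the nearest even integer.
    return [[int(round(value)) for value in row] for row in counts]
```

With nearest-integer rounding, each entry moves by at most 0.5, so the relative error matters mostly for matrices with small values.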

--

I didn't get to implement the analogous multi-t model for continuous fitting/extraction. There seem to be two ways:

The latter seems to be the only feasible way to implement this (in practice, HMC often can't explore the posterior of the covariance matrix), but it requires manually updating the target log density. I've found an analogous implementation of the multi-normal here, and an example of how to do the same for the multi-t (assuming, however, that one already knows the Cholesky factor) here.

In theory, these two examples could be combined into a working multi-t with the Cholesky parameterisation. Whether the dimensionality would still be low enough to allow sampling is a separate question; if we run into trouble, we could limit the model to C>T changes to start with. We also need to consider whether it is really advisable to treat "outliers" as outliers (i.e. dismiss them) in our models, given how important individual components can be for distinguishing between signatures. I think it would be interesting to try it out, if we ever find the time to implement it.
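As a rough sketch of what the Cholesky-parameterised multi-t density involves, the function below evaluates the standard multivariate Student-t log density with scale matrix Sigma = L L^T, using only the lower-triangular factor L. It is plain Python rather than Stan, the name multi_t_cholesky_lpdf is hypothetical, and it is not taken from either of the linked examples. In a model, a term like this is what would be added to the target by hand.

```python
import math


def multi_t_cholesky_lpdf(x, mu, L, nu):
    """Log density of a multivariate Student-t with scale matrix L @ L^T.

    x, mu: sequences of length d; L: lower-triangular d x d matrix
    (list of rows) with positive diagonal; nu: degrees of freedom (> 0).
    """
    d = len(x)
    # Solve L z = (x - mu) by forward substitution, so that
    # sum(z_i^2) equals (x - mu)^T Sigma^{-1} (x - mu).
    z = []
    for i in range(d):
        resid = x[i] - mu[i] - sum(L[i][j] * z[j] for j in range(i))
        z.append(resid / L[i][i])
    quad = sum(v * v for v in z)
    # log|Sigma| = 2 * sum(log(diag(L)))
    log_det_sigma = 2.0 * sum(math.log(L[i][i]) for i in range(d))
    return (
        math.lgamma((nu + d) / 2.0)
        - math.lgamma(nu / 2.0)
        - 0.5 * d * math.log(nu * math.pi)
        - 0.5 * log_det_sigma
        - 0.5 * (nu + d) * math.log1p(quad / nu)
    )
```

Working with L directly avoids inverting or factorising the covariance matrix at each evaluation, which is the main appeal of this parameterisation; it does not by itself settle whether sampling remains feasible at the dimensionality of a full mutation catalogue.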