For larger collations with increasingly varied substitution matrices at different sites/variation units, it becomes increasingly likely that at least one of the substitution matrices will be singular under the initial assignments of transcriptional rate parameters (which presently all default to 2.0). This ceases to be a problem after BEAST 2's first evaluation of the site likelihoods, because in subsequent likelihood calculations, it samples these rate parameter values randomly according to a prior distribution, and the resulting substitution matrices will almost never be singular. Since BEAST 2 XML inputs require that every RealParameter element have an initial @value attribute specified, we need to supply these values, but we should be able to avoid singular matrices if we set these values randomly in to_beast.
This can be done by using distributions available in numpy, which is already a dependency of teiphy. I suggest replacing the current offset log-normal prior distribution assumed for transcriptional rate parameters with a simple gamma distribution (perhaps with alpha="5" and beta="2" as default values). Then we can sample random initial values for the rate parameters from this distribution using something like
For larger collations with increasingly varied substitution matrices at different sites/variation units, it becomes increasingly likely that at least one of the substitution matrices will be singular under the initial assignments of transcriptional rate parameters (which presently all default to 2.0). This ceases to be a problem after BEAST 2's first evaluation of the site likelihoods, because in subsequent likelihood calculations, it samples these rate parameter values randomly according to a prior distribution, and the resulting substitution matrices will almost never be singular. Since BEAST 2 XML inputs require that every
RealParameter
element have an initial@value
attribute specified, we need to supply these values, but we should be able to avoid singular matrices if we set these values randomly into_beast
.This can be done by using distributions available in
numpy
, which is already a dependency ofteiphy
. I suggest replacing the current offset log-normal prior distribution assumed for transcriptional rate parameters with a simple gamma distribution (perhaps withalpha="5"
andbeta="2"
as default values). Then we can sample random initial values for the rate parameters from this distribution using something likeTo ensure replicability for generated outputs, we will need to add support for a
--seed
command-line argument.