This PR adds basic support for loading trees from BEAST.
In addition, to make the transition matrix and ambiguity code options more explicit, and to reduce overhead in functions that require those options, this PR introduces some new classes in parsimony.py:
AmbiguityMap contains an immutable mapping from ambiguity codes to sets of bases, a reversed version of that map, and a set of bases. (The new class ReversedAmbiguityMap holds the reversed map, but it is a barely modified subclass of frozendict.)
TransitionAlphabet holds a tuple of bases, a transition weight matrix, and an ambiguity map, as well as a variety of useful derived data structures and methods. In particular, it provides methods that produce edge weight functions implementing weighted parsimony according to the options encapsulated in a particular TransitionAlphabet instance.
UnitTransitionAlphabet is a subclass for unit transition costs, which allows some optimizations when computing Hamming distance.
VariableTransitionAlphabet is a subclass that generalizes TransitionAlphabet to allow different transition costs for each site. This subclass enforces a fixed sequence length for the sequences it considers.
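To make the idea behind these classes concrete, here is a rough, standalone sketch of how an ambiguity map and a transition weight matrix can be combined into an edge weight function. It is not the code in this PR: `make_edge_weight_func`, `AMBIGUITY_CODES`, `REVERSED_CODES`, and the weight matrices are hypothetical names, the map uses `types.MappingProxyType` from the standard library rather than frozendict, and it assumes that the cost at an ambiguous site is the minimum over all compatible base pairs.

```python
from itertools import product
from types import MappingProxyType

# Hypothetical stand-in for an ambiguity map: code -> set of compatible bases.
# Only a few IUPAC codes are included to keep the sketch short.
AMBIGUITY_CODES = MappingProxyType({
    "A": frozenset("A"),
    "C": frozenset("C"),
    "G": frozenset("G"),
    "T": frozenset("T"),
    "R": frozenset("AG"),
    "Y": frozenset("CT"),
    "N": frozenset("ACGT"),
})

# Reversed map: set of bases -> ambiguity code.
REVERSED_CODES = MappingProxyType(
    {bases: code for code, bases in AMBIGUITY_CODES.items()}
)

BASES = ("A", "C", "G", "T")


def make_edge_weight_func(bases, weights, ambiguity_map):
    """Build a weighted parsimony edge weight function.

    weights[i][j] is the cost of changing bases[i] into bases[j].
    Assumption: an ambiguous site costs the minimum over all pairs of
    bases compatible with the parent and child characters.
    """
    index = {base: i for i, base in enumerate(bases)}

    def site_cost(parent_char, child_char):
        return min(
            weights[index[p]][index[c]]
            for p, c in product(
                ambiguity_map[parent_char], ambiguity_map[child_char]
            )
        )

    def edge_weight(parent_seq, child_seq):
        if len(parent_seq) != len(child_seq):
            raise ValueError("sequences must have the same length")
        return sum(site_cost(p, c) for p, c in zip(parent_seq, child_seq))

    return edge_weight


# Unit costs reduce weighted parsimony to Hamming distance.
UNIT_WEIGHTS = [[0 if i == j else 1 for j in range(4)] for i in range(4)]

# Illustrative matrix: transitions (A<->G, C<->T) cost 1, transversions cost 2.
TS_TV_WEIGHTS = [
    [0, 2, 1, 2],
    [2, 0, 2, 1],
    [1, 2, 0, 2],
    [2, 1, 2, 0],
]

hamming = make_edge_weight_func(BASES, UNIT_WEIGHTS, AMBIGUITY_CODES)
weighted = make_edge_weight_func(BASES, TS_TV_WEIGHTS, AMBIGUITY_CODES)
```

With unit weights the generic function computes Hamming distance, which is the case a dedicated subclass can optimize. A per-site variant would hold one weight matrix per site and so would need all sequences to share a fixed length.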
I'm very bad at naming, so let me know if you have suggestions @marybarker. In particular, I've been using transition_model a lot for variables that expect a TransitionAlphabet, so I should probably change the class name to match... but is TransitionModel the best we can come up with?
This is the squashed version of #60; the description above is copied from that PR.