Closed zorian15 closed 1 year ago
Genealogical pi (mentioned in issue #16) will also be helpful with this, as we should have a better idea of how closely our simulated trees resemble actual flu trees.
Is `GeometricSeqPhenotype` HAS-A `GeometricPhenotype` another way to think about this? If so, I think composition would work better than inheritance.
Thanks @thienktran -- this is a good point/question! My rustiness with OOP is showing here. But after doing some googling to brush up on inheritance vs. composition -- yes, I'd say you're mostly right. We could think of `GeometricSeqPhenotype` as a HAS-A `GeometricPhenotype` and `SequencePhenotype` -- but with some differing properties. The more I think about this, the more I think it would be better to create a novel class that just implements `Phenotype`, since we seem likely to move away from the logic of `GeometricPhenotype.mutate()` and maybe even `SequencePhenotype.mutate()`. What do you think about this implementation? If we went forward with a class that was composed of `GeometricPhenotype` and `SequencePhenotype`, we'd still need to override methods like `mutate()` and possibly `riskOfInfection()`? I'll also add things we'll possibly need for an implementation to this issue later today!
Okay, here are some updated ideas for a `GeoSeqPhenotype`-type class. To start, I'll share some ideas for what I'm calling two antigenic landscapes to simulate, and outline how they could be implemented in `antigen`.
This was the first idea shared by @Haddox, where we keep the current behavior of `GeometricPhenotype` but just give each virus a sequence.
Big Idea: Essentially glue `GeometricPhenotype` and `SequencePhenotype` together
- `GeoSeqPhenotype` (or some other name): a class that is a composition of a `GeometricPhenotype` and a `SequencePhenotype`.
- `geometricPheno` and `seqPheno`: the two component phenotypes.
- `mutate` would create a new `GeoSeqPhenotype` object initialized with the phenotypes created by calling `mutate()` on each of the two phenotypes.
- `riskOfInfection` and `distance`: use the methods in `GeometricPhenotype`.
- `toString()` would output a concatenation of both the Euclidean coordinates and the sequence.
@Haddox brought up a pitfall of the random virus landscape: if reversions happen along the tree, we could have viruses with the same sequence but different locations in our geometric antigenic space. The proposed solution here was to pre-define "moves" in antigenic space for each possible mutation ahead of time. This way, if we start with mutation A5T, and then in the next step obtain mutation T5A, we end up in the same position in Euclidean space.
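The composition outlined in the list above could be sketched roughly as follows. This is a minimal illustration, not `antigen`'s actual code: `GeoPheno` and `SeqPheno` are hypothetical stand-ins for `GeometricPhenotype` and `SequencePhenotype`, and the unit-length random step and the four-letter nucleotide alphabet are assumptions made only for this example.

```java
import java.util.Random;

// Hypothetical stand-in for GeometricPhenotype (the real class has more to it).
final class GeoPheno {
    final double x;
    final double y;

    GeoPheno(double x, double y) {
        this.x = x;
        this.y = y;
    }

    // Assumption: a mutation is a unit-length step in a uniformly random direction.
    GeoPheno mutate(Random rng) {
        double theta = 2 * Math.PI * rng.nextDouble();
        return new GeoPheno(x + Math.cos(theta), y + Math.sin(theta));
    }

    double distance(GeoPheno other) {
        return Math.hypot(x - other.x, y - other.y);
    }
}

// Hypothetical stand-in for SequencePhenotype.
final class SeqPheno {
    private static final char[] BASES = {'A', 'C', 'G', 'T'};
    final String sequence;

    SeqPheno(String sequence) {
        this.sequence = sequence;
    }

    // Replace one randomly chosen site with a different nucleotide.
    SeqPheno mutate(Random rng) {
        int site = rng.nextInt(sequence.length());
        char replacement;
        do {
            replacement = BASES[rng.nextInt(BASES.length)];
        } while (replacement == sequence.charAt(site));
        StringBuilder sb = new StringBuilder(sequence);
        sb.setCharAt(site, replacement);
        return new SeqPheno(sb.toString());
    }
}

// The composed phenotype: it HAS-A geometric part and HAS-A sequence part.
final class GeoSeqPhenotype {
    final GeoPheno geometricPheno;
    final SeqPheno seqPheno;

    GeoSeqPhenotype(GeoPheno geometricPheno, SeqPheno seqPheno) {
        this.geometricPheno = geometricPheno;
        this.seqPheno = seqPheno;
    }

    // Mutate both parts independently and wrap the results in a new object.
    GeoSeqPhenotype mutate(Random rng) {
        return new GeoSeqPhenotype(geometricPheno.mutate(rng), seqPheno.mutate(rng));
    }

    // Distance is delegated to the geometric part only.
    double distance(GeoSeqPhenotype other) {
        return geometricPheno.distance(other.geometricPheno);
    }

    @Override
    public String toString() {
        return geometricPheno.x + "," + geometricPheno.y + "," + seqPheno.sequence;
    }
}
```

Because the two parts mutate independently here, a reversion in the sequence does not bring the coordinates back to where they started, which is the pitfall described above.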
Big Idea: Add a few constraints to the above landscape by pre-defining mutational effects. Each possible mutation has a random mutational effect, but these effects are randomly generated before the simulation and held constant throughout. A reversion should be a 180-degree reflection of the corresponding mutation.
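One way to read the reflection constraint, assuming each move is a step in a 2D antigenic space given in polar form. The subscripted symbols $r_m$, $\theta_m$, $\Delta_m$ and the name $m'$ for the reversion of mutation $m$ are notation introduced here for illustration only:

$$
\Delta_m = r_m \begin{pmatrix} \cos\theta_m \\ \sin\theta_m \end{pmatrix},
\qquad
\Delta_{m'} = r_m \begin{pmatrix} \cos(\theta_m + \pi) \\ \sin(\theta_m + \pi) \end{pmatrix} = -\Delta_m,
\qquad
\Delta_m + \Delta_{m'} = 0 .
$$

So a mutation followed by its reversion (for example A5T, then T5A) leaves the virus at its original coordinates, provided both moves use the same fixed $r_m$ and $\theta_m$.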
This class may require some additional objects for creating and storing moves for each mutation.
- `mutationTable` - a 2D array of mutations and the corresponding radius $r$ and angle $\theta$ for a step made by mutation $m$.
- `mutationTableList` - store a `mutationTable` for each site
- `codonVariantTable` -- store genetic code (codons --> amino acids) in a dictionary
`antigen` with DMS data or predictions. One thing to note here is how we'll handle mutations that lead to stop codons.
As far as methods go, I think `mutate()` is the only one that would have to be re-defined from scratch.
- `mutate` - call `SequencePhenotype.mutate()` to draw a new mutation. Find the appropriate mutation vector for the generated sequence and use $r$ and $\theta$ to move the antigenic phenotype. Return the new object (with the updated traits and sequence).
- `riskOfInfection` - for now, we could just roll with calling the `GeometricPhenotype` methods here, but if we ever want to incorporate Hamming distance or substitution counts in epitope and non-epitope sites into this, we'll revisit how to formulate this into an infection risk model.
- `distance` - for now, use geometric distance. Maybe down the line, we can define a method that calculates Hamming distances between sequences for the above.
This was a lot of stuff -- so I'll pause here. These are just some initial implementation ideas for the base case models -- any thoughts/questions @thienktran @Haddox @matsen? I feel like starting with implementing the random model would be a nice start, and then working on the pseudo-random model afterwards.
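For the Hamming-distance idea mentioned under `distance`, a minimal sketch could look like this. The class and method names are hypothetical, and it assumes both sequences are aligned and of equal length:

```java
// Hypothetical helper; not part of antigen.
final class SequenceDistance {
    // Number of sites at which two equal-length sequences differ.
    static int hamming(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("sequences must have equal length");
        }
        int differences = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                differences++;
            }
        }
        return differences;
    }

    public static void main(String[] args) {
        // One differing site (position 3, G vs C).
        System.out.println(hamming("ACGT", "ACCT"));
    }
}
```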
I only have a minor technical comment.
The "pseudo-random" landscape (and I think we can come up with a name for that that doesn't conflict with the standard def of pseudorandom) doesn't actually require us to generate all of the mutations in advance. Instead, we can define an object that can return mutation effects "by request". If one is sampled that is new, we sample it then store it in a map. If we want that mutation, or its opposite, we return it from the map.
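A rough sketch of the "by request" object described above. Everything here is an assumption for illustration: the class name, the string key format, the uniform draws for step length and angle, and the choice to store the reversion as the exact negation of the forward move.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical lazily filled store of mutation effects; not part of antigen.
final class LazyMutationEffects {
    private final Map<String, double[]> effects = new HashMap<>();
    private final Random rng;

    LazyMutationEffects(long seed) {
        this.rng = new Random(seed);
    }

    // Key such as "5:A>T" for mutation A5T.
    private static String key(int site, char from, char to) {
        return site + ":" + from + ">" + to;
    }

    // Returns the move {dx, dy} for this mutation, sampling it on first request.
    double[] effectOf(int site, char from, char to) {
        double[] known = effects.get(key(site, from, to));
        if (known != null) {
            return known.clone();
        }
        // Assumption: step length uniform in [0, 1), direction uniform in [0, 2*pi).
        double r = rng.nextDouble();
        double theta = 2 * Math.PI * rng.nextDouble();
        double[] forward = {r * Math.cos(theta), r * Math.sin(theta)};
        double[] reverse = {-forward[0], -forward[1]};
        // Store the mutation and its opposite together so they always cancel.
        effects.put(key(site, from, to), forward);
        effects.put(key(site, to, from), reverse);
        return forward.clone();
    }
}
```

With this shape, requesting A5T and later T5A returns moves that sum to zero, and the map only ever holds mutations that actually occurred in the simulation.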
Thanks for writing this, Zorian! I started coding up some of these ideas on `17-geometric-sequence` after chatting with Hugh yesterday. I didn't push the updates, since I couldn't get antigenicDiversity to be anything but 0. Your implementation suggestions will definitely help me fix some bugs.
@zorian15 Composition is really slow, so I decided to try out inheritance instead. It works super well - thanks for originally suggesting it!! It didn't work when I first tried it since I didn't override the getter methods. (I'm not sure why I had to do this for it to work since I didn't make any changes to it. Just glad it's fast and looks good 🚀)
@thienktran yay!! That's awesome to hear! Great work and thanks a ton for pushing this implementation forward. Feel free to let me know whenever you're ready for code review 😁
@Haddox had the idea that an easy way to ensure consistent trees from `antigen` while also storing sequence information would be to just add a sequence field to `GeometricPhenotype` to start.
Down the line, we can use this new hybrid class to allow DMS data or information about epitope sites to influence how mutations change a virus's location in antigenic space. But for now, I suggest we make the two following changes (or some variation of them) to `antigen`:
- `GeometricPhenotype` that is able to hold sequences as a private field -- I call it `GeometricSeqPhenotype` 😄
- `GeometricPhenotype` but mutate a site in the sequence field as we do in `SequencePhenotype`
Of course, we can extend upon this in the future, but this should be a good starting point to just simulate some trees, w/ sequences.
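A minimal sketch of what the inheritance version might look like. `PlainGeometricPhenotype` is a hypothetical stand-in for `antigen`'s `GeometricPhenotype`, whose real fields and mutation logic are not shown in this thread; the unit-length random step and the "ACGT" alphabet are assumptions.

```java
import java.util.Random;

// Hypothetical stand-in for GeometricPhenotype; the real class differs.
class PlainGeometricPhenotype {
    private final double traitA;
    private final double traitB;

    PlainGeometricPhenotype(double traitA, double traitB) {
        this.traitA = traitA;
        this.traitB = traitB;
    }

    double getTraitA() {
        return traitA;
    }

    double getTraitB() {
        return traitB;
    }

    // Assumption: a unit-length step in a uniformly random direction.
    PlainGeometricPhenotype mutate(Random rng) {
        double theta = 2 * Math.PI * rng.nextDouble();
        return new PlainGeometricPhenotype(traitA + Math.cos(theta), traitB + Math.sin(theta));
    }
}

// IS-A geometric phenotype that additionally carries a sequence.
final class GeometricSeqPhenotype extends PlainGeometricPhenotype {
    private static final String NUCLEOTIDES = "ACGT";
    private final String sequence;

    GeometricSeqPhenotype(double traitA, double traitB, String sequence) {
        super(traitA, traitB);
        this.sequence = sequence;
    }

    String getSequence() {
        return sequence;
    }

    // Move in antigenic space as the parent does, then mutate one sequence site.
    @Override
    GeometricSeqPhenotype mutate(Random rng) {
        PlainGeometricPhenotype moved = super.mutate(rng);
        int site = rng.nextInt(sequence.length());
        char replacement;
        do {
            replacement = NUCLEOTIDES.charAt(rng.nextInt(NUCLEOTIDES.length()));
        } while (replacement == sequence.charAt(site));
        String mutated = sequence.substring(0, site) + replacement + sequence.substring(site + 1);
        return new GeometricSeqPhenotype(moved.getTraitA(), moved.getTraitB(), mutated);
    }
}
```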