matsengrp / antigen

Simulating virus evolution and epidemiology
http://bedford.io/projects/antigen/

Combining `GeometricPhenotype` with `SequencePhenotype` #17

Closed zorian15 closed 1 year ago

zorian15 commented 2 years ago

@Haddox had the idea that an easy way to ensure consistent trees from antigen while also storing sequence information would be to just add a sequence field to GeometricPhenotype to start.

Down the line, we can use this new hybrid class to allow DMS data or information about epitope sites to influence how mutations change a virus's location in antigenic space. But for now, I suggest we make the two following changes (or some variation of them) to antigen:

Of course, we can extend upon this in the future, but this should be a good starting point to just simulate some trees, w/ sequences.

zorian15 commented 2 years ago

Genealogical pi (mentioned in issue #16) will also be helpful here, as it should give us a better idea of how closely our simulated trees resemble actual flu trees.

thienktran commented 2 years ago

Is GeometricSeqPhenotype HAS-A GeometricPhenotype another way to think about this? If so, I think composition would work better than inheritance.

zorian15 commented 2 years ago

Thanks @thienktran -- this is a good point/question! My rustiness with OOP is showing here. But after doing some googling to brush up on inheritance vs. composition -- yes, I'd say you're mostly right. We could think of GeometricSeqPhenotype as HAS-A GeometricPhenotype and HAS-A SequencePhenotype -- but with some differing properties. The more I think about this, the more I think it would be better to create a novel class that just implements Phenotype, since we seem likely to move away from the logic of GeometricPhenotype.mutate() and maybe even SequencePhenotype.mutate(). What do you think about this implementation? If we went forward with a class that was composed of GeometricPhenotype and SequencePhenotype, we'd still need to override methods like mutate() and possibly riskOfInfection(). I'll also add the things we'll possibly need for an implementation to this issue later today!

zorian15 commented 2 years ago

Proposed antigenic landscapes

Okay, here are some updated ideas for a GeoSeqPhenotype-type class. To start, I'll describe two antigenic landscapes that we could simulate and outline how each might be implemented in antigen.

The random landscape

This was the first idea shared by @Haddox where we keep the current behavior of GeometricPhenotype but just give each virus a sequence.

Implementation details

Big Idea: Essentially glue GeometricPhenotype and SequencePhenotype together
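To make the "glue" idea concrete, here is a minimal sketch of a phenotype that carries both a 2D antigenic position and a sequence. It is an illustration under stated assumptions, not antigen's actual code: the class name `GeoSeqSketch`, the fixed `stepSize` argument, and the one-substitution-per-mutation rule are all hypothetical, and the real GeometricPhenotype may choose its step size and direction differently.

```java
import java.util.Random;

// Hypothetical sketch: none of these names come from the antigen code base.
final class GeoSeqSketch {
    private static final char[] ALPHABET = {'A', 'C', 'G', 'T'};

    final double x;        // first antigenic coordinate
    final double y;        // second antigenic coordinate
    final String sequence; // nucleotide sequence carried by the virus (non-empty)

    GeoSeqSketch(double x, double y, String sequence) {
        this.x = x;
        this.y = y;
        this.sequence = sequence;
    }

    // Antigenic distance is purely geometric; the sequence plays no role.
    double distance(GeoSeqSketch other) {
        return Math.hypot(x - other.x, y - other.y);
    }

    // One mutation = one random substitution plus one random move.
    // The move is independent of the substitution (the "random landscape").
    GeoSeqSketch mutate(Random rng, double stepSize) {
        int site = rng.nextInt(sequence.length());
        char current = sequence.charAt(site);
        char replacement;
        do {
            replacement = ALPHABET[rng.nextInt(ALPHABET.length)];
        } while (replacement == current);
        String mutated = sequence.substring(0, site) + replacement
                + sequence.substring(site + 1);

        // Assumption: uniform random direction, fixed step length.
        double theta = 2.0 * Math.PI * rng.nextDouble();
        return new GeoSeqSketch(x + stepSize * Math.cos(theta),
                                y + stepSize * Math.sin(theta),
                                mutated);
    }
}
```

For example, `new GeoSeqSketch(0.0, 0.0, "ACGT").mutate(new Random(1), 0.5)` returns a child that differs from its parent at exactly one site and sits 0.5 units away in antigenic space.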

The pseudo-random landscape

@Haddox brought up a pitfall of the random landscape: if reversions happen along the tree, we could have viruses with the same sequence but in different locations in our geometric antigenic space. The proposed solution here was to pre-define "moves" in antigenic space for each possible mutation ahead of time. This way, if we start with mutation A5T and then, in the next step, obtain mutation T5A, we end up back in the same position in Euclidean space.

Implementation details

Big Idea: Add a few constraints to the above landscape by pre-defining mutational effects. Each possible mutation has a random mutational effect, but these effects are randomly generated before the simulation and held constant throughout. A reversion should be a 180 degree reflection of the corresponding mutation, i.e. the same move with its direction reversed.

This class may require some additional objects for creating and storing moves for each mutation.

As far as methods go, I think mutate() is the only one that would have to be re-defined from scratch.
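One way the pre-defined moves could be stored is sketched below: a table of 2D displacement vectors, filled once before the simulation, in which each reversion is the negation of its forward mutation. Everything here is an assumption for illustration (the class name `PredefinedMoves`, the `ACGT` alphabet, 0-based sites, a fixed step length); a redefined mutate() would look up the move for its substitution and add it to the current coordinates.

```java
import java.util.Random;

// Hypothetical sketch of a pre-generated table of mutational effects.
final class PredefinedMoves {
    private static final String ALPHABET = "ACGT";

    // effects[site][from][to] = {dx, dy}; diagonal entries stay {0, 0}.
    private final double[][][][] effects;

    PredefinedMoves(int sequenceLength, double stepSize, Random rng) {
        int k = ALPHABET.length();
        effects = new double[sequenceLength][k][k][2];
        for (int site = 0; site < sequenceLength; site++) {
            for (int from = 0; from < k; from++) {
                for (int to = from + 1; to < k; to++) {
                    // Assumption: uniform random direction, fixed step length.
                    double theta = 2.0 * Math.PI * rng.nextDouble();
                    double dx = stepSize * Math.cos(theta);
                    double dy = stepSize * Math.sin(theta);
                    effects[site][from][to] = new double[] {dx, dy};
                    // The reversion is the same move with its direction reversed.
                    effects[site][to][from] = new double[] {-dx, -dy};
                }
            }
        }
    }

    // Displacement for mutating `site` (0-based) from one nucleotide to another.
    double[] move(int site, char from, char to) {
        return effects[site][ALPHABET.indexOf(from)][ALPHABET.indexOf(to)].clone();
    }
}
```

With this table, `move(4, 'A', 'T')` followed by `move(4, 'T', 'A')` sums to zero, so A5T then T5A returns a virus to where it started. Note that the sketch only guarantees that direct reversions cancel: a path such as A to C to T at one site need not land where a direct A to T move does.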

This was a lot of stuff -- so I'll pause here. These are just some initial implementation ideas for the base case models -- any thoughts/questions @thienktran @Haddox @matsen ? I feel like implementing the random model would be a nice start, and then we can work on the pseudo-random model afterwards.

matsen commented 2 years ago

I only have a minor technical comment.

The "pseudo-random" landscape (and I think we can come up with a name for that that doesn't conflict with the standard def of pseudorandom) doesn't actually require us to generate all of the mutations in advance. Instead, we can define an object that can return mutation effects "by request". If one is sampled that is new, we sample it then store it in a map. If we want that mutation, or its opposite, we return it from the map.
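A minimal sketch of such a "by request" object, assuming a 2D antigenic space and a fixed step length: effects are sampled the first time a mutation is asked for, stored in a map under a canonical key, and returned negated when the opposite mutation is requested. The class name `LazyMutationEffects`, the key format, and the method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch: mutation effects are sampled on first request and cached.
final class LazyMutationEffects {
    private final Map<String, double[]> stored = new HashMap<>();
    private final Random rng;
    private final double stepSize;

    LazyMutationEffects(Random rng, double stepSize) {
        this.rng = rng;
        this.stepSize = stepSize;
    }

    // Returns the {dx, dy} move for mutating `site` from one character to another.
    double[] effect(int site, char from, char to) {
        if (from == to) {
            throw new IllegalArgumentException("not a mutation: " + from + site + to);
        }
        // A mutation and its opposite share one entry, keyed with the
        // smaller character first; the opposite is returned negated.
        boolean forward = from < to;
        char lo = forward ? from : to;
        char hi = forward ? to : from;
        String key = site + ":" + lo + ">" + hi;

        double[] v = stored.computeIfAbsent(key, k -> {
            // Assumption: uniform random direction, fixed step length.
            double theta = 2.0 * Math.PI * rng.nextDouble();
            return new double[] {stepSize * Math.cos(theta), stepSize * Math.sin(theta)};
        });
        return forward ? new double[] {v[0], v[1]} : new double[] {-v[0], -v[1]};
    }

    // Number of distinct mutation pairs sampled so far.
    int storedCount() {
        return stored.size();
    }
}
```

Only mutations that actually occur in a run take up memory, and the result does not depend on whether a mutation or its opposite happens to be requested first.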

thienktran commented 2 years ago

Thanks for writing this, Zorian! I started coding up some of these ideas on 17-geometric-sequence after chatting with Hugh yesterday. I didn't push the updates, since I couldn't get antigenicDiversity to be anything but 0. Your implementation suggestions will definitely help me fix some bugs.

thienktran commented 2 years ago

@zorian15 Composition is really slow, so I decided to try out inheritance instead. It works super well - thanks for originally suggesting it!! It didn't work when I first tried it since I hadn't overridden the getter methods. (I'm not sure why I had to do this for it to work, since I didn't make any changes to it. Just glad it's fast and looks good 🚀)

zorian15 commented 2 years ago

@thienktran yay!! That's awesome to hear! Great work and thanks a ton for pushing this implementation forward. Feel free to let me know whenever you're ready for code review 😁