roblanf / sarscov2phylo

Global phylogenies of SARS-CoV-2 sequences
GNU General Public License v3.0

switch to doing all alignments directly to WH1 #9

Closed roblanf closed 4 years ago

roblanf commented 4 years ago

my current k-most-dissimilar approach is overengineered, which wouldn't be an issue except that it is going to be very useful to maintain coordinates w.r.t. WH1 (the Wuhan-Hu-1 reference), and aligning everything directly to WH1 does that.

Note that this will also simplify (a lot) the alignment masking, since the algorithm can now look something like:

  1. Clean up GISAID data
  2. Align everything to WH1 (this gives an un-masked alignment)
  3. Mask (don't trim) sites (this means coordinates are all maintained)
  4. Estimate tree

This will save a lot of time compared to what I'm currently doing, and increase utility by maintaining alignment coordinates, and maintaining a completely un-masked copy of the alignment.
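To illustrate why masking rather than trimming keeps coordinates intact (step 3 above), here is a minimal Python sketch. It is not the code used in this repo: the function names, the use of `N` as the mask character, the dict-based alignment, and the toy sequences are all assumptions made for the example. Sites are given as 1-based alignment columns, which correspond to WH1 positions only if the alignment has the same length as WH1.

```python
def mask_sites(sequence, sites_to_mask, mask_char="N"):
    """Replace the given 1-based columns with mask_char; length is unchanged."""
    chars = list(sequence)
    for site in sites_to_mask:
        if not 1 <= site <= len(chars):
            raise ValueError(f"site {site} is outside the alignment (length {len(chars)})")
        chars[site - 1] = mask_char  # overwrite in place instead of deleting the column
    return "".join(chars)


def mask_alignment(alignment, sites_to_mask, mask_char="N"):
    """Return a masked copy of the alignment; the un-masked input is left untouched."""
    return {name: mask_sites(seq, sites_to_mask, mask_char)
            for name, seq in alignment.items()}


# Toy alignment (hypothetical sequences, already aligned to the reference).
alignment = {
    "WH1": "ACGTACGTAC",
    "sample_1": "ACGTTCGTAC",
}

# Hypothetical problematic sites: the first two columns and the last one.
masked = mask_alignment(alignment, sites_to_mask=[1, 2, 10])

for name, seq in masked.items():
    print(name, seq)
```

Because columns are overwritten rather than removed, column 5 in the masked alignment is still column 5 in the un-masked one, and both copies can be kept side by side.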

roblanf commented 4 years ago

done