@rneher, this is more of a discussion point for us to think about, and I figured I'd put it here as an issue. No need to add anything, but if you have thoughts, this could be a centralized place to keep them.
Like every pipeline, this one would probably benefit from being rewritten from scratch: since it grew organically, it has developed some vestigial parts and a non-intuitive structure.
Whether that's worth doing probably depends on whether we actually plan to re-run it regularly or will basically just stick with the current results.
Whether we re-run it probably depends in part on the rate of SARS-CoV-2 sequencing in the future: if there continue to be millions of new sequences per year, then we probably want to keep using them by re-running; if sequencing slows 10- or 100-fold, then it may not be worth it.
Also, the current approach of estimating clade-specific synonymous (four-fold degenerate) mutation rates, and only using clades with enough sequences to make such estimates, may stop working well if new clades are less heavily sequenced.
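
To make the concern about less-sequenced clades concrete, here is a rough sketch of how a minimum-sequence cutoff determines which clades get used. This is not the pipeline's actual code: the function names, the threshold value, and the clade counts are all made up for illustration.

```python
def clades_with_enough_sequences(clade_counts, min_sequences):
    """Return (sorted) clades whose sequence count meets a hypothetical cutoff."""
    return sorted(c for c, n in clade_counts.items() if n >= min_sequences)


def retained_fraction(clade_counts, min_sequences):
    """Fraction of all sequences that belong to clades passing the cutoff."""
    total = sum(clade_counts.values())
    if total == 0:
        return 0.0
    kept = sum(n for n in clade_counts.values() if n >= min_sequences)
    return kept / total


# Made-up counts, for illustration only (not real data).
example_counts = {"clade_A": 500_000, "clade_B": 120_000, "clade_C": 8_000}

kept_clades = clades_with_enough_sequences(example_counts, min_sequences=100_000)
kept_share = retained_fraction(example_counts, min_sequences=100_000)
```

If future clades tend to look like the hypothetical `clade_C` here, fewer of them would pass any fixed cutoff, which is the scenario where the clade-specific rate estimates would start to break down.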