nextstrain / forecasts-ncov

SARS-CoV-2 variant growth rates and frequency forecasts
https://nextstrain.org/sars-cov-2/forecasts/
7 stars 2 forks source link

Simplify pango collapsing by using pango_aliasor package #39

Closed corneliusroemer closed 1 year ago

corneliusroemer commented 1 year ago

Followup of #35

I've previously written the pango_aliasor package to simplify collapsing and aliasing of Pango lineage names

Using that package (available through pip and bioconda) removes a lot of code No more reliance on a Nextclade tree to get aliasing info Instead aliasing info comes from the root source, the pango-designation repo

The only downside is that we don't currently install pango_aliasor in docker-base But there's no real reason why we shouldn't add it there It's a small python package on the order of kB with no dependencies other than Python>3.7

Blocking:

corneliusroemer commented 1 year ago

Thanks @joverlee521 - I think you're right! There may not be a need to uncompress at all after all. Though one needs to be careful to still create a parent of "other" when there is no parent (aliasor.parent("XBC") would currently yield '')

corneliusroemer commented 1 year ago

@joverlee521 I'm now using aliasor.parent, it works indeed, and now there is even less code πŸ‘

Note that the collapsing logic may have a logic error in that it collapses parents before all the children to be collapsed have been collapsed. We should collapse from children inwards, only collapse a parent when all its children are either collapsed or big enough.

I can easily fix this, but it’s a bit late here, so maybe tomorrow)

corneliusroemer commented 1 year ago

Are you happy with it now @joverlee521? I've re-requested review

trvrb commented 1 year ago

Thanks so much for PRs here @corneliusroemer 🌟. This is a big improvement both in terms of code overhead (by using pango_aliasor) and in terms of algorithm (to work upwards from tips). I just made some simple changes in the above commit to reduce verbosity of the standard out logging. Otherwise it looks great to me.

And thank you for adding tests @joverlee521 πŸ™.

corneliusroemer commented 1 year ago

Now just blocked by https://github.com/nextstrain/forecasts-ncov/pull/39

joverlee521 commented 1 year ago

Waiting on the latest nextstrain/base image to be pushed to merge this.