Closed corneliusroemer closed 1 year ago
Thanks @joverlee521 - I think you're right! There may not be a need to uncompress at all after all. Though one needs to be careful to still create a parent of "other" when there is no parent (aliasor.parent("XBC")
would currently yield ''
)
@joverlee521 I'm now using aliasor.parent
, it works indeed, and now there is even less code π
Note that the collapsing logic may have a logic error in that it collapses parents before all the children to be collapsed have been collapsed. We should collapse from children inwards, only collapse a parent when all its children are either collapsed or big enough.
I can easily fix this, but itβs a bit late here, so maybe tomorrow)
Are you happy with it now @joverlee521? I've re-requested review
Thanks so much for PRs here @corneliusroemer π. This is a big improvement both in terms of code overhead (by using pango_aliasor
) and in terms of algorithm (to work upwards from tips). I just made some simple changes in the above commit to reduce verbosity of the standard out logging. Otherwise it looks great to me.
And thank you for adding tests @joverlee521 π.
Now just blocked by https://github.com/nextstrain/forecasts-ncov/pull/39
Waiting on the latest nextstrain/base image to be pushed to merge this.
Followup of #35
I've previously written the pango_aliasor package to simplify collapsing and aliasing of Pango lineage names
Using that package (available through pip and bioconda) removes a lot of code No more reliance on a Nextclade tree to get aliasing info Instead aliasing info comes from the root source, the pango-designation repo
The only downside is that we don't currently install pango_aliasor in docker-base But there's no real reason why we shouldn't add it there It's a small python package on the order of kB with no dependencies other than Python>3.7
Blocking:
pango_aliasor
to docker-base, see https://github.com/nextstrain/docker-base/pull/146