phac-nml / rebar

REcombination BARcode detector.
https://phac-nml.github.io/rebar/
Apache License 2.0

Error !! #3

Open wtchoga opened 1 year ago

wtchoga commented 1 year ago
[Screenshot 2023-06-12 at 19 43 33]

GLTOhorsman commented 10 months ago

I might be seeing a similar message to @wtchoga. @ktmeaton, do you know whether the pango-sequences source (https://github.com/corneliusroemer/pango-sequences), which is the URL set at https://github.com/phac-nml/rebar/blob/ae94b5327b5366ebfa80dfb60aac614b7d0d05cb/rebar/constants.py#L25, has from time to time unintentionally excluded a lineage?

I'm trying to track down why `rebar/dataset/sars-cov-2-latest/alignment.fasta`, when it builds itself, skips from XCC to XCE. I also worry that each one of us is working with a different pango-sequences pull.
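To check a skip like this locally, one option is to compare the record names in the alignment against the lineages you expect. This is a minimal sketch, not part of rebar: the function names are hypothetical, and it assumes each FASTA header starts with the lineage name.

```python
def fasta_names(lines):
    """Return the set of record names (first token after '>') in FASTA text."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def missing_lineages(expected, fasta_lines):
    """Return the expected lineages that have no record in the FASTA, in order."""
    present = fasta_names(fasta_lines)
    return [name for name in expected if name not in present]


# Toy input for illustration only, not real alignment content.
example = [">XCC", "ACGT", ">XCE", "ACGT"]
print(missing_lineages(["XCC", "XCD", "XCE"], example))
```

For a real file, pass an open file handle instead of the toy list, for example `missing_lineages(expected, open(path))`.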

ktmeaton commented 10 months ago

Hi @wtchoga and @GLTOhorsman! Thanks for documenting this; I've encountered the same thing. I'm working on a larger fix and am aiming for a pre-release within the week. Part of this will include version controlling which pango-sequences pull was used (e.g. commit hash/date), for reproducible analyses.
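As a rough illustration of what recording the pull could look like, here is a minimal sketch. It is not the planned implementation: the function name, the field names, and the placeholder commit value are all assumptions.

```python
import json
from datetime import datetime, timezone


def provenance_record(source_url, commit, retrieved=None):
    """Build a small record of which dataset pull was used (hypothetical layout)."""
    if retrieved is None:
        retrieved = datetime.now(timezone.utc)
    return {
        "source": source_url,
        "commit": commit,
        "retrieved": retrieved.isoformat(),
    }


record = provenance_record(
    "https://github.com/corneliusroemer/pango-sequences",
    "<commit-hash>",  # placeholder: fill in the commit actually downloaded
)
print(json.dumps(record, indent=2))
```

Saving a record like this next to the downloaded alignment would let two people confirm whether they are working from the same pull.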

@GLTOhorsman, I had the exact same question yesterday about pango-sequences excluding lineages. That repo's Caveats section mentions that lineages will only be included "if there are more than 3 sequences of a lineage available as open data (Genbank/RKI/COG-UK)". For example, at the time of writing, XCU and its specific parents (XBC.1.7.1 and FL.23.2.1) are not yet available, since they have too few sequences. I will include a warning if a population is expected to be present (e.g. XCU is designated) but doesn't have a consensus sequence yet.
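The inclusion rule and the warning described above could be sketched as follows. This is only an illustration: the function name and the sequence counts are made up, and the threshold simply mirrors the "more than 3 sequences" wording from the Caveats section.

```python
import warnings

MIN_SEQUENCES = 3  # a lineage needs MORE than this many open sequences


def warn_missing_consensus(designated, open_counts):
    """Warn for each designated lineage that is at or below the threshold."""
    missing = [
        name for name in designated
        if open_counts.get(name, 0) <= MIN_SEQUENCES
    ]
    for name in missing:
        warnings.warn(f"{name} is designated but has no consensus sequence yet")
    return missing


# Hypothetical counts for illustration only, not real data.
counts = {"XCC": 10, "XCU": 2}
print(warn_missing_consensus(["XCC", "XCU"], counts))
```

Returning the list as well as warning lets downstream code decide whether to skip those populations or stop.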