theosanderson closed this 1 year ago.
Thanks so much @theosanderson, I'll try it out here!
(For others reading:) while in theory this allows use of the conventional Hu-1 GenBank record (https://www.ncbi.nlm.nih.gov/nuccore/1798174254), I'm not actually sure how well that will work, because it has two distinct ORF1ab CDS features, whereas I think I've written this to assume only one CDS per gene name.
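To illustrate the concern about two CDS features sharing one gene name, here is a minimal, hypothetical sketch (plain Python, not taxoniumtools code). It models CDS features as (gene, product, parts) tuples and shows that a dictionary keyed only by gene name keeps just the last ORF1ab feature, while grouping by gene keeps both. The coordinates are illustrative, modelled on the two ORF1ab-gene CDS features in Hu-1 records; check them against the actual GenBank file.

```python
# Minimal sketch: why keying CDS features by gene name alone loses data.
# Each CDS is (gene, product, parts); parts are 1-based inclusive (start, end)
# ranges on the plus strand. Coordinates are illustrative (Hu-1-like);
# verify them against the real GenBank record before relying on them.
CDS_FEATURES = [
    ("ORF1ab", "ORF1ab polyprotein", [(266, 13468), (13468, 21555)]),  # -1 frameshift join
    ("ORF1ab", "ORF1a polyprotein", [(266, 13483)]),
    ("S", "surface glycoprotein", [(21563, 25384)]),
]


def index_by_gene(features):
    """One entry per gene name: a later CDS silently overwrites an earlier one."""
    return {gene: parts for gene, _product, parts in features}


def index_by_gene_and_product(features):
    """Keep every CDS, grouped under its gene name as (product, parts) pairs."""
    grouped = {}
    for gene, product, parts in features:
        grouped.setdefault(gene, []).append((product, parts))
    return grouped


def cds_length(parts):
    """Total nucleotides across all parts of a (possibly joined) location."""
    return sum(end - start + 1 for start, end in parts)


if __name__ == "__main__":
    print(index_by_gene(CDS_FEATURES)["ORF1ab"])  # only the ORF1a parts survive
    for product, parts in index_by_gene_and_product(CDS_FEATURES)["ORF1ab"]:
        print(product, cds_length(parts) // 3 - 1, "aa")  # codons minus the stop codon
```

Grouping by gene name (or keying on something unique per feature, such as the product or protein ID) is one way to avoid the collision; whether that suits taxoniumtools is an open design question.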
(BTW Angie, if/when UShER needs support for passing in annotations for multiple chromosomes (gff or whatever), do raise an issue :) )
Strangely, I'm getting a KeyError when I run the latest pip-installed usher_to_taxonium on a Dengue subtype 1 tree (no fancy multi-part features; it used to work fine). Attached, gzipped for GitHub: denv1.2023-07-28.pb.gz, NC_001477.1.gbff.gz
usher_to_taxonium --input denv1.2023-07-28.pb.gz \
--genbank NC_001477.1.gbff \
--output denv1.2023-07-28.taxonium.jsonl.gz \
>& usher_to_taxonium.1.log
tail usher_to_taxonium.1.log
...
  File "/cluster/home/angie/miniconda3/lib/python3.9/site-packages/taxoniumtools/ushertools.py", line 171, in recursive_mutation_analysis
    recursive_mutation_analysis(child, new_past_nuc_muts_dict, seq, cdses,
  [Previous line repeated 37 more times]
  File "/cluster/home/angie/miniconda3/lib/python3.9/site-packages/taxoniumtools/ushertools.py", line 168, in recursive_mutation_analysis
    node.aa_muts = get_mutations(new_past_nuc_muts_dict,
  File "/cluster/home/angie/miniconda3/lib/python3.9/site-packages/taxoniumtools/ushertools.py", line 126, in get_mutations
    initial_codon[flipped_dict[position]] = value
KeyError: 2
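For readers following the traceback: the failing line looks up a position in a per-codon dictionary, and a plain `d[key]` lookup raises KeyError when the key is missing. I don't know what `flipped_dict` holds in ushertools.py, so the following is only a generic, hypothetical sketch of mapping genome positions onto codon slots for a plus-strand CDS whose location may be a join. It uses `.get()` so an out-of-range position yields an empty result rather than an exception. The function names `codon_map` and `codons_hit` are made up for illustration.

```python
# Hypothetical sketch (not the taxoniumtools implementation): map genome
# positions onto codons for a plus-strand CDS whose location may be a join.
# Parts are 1-based inclusive (start, end) ranges, listed in CDS order.
# Minus-strand CDSs (which need reverse-complementing) are not handled here.


def codon_map(parts):
    """Return {genome_pos: [(codon_index, offset_in_codon), ...]}.

    codon_index is 0-based and offset_in_codon is 0, 1 or 2. Values are lists
    because overlapping parts (as in a ribosomal-slippage join) place one
    genome position in two different codon slots.
    """
    mapping = {}
    cds_pos = 0
    for start, end in parts:
        for genome_pos in range(start, end + 1):
            mapping.setdefault(genome_pos, []).append(divmod(cds_pos, 3))
            cds_pos += 1
    return mapping


def codons_hit(mapping, genome_pos):
    """Codon slots touched by a mutation; [] if it falls outside the CDS.

    Using .get() rather than mapping[genome_pos] means an out-of-range
    position yields an empty list instead of raising KeyError.
    """
    return mapping.get(genome_pos, [])


if __name__ == "__main__":
    toy = codon_map([(1, 5), (5, 11)])  # 12 nt in total; position 5 is read twice
    print(codons_hit(toy, 5))   # [(1, 1), (1, 2)]
    print(codons_hit(toy, 20))  # []
```

Storing a list per position is the design choice that lets one nucleotide belong to two codon slots; a one-value-per-key dictionary would have to drop one of them.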
Thank you for letting me know and apologies!
But in happier news, it's working great for influenza segments with joins like NC_026432.1.gbff now! 😄
Believe this is all sorted
Support for CDS features with joins (multi-part locations) was hopefully added in #500 (new release rolling out now). @AngieHinrichs requested this, though it was much needed regardless! Angie, please let me know if you spot it doing anything odd. I've done my best to test it, but it can be a bit tricky.
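For anyone unfamiliar with joins: a GenBank CDS location such as `join(1..6,13..18)` means the coding sequence is the listed parts concatenated in order, so amino acids must be read from the spliced sequence rather than from the raw genome. The sketch below assumes plus-strand parts only and ignores fuzzy ends and `codon_start`. It uses the standard genetic code and a made-up toy genome to show the difference splicing makes; it is an illustration, not the code added in #500.

```python
from itertools import product

# Minimal sketch: splice and translate a plus-strand CDS with a join location.
# Parts are 1-based inclusive (start, end) ranges in CDS order, as in a
# GenBank location like join(1..6,13..18). Minus-strand locations, fuzzy
# ends and codon_start offsets are deliberately ignored.
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def splice(seq, parts):
    """Concatenate the parts of a join location into one CDS sequence."""
    return "".join(seq[start - 1:end] for start, end in parts)


def translate(cds_seq):
    """Translate codon by codon; a trailing partial codon is dropped, unknown codons give 'X'."""
    return "".join(
        CODON_TABLE.get(cds_seq[i:i + 3].upper(), "X")
        for i in range(0, len(cds_seq) - 2, 3)
    )


if __name__ == "__main__":
    # Toy "genome": exon 1 (1..6), intron (7..12), exon 2 (13..18).
    genome = "ATGGCC" + "GTAAGT" + "TTTTAA"
    print(translate(splice(genome, [(1, 6), (13, 18)])))  # MAF*
    print(translate(genome))                              # MAVSF* (intron read through)
```

Translating without splicing reads the intron as coding sequence, which is the kind of wrong amino-acid call that join support is meant to prevent.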