theosanderson / taxonium

A tool for exploring very large trees in the browser
http://taxonium.org
GNU General Public License v3.0

taxoniumtools: support compound features #499

Closed: theosanderson closed this 1 year ago

theosanderson commented 1 year ago

This was hopefully added in #500 (new release rolling out now). @AngieHinrichs requested this (though it was much needed regardless!). Angie, please let me know if you spot this doing anything odd. I've done my best to test, but it can be a bit tricky.

AngieHinrichs commented 1 year ago

Thanks so much @theosanderson, I'll try it out here!

theosanderson commented 1 year ago

(For others reading: in theory this allows the use of the conventional Hu-1 GenBank file (https://www.ncbi.nlm.nih.gov/nuccore/1798174254), but I'm not actually sure how well that will work, because it has two distinct ORF1ab CDS features, whereas I think I've written this to assume only one CDS per gene name.)
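A quick way to check whether a given annotation would hit the one-CDS-per-gene-name assumption is to group its CDS features by gene name. The following is only a minimal sketch: the `cds_features` list is a hypothetical, simplified stand-in for parsed GenBank features, `find_duplicate_gene_names` is an illustrative helper (not part of taxoniumtools), and the coordinates are for illustration only.

```python
from collections import defaultdict

# Hypothetical, simplified stand-in for parsed GenBank CDS features:
# (gene_name, [(start, end), ...]) with 1-based inclusive coordinates.
# The coordinates below are illustrative only.
cds_features = [
    ("ORF1ab", [(266, 13468), (13468, 21555)]),  # compound (join) feature
    ("ORF1ab", [(266, 13483)]),                  # second CDS with the same gene name
    ("S", [(21563, 25384)]),
]


def find_duplicate_gene_names(features):
    """Return {gene_name: [parts, ...]} for names used by more than one CDS."""
    by_name = defaultdict(list)
    for gene, parts in features:
        by_name[gene].append(parts)
    return {gene: locs for gene, locs in by_name.items() if len(locs) > 1}


duplicates = find_duplicate_gene_names(cds_features)
for gene, locs in duplicates.items():
    print(f"{gene}: {len(locs)} CDS features share this name")
```

If this reports any gene names, it may be worth renaming or dropping one of the CDS features before passing the file to usher_to_taxonium.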

theosanderson commented 1 year ago

(BTW Angie, if/when UShER needs support for passing in annotations for multiple chromosomes (gff or whatever), do raise an issue :) )

AngieHinrichs commented 1 year ago

Strangely I'm getting a KeyError when I run the latest pip-installed usher_to_taxonium on a Dengue subtype 1 tree (no fancy multi-part features, used to work fine): denv1.2023-07-28.pb.gz NC_001477.1.gbff.gz [gzipped for github]

usher_to_taxonium --input denv1.2023-07-28.pb.gz \
        --genbank NC_001477.1.gbff \
        --output denv1.2023-07-28.taxonium.jsonl.gz \
        >& usher_to_taxonium.1.log

tail usher_to_taxonium.$subtype.log
...
  File "/cluster/home/angie/miniconda3/lib/python3.9/site-packages/taxoniumtools/ushertools.py", line 171, in recursive_mutation_analysis
    recursive_mutation_analysis(child, new_past_nuc_muts_dict, seq, cdses,
  [Previous line repeated 37 more times]
  File "/cluster/home/angie/miniconda3/lib/python3.9/site-packages/taxoniumtools/ushertools.py", line 168, in recursive_mutation_analysis
    node.aa_muts = get_mutations(new_past_nuc_muts_dict,
  File "/cluster/home/angie/miniconda3/lib/python3.9/site-packages/taxoniumtools/ushertools.py", line 126, in get_mutations
    initial_codon[flipped_dict[position]] = value
KeyError: 2

theosanderson commented 1 year ago

Thank you for letting me know and apologies!

AngieHinrichs commented 1 year ago

But in happier news, it's working great for influenza segments with joins like NC_026432.1.gbff now! 😄
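(For readers unfamiliar with joins: a compound feature is a CDS whose coding sequence is stitched together from several genome intervals, so a single codon can straddle two parts. Below is a minimal sketch of that idea, assuming a forward-strand CDS and toy coordinates. `cds_codons` is a hypothetical helper for illustration, not the taxoniumtools implementation.)

```python
def cds_codons(parts):
    """Split a forward-strand CDS into codons of genome positions.

    `parts` is a list of 1-based inclusive (start, end) intervals, in the
    order they appear in the join. Each codon is returned as a tuple of
    three genome positions.
    """
    # Concatenate the genome positions of every part, in join order.
    positions = [pos for start, end in parts for pos in range(start, end + 1)]
    if len(positions) % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    # Chop the concatenated positions into consecutive triplets.
    return [tuple(positions[i:i + 3]) for i in range(0, len(positions), 3)]


# Toy join: 4 nt + 5 nt = 9 nt, i.e. 3 codons (not real influenza coordinates).
toy_join = [(1, 4), (10, 14)]
codons = cds_codons(toy_join)
print(codons)
```

In the toy example the second codon spans the junction, taking one position from the first part and two from the second, which is the case a single-interval CDS never has to handle.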

theosanderson commented 1 year ago

Believe this is all sorted