theosanderson / taxonium

A tool for exploring very large trees in the browser
http://taxonium.org
GNU General Public License v3.0

Taxonium for bacteria #445

Open theosanderson opened 1 year ago

theosanderson commented 1 year ago

I know that some people have used, or are using, Taxonium for bacterial genomes. I suspect this poses some issues, e.g. the number of mutations on each branch might get overwhelming. If any of you would like to chat about ways to make your experience better, do let me know!

AngieHinrichs commented 3 months ago

@lilymaryam, @aofarrel, @russcd and I are using Taxonium to view UShER trees of M. tuberculosis genomes, and it works fine unless we use the usher_to_taxonium --genbank option to see what the protein-coding mutations are. Then the first line of the .jsonl.gz output becomes so large (650MB to 900MB+, depending on the size of the tree and the filtering options) that it apparently exceeds V8's string length limit of 512MB, and Node crashes with the error RangeError: Invalid string length (https://github.com/nodejs/node/issues/35973). Would a more compact JSON representation of mutations help, and/or splitting some of the first-line values across multiple lines? I can provide example files if that would help.
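For anyone who wants to check whether their own file is affected before loading it, here is a minimal Python sketch that measures the decompressed size of the first line of a .jsonl.gz file without holding the whole line in memory. The function names and the 512MB threshold constant are illustrative assumptions, not part of Taxonium; the threshold is the approximate figure quoted above, and V8 counts characters rather than bytes, so treat the comparison as a rough guide.

```python
import gzip

# Approximate V8 string length limit mentioned in the comment above.
# Assumption: the exact value depends on the V8/Node version.
V8_STRING_LIMIT_BYTES = 512 * 1024 * 1024


def first_line_size(path, chunk_size=1 << 20):
    """Return the decompressed size in bytes of the first line of a gzipped
    file (excluding the newline), reading in fixed-size chunks."""
    total = 0
    with gzip.open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                # No newline was found: the whole file is a single line.
                return total
            newline_at = chunk.find(b"\n")
            if newline_at != -1:
                return total + newline_at
            total += len(chunk)


def exceeds_limit(path, limit=V8_STRING_LIMIT_BYTES):
    """True if the first line is larger than the given byte limit."""
    return first_line_size(path) > limit
```

Calling exceeds_limit on the output of usher_to_taxonium (any path you pass is your own file name) gives a quick indication of whether the first line is likely to trigger the RangeError.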

theosanderson commented 3 months ago

Thanks a lot for the report, and it's exciting that you are doing this!

Have you tried adding the --only_variable_sites parameter? I think the issue could be the encoding of the reference genome. I definitely need a better general solution to that, and intend to build one, but this flag could serve as a workaround for now.
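As a rough illustration of the workaround, the Python sketch below assembles a usher_to_taxonium command line that combines --genbank with --only_variable_sites. Those two flags come from this thread; the --input and --output flags are my assumption about the tool's interface, and the helper names are hypothetical, so check usher_to_taxonium --help before relying on it.

```python
import shutil
import subprocess


def build_command(input_pb, output_jsonl, genbank, only_variable_sites=True):
    """Build the argument list for usher_to_taxonium.

    --genbank and --only_variable_sites are the flags discussed in this
    thread; --input and --output are assumed, so verify them with --help.
    """
    cmd = [
        "usher_to_taxonium",
        "--input", input_pb,
        "--output", output_jsonl,
        "--genbank", genbank,
    ]
    if only_variable_sites:
        cmd.append("--only_variable_sites")
    return cmd


def run_conversion(input_pb, output_jsonl, genbank):
    """Run the conversion, failing early if the tool is not on PATH."""
    if shutil.which("usher_to_taxonium") is None:
        raise RuntimeError("usher_to_taxonium was not found on PATH")
    subprocess.run(build_command(input_pb, output_jsonl, genbank), check=True)
```

Keeping the command construction separate from the subprocess call makes it easy to print and inspect the exact arguments before running anything.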

AngieHinrichs commented 3 months ago

> Have you tried adding the --only_variable_sites parameter?

Ah, I didn't know about that one! And it does fix it! Thanks and I'll use that for large genomes going forward (and make sure to look at the --help again next time I have a problem 🙂).

theosanderson commented 3 months ago

Fantastic, and no prob, and it still definitely needs a real solution!