Biophylo2001 closed this issue 4 months ago.
The first line of the JSONL file contains various bits of data for the whole tree, including a list of all mutations. Each subsequent line (one per node) has its own list of mutations given as numbers, where each number is the index of that mutation in the initial list. So you could analyse the tree with that. You can also analyse the UShER protobuf with e.g. BTE (Big Tree Explorer).
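A minimal Python sketch of that approach is below. Beyond the layout described above, it assumes (not confirmed in this thread) that the header's `mutations` entries carry `gene`, `previous_residue`, `residue_pos`, `new_residue` and `type` keys, and that each node line carries `mutations` and `num_tips`. Check the first two lines of your own file and adjust the key names if they differ. The function names are hypothetical. For each amino acid mutation it reports how many branches it occurs on, plus a rough tip count obtained by summing `num_tips` below each occurrence. That tip count ignores later reversions, so treat it as an approximation of how many sequences carry the mutation.

```python
import gzip
import json
import sys
from collections import Counter

# ASSUMPTION: the key names used below are not confirmed in this thread;
# check the first two lines of your own file and adjust them if they differ.
def mutation_label(mut):
    """Format a header mutation entry as e.g. 'S:D614G'."""
    return (f"{mut.get('gene')}:{mut.get('previous_residue')}"
            f"{mut.get('residue_pos')}{mut.get('new_residue')}")


def count_mutations_from_lines(lines, aa_only=True):
    """Return (branch_counts, tip_counts) keyed by mutation label.

    branch_counts: number of nodes whose mutation list contains the mutation.
    tip_counts: sum of num_tips below those nodes (ignores later reversions,
    so it only approximates how many sequences carry the mutation).
    """
    lines = iter(lines)
    header = json.loads(next(lines))  # first line: whole-tree data
    all_mutations = header["mutations"]
    branch_counts, tip_counts = Counter(), Counter()
    for line in lines:
        if not line.strip():
            continue
        node = json.loads(line)
        for idx in node.get("mutations", []):
            mut = all_mutations[idx]  # index into the header's mutation list
            if aa_only and mut.get("type") != "aa":
                continue
            label = mutation_label(mut)
            branch_counts[label] += 1
            tip_counts[label] += node.get("num_tips", 0)
    return branch_counts, tip_counts


def count_mutations(path, aa_only=True):
    """Read a gzipped Taxonium JSONL file and count its mutations."""
    with gzip.open(path, "rt") as handle:
        return count_mutations_from_lines(handle, aa_only)


if __name__ == "__main__" and len(sys.argv) > 1:
    branches, tips = count_mutations(sys.argv[1])
    for label, n_tips in tips.most_common(20):
        print(f"{label}\t~{n_tips} tips\t{branches[label]} branches")
```

Running it as `python count_mutations.py treefile.jsonl.gz` (the script name is arbitrary) prints the 20 most widespread amino acid mutations. A mutation with many branches but few tips has arisen repeatedly without spreading far, which is worth distinguishing when picking candidates to look up in Taxonium's mutation search.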
@theosanderson Thank you for your reply. I have tried using the JSONL file to extract the mutation data; however, the Treenome Browser doesn't show the correct number of results. Maybe I am doing something wrong.
For example, for the S:1082 mutation in the screenshot, it shows only "1 result", circled at the root node. What does that mean? I am sure there are plenty of mutations at residue S:1082, but they are not shown.
Sorry that I missed this. Without seeing your tree file it's hard to assess what's going on here.
@theosanderson Here I have attached my tree file. Thank you for looking into it. treefile.jsonl.gz
S:1082 appears to be C throughout your tree, so there are no mutations at that residue.
Hi, I've generated my SARS-CoV-2 tree (JSONL format) using the UShER command-line tools. Now I want to extract mutation frequencies and identify the predominant amino acid mutations in my tree. How can I do this efficiently, so that I can use that information in Taxonium's Search > Mutation section for studying spread? Manually inputting all mutations is time-consuming.
Should I do that from my combined VCF files, or can it be done using the JSONL file itself? How was this process implemented when creating Cov2Tree? Thank you. Here's how my merged VCF file looks:
snpeffdata.vcf.gz
Thanks a lot
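If working from the VCF instead, the following is a minimal sketch assuming the merged file was annotated by snpEff (as the file name suggests), so that each record has a standard `ANN` INFO field (`Allele|Annotation|Annotation_Impact|Gene_Name|...|HGVS.c|HGVS.p|...`) and one `GT` column per sample. It counts, for each non-synonymous protein change, how many samples carry the ALT allele. The function names are hypothetical, and the field positions follow the usual snpEff `ANN` layout; verify them against the `##INFO=<ID=ANN` header line in your own file.

```python
import gzip
import re
import sys
from collections import Counter

# ASSUMPTION: standard snpEff ANN layout; field 0 is the allele,
# field 1 the effect, field 3 the gene name and field 10 HGVS.p.
def parse_ann(info):
    """Return (allele, effect, gene, hgvs_p) tuples from a VCF INFO string."""
    for field in info.split(";"):
        if field.startswith("ANN="):
            entries = []
            for entry in field[len("ANN="):].split(","):
                parts = entry.split("|")
                if len(parts) > 10 and parts[10]:
                    entries.append((parts[0], parts[1], parts[3], parts[10]))
            return entries
    return []


def count_aa_changes_from_lines(lines):
    """Count samples carrying each non-synonymous protein change."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header and blank lines
        cols = line.rstrip("\r\n").split("\t")
        if len(cols) < 10:
            continue  # no sample columns, nothing to count
        alts = cols[4].split(",")
        fmt = cols[8].split(":")
        if "GT" not in fmt:
            continue
        gt_idx = fmt.index("GT")
        # Number of samples carrying each ALT allele (1-based allele index).
        carriers = Counter()
        for sample in cols[9:]:
            fields = sample.split(":")
            if gt_idx >= len(fields):
                continue
            for allele in set(re.split(r"[/|]", fields[gt_idx])):
                if allele.isdigit() and allele != "0":
                    carriers[int(allele)] += 1
        # snpEff may report one change several times (overlapping features),
        # so count each (allele, gene, HGVS.p) only once per record.
        seen = set()
        for allele, effect, gene, hgvs_p in parse_ann(cols[7]):
            if allele not in alts or "synonymous_variant" in effect:
                continue
            key = (allele, gene, hgvs_p)
            if key in seen:
                continue
            seen.add(key)
            counts[f"{gene}:{hgvs_p}"] += carriers[alts.index(allele) + 1]
    return counts


def count_aa_changes(path):
    """Read a (b)gzipped VCF and return the per-change sample counts."""
    with gzip.open(path, "rt") as handle:
        return count_aa_changes_from_lines(handle)


if __name__ == "__main__" and len(sys.argv) > 1:
    for label, n in count_aa_changes(sys.argv[1]).most_common(20):
        print(f"{label}\t{n} samples")
```

Unlike the tree-based count, this measures frequency directly across the sequenced samples, but it knows nothing about how often a change arose independently on the tree.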