Open Biophylo2001 opened 4 months ago
Hi @Sanyukta2001, I don't think we have a single command to do that, but I think it could be done by extracting VCF from your protobuf tree file, filtering the VCF to keep only the Spike mutations, and running usher-sampled on the filtered VCF to build a new tree. It would go something like this, assuming the reference sequence is NC_045512.2 (Wuhan/Hu-1) for your tree so Spike coords are 21563-25384:
matUtils extract -i my_pb_file.pb -v my_pb_mutations.vcf
# Filter VCF to only mutations within the Spike gene coordinates
grep ^# my_pb_mutations.vcf > my_pb_mutations.filtered.vcf
grep -v ^# my_pb_mutations.vcf | awk '$2 >= 21563 && $2 <= 25384' >> my_pb_mutations.filtered.vcf
# Use filtered VCF to build a new tree
echo '()' > emptyTree.nwk
usher-sampled -t emptyTree.nwk -v my_pb_mutations.filtered.vcf -o my_pb_file.spikeOnly.pb
If your tree is large (say >10,000 samples) then it might be faster to add the usher-sampled option --optimization_radius 0
to prevent usher-sampled from doing rounds of optimization interspersed with adding samples from VCF, and then run matOptimize after usher-sampled.
Hi, Is there any way to extract all mutations in a Particular gene for all samples , say in the Spike region and then build a phylogenetic tree with it Something like this ?
matUtils extract --i my_pb_file.pb -m S_gene_mutation -t tree_with_S_gene_mutations.nwk
Or use the Mutation Annotated jsonl tree file to filter mutation in specific gene?
Thank you ,