yatisht / usher

Ultrafast Sample Placement on Existing Trees
MIT License

getting non-coding mutations using `matUtils summary --translate` #336

Open jbloom opened 1 year ago

jbloom commented 1 year ago

Is there a way to get mutations at non-coding sites along branches the same way that summary --translate does it for coding sites? As far as I can tell, the current command only tracks mutations on branches at coding sites.

AngieHinrichs commented 1 year ago

`matUtils extract --sample-paths` makes a large file with the path of nucleotide mutations to each sample (leaf node).

`matUtils extract --clade-paths` makes a file with the nucleotide path to each annotated clade or lineage (for the SARS-CoV-2 UShER trees, both Nextstrain clades and Pango lineages are annotated).

AngieHinrichs commented 1 year ago

I see the usage message for matUtils summary --translate indicates that it should output both AA and nucleotide mutations:

  -t [ --translate ] arg              Write a tsv listing the amino acid and 
                                      nucleotide mutations at each node.

@jmcbroome, is it implicitly "nucleotide mutations for coding sites only", or was it meant to cover both coding and noncoding sites? If it's coding-only (because it's about AA translation), then it would be helpful for the usage message to state that explicitly.

jbloom commented 1 year ago

Thanks @AngieHinrichs, you are correct that --sample-paths lets me get comparable information with a little post-processing, so that is great!

I will keep this issue open for now in case you want to wait for @jmcbroome to answer your question above about clarifying either the output or the usage message for --translate. From the perspective of my original question, though, you can consider this issue resolved and close it at any time.
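
For readers wondering what the post-processing of `--sample-paths` output might look like, here is a minimal Python sketch. It is not the approach used by anyone in this thread. It assumes each path is a string of nucleotide mutations written as reference base, 1-based position, and alternate base (for example `C241T`), with node identifiers in parentheses; the exact file layout should be checked against real `matUtils` output. The function name `noncoding_mutations`, the example path, and the short list of coding intervals are all hypothetical and only for illustration; in practice the intervals would come from the same GTF annotation passed to `summary --translate`.

```python
import re

# Assumed mutation syntax: reference base, 1-based position, alternate base.
MUTATION_RE = re.compile(r"\b([ACGTN])(\d+)([ACGTN])\b")

# Assumption: node identifiers appear in parentheses and should be ignored,
# since a sample name could otherwise look like a mutation.
NODE_ID_RE = re.compile(r"\([^)]*\)")


def noncoding_mutations(path, coding_intervals):
    """Return mutations in `path` whose position is outside every coding interval.

    `coding_intervals` is a list of (start, end) pairs, 1-based and inclusive.
    """
    stripped = NODE_ID_RE.sub("", path)
    result = []
    for match in MUTATION_RE.finditer(stripped):
        pos = int(match.group(2))
        if not any(start <= pos <= end for start, end in coding_intervals):
            result.append(match.group(0))
    return result


# Illustrative, deliberately incomplete interval list (hypothetical input).
CODING = [(266, 21555), (21563, 25384)]

# Hypothetical path string in the assumed format.
example_path = "C241T (node_1) > C3037T,A23403G (node_2) > G29742T (sample_A)"

print(noncoding_mutations(example_path, CODING))
```

Because the interval list here is incomplete, any position in a gene that is not listed would be reported as non-coding; a real analysis would need every CDS from the annotation.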