morrislab / pairtree

Pairtree is a method for reconstructing cancer evolutionary history in individual patients, and analyzing intratumor genetic heterogeneity. Pairtree focuses on scaling to many more cancer samples and cancer cell subpopulations than other algorithms, and on producing concise and informative interactive characterizations of posterior uncertainty.
MIT License

Trees output full results #6

Closed: underbais closed this issue 3 years ago

underbais commented 3 years ago

Hello

In PhyloWGS, we get mutation assignments and tree structures (e.g., summ.json) to work with.

My question is: do we get just one best tree from Pairtree as the final result, or is there a way to look at all of them? If so, how? And how do I get the mutation assignments and tree structure out of it?

Thanks, Chingiz

jwintersinger commented 3 years ago

Hi Chingiz,

I should document this in the README -- thanks for reaching out! Pairtree produces multiple posterior samples, just like PhyloWGS.

Please let me know if you have any other questions!

underbais commented 3 years ago

Hi Jeff,

Thanks a lot, very helpful. I guess my issue is with the results.npz format. In PhyloWGS, mutass.zip has all the mutation assignments and summ.json has all the tree structures. I just wonder how to extract those from the NPZ. More information on the NPZ output structure would really help.

Best, Chingiz

jwintersinger commented 3 years ago

Hi Chingiz,

No problem! Unlike PhyloWGS, Pairtree uses the same SSM clustering for every tree it samples. This means that the assignment of SSMs to subclones is just determined by the variant clustering you provide in the params.json input file. I.e., the mutations for tree node 1 are those in the first cluster, the mutations for node 2 are those in the second cluster, and so on.
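Because the clustering is fixed, the node-to-mutation mapping can be read directly from the input file. Below is a minimal sketch, assuming params.json stores the clustering under a `"clusters"` key as a list of clusters, each a list of variant IDs. The function names are hypothetical, not part of Pairtree. Cluster k (counting from 0) becomes node k + 1, and node 0 is the non-cancerous root, which carries no mutations.

```python
import json
from typing import Dict, List


def node_mutations(params: Dict) -> Dict[int, List[str]]:
    """Map each tree node to the SSM IDs assigned to it.

    Assumption: params["clusters"] is a list of clusters, each a list of
    variant IDs. Cluster k (0-based) is tree node k + 1; node 0 is the
    non-cancerous root and gets no mutations.
    """
    assignments: Dict[int, List[str]] = {0: []}
    for k, cluster in enumerate(params["clusters"]):
        assignments[k + 1] = list(cluster)
    return assignments


def load_node_mutations(path: str) -> Dict[int, List[str]]:
    """Read a params.json file (hypothetical helper) and map nodes to SSMs."""
    with open(path) as f:
        return node_mutations(json.load(f))
```

For example, `node_mutations({"clusters": [["s0", "s2"], ["s1"]]})` gives node 1 the SSMs `s0` and `s2`, node 2 the SSM `s1`, and node 0 nothing.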

To get results that are easier to parse, run bin/plottree with the --tree-json <name>.json option. That will give you output like this: https://gist.github.com/jwintersinger/5670966f731204ffc558bc04df7ff403.

The NPZ file will have that information for every tree, but if you're just trying to parse the information for one tree, the JSON should be easier to work with. I'll be happy to give more details if you have other questions!
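Whichever output you parse, tree structure is often stored as a parent vector. The sketch below assumes the convention that `parents[j - 1]` is the parent of node j for nodes 1..K, with node 0 as the root. Check this against your own plottree JSON or NPZ arrays before relying on it. With NumPy, `np.load("results.npz").files` lists the arrays stored in the NPZ. The helpers turn a parent vector into a children map. They also collect each node's full mutation set under the infinite sites assumption, where a subclone carries its own cluster's SSMs plus those of all its ancestors. All function names here are hypothetical.

```python
from typing import Dict, List, Mapping, Sequence


def parents_to_children(parents: Sequence[int]) -> Dict[int, List[int]]:
    """Convert a parent vector into a node -> children mapping.

    Assumed convention: parents[j - 1] is the parent of node j (j = 1..K),
    and node 0 is the root.
    """
    children: Dict[int, List[int]] = {n: [] for n in range(len(parents) + 1)}
    for node, parent in enumerate(parents, start=1):
        children[parent].append(node)
    return children


def ancestors(parents: Sequence[int], node: int) -> List[int]:
    """Return the nodes on the path from `node` up to the root (exclusive of `node`)."""
    path: List[int] = []
    while node != 0:
        node = parents[node - 1]
        path.append(node)
        if len(path) > len(parents):  # a valid tree can't have a longer path
            raise ValueError("parent vector contains a cycle")
    return path


def inherited_mutations(
    parents: Sequence[int], node_ssms: Mapping[int, List[str]]
) -> Dict[int, List[str]]:
    """Each node's own SSMs plus those of all its ancestors, sorted."""
    result: Dict[int, List[str]] = {}
    for node in range(len(parents) + 1):
        lineage = [node] + ancestors(parents, node)
        result[node] = sorted(s for n in lineage for s in node_ssms.get(n, []))
    return result
```

For example, with `parents = [0, 1, 1]`, node 1 hangs off the root and nodes 2 and 3 are both children of node 1, so node 3 inherits node 1's SSMs along with its own.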

oghzzang commented 6 months ago

Hi, I have a question related to this thread. Which mutations does "node0" have, then?

ethanumn commented 6 months ago

Hi Hayley -

Node 0 represents the non-cancerous genome. It does not contain any of the mutations that you might have included in your analysis. We've written extensively about how to interpret the trees output by Pairtree here and here.

Ethan

oghzzang commented 5 months ago

Many thanks!!!