morrislab / pairtree

Pairtree is a method for reconstructing cancer evolutionary history in individual patients, and analyzing intratumor genetic heterogeneity. Pairtree focuses on scaling to many more cancer samples and cancer cell subpopulations than other algorithms, and on producing concise and informative interactive characterizations of posterior uncertainty.
MIT License

Finding trunk and branch clusters from json files #35

Closed shaghayeghsoudi closed 1 year ago

shaghayeghsoudi commented 1 year ago

I am trying to map driver genes onto the trees that Pairtree generated. I am parsing the JSON files into R. Using only the information and statistics in those JSON files, is there a way to determine which clusters lie on the trunk and which lie on branches?

ethanumn commented 1 year ago

Hi there --

Assuming you've run Pairtree and obtained an .npz file, this .npz file has the information for all of the tree structures found as well as the cluster information. This Pairtree documentation section may be helpful: https://github.com/morrislab/pairtree#description-of-the-resultsnpz-file-format
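Before working with the tree structures, it can help to check what a given results file actually contains. Here is a minimal sketch that lists every stored array with its shape and dtype. The helper name `summarize_npz` is just an illustration, not part of Pairtree, and it relies only on NumPy's standard `.npz` reader:

```python
import numpy as np

def summarize_npz(path):
    """Return {key: (shape, dtype)} for every array stored in an .npz file.

    `path` may be a filename or an open binary file object,
    since it is passed straight to np.load.
    """
    with np.load(path) as data:
        return {key: (data[key].shape, str(data[key].dtype)) for key in data.files}

# Hypothetical usage, with the path to your own results file:
# for key, (shape, dtype) in summarize_npz("path/to/npz").items():
#     print(key, shape, dtype)
```

Keys such as `struct` should show up in that listing if the file follows the format described in the documentation linked above.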

Here's a quick example in Python for how you could use the results in the .npz file to find all of the clusters between the root of the tree and the first branching event:

import json
import numpy as np

data = np.load("path/to/npz") # load the npz file
tree_0 = data["struct"][0] # extract the parents vector for the best tree found
clusters = json.loads(data["clusters.json"]) # extract the clustering information for later use

n = 0 # starting from the root node 
while sum(tree_0 == n) == 1: # while the current node doesn't have multiple children
    n = np.flatnonzero(tree_0 == n)[0] + 1 # the only child's index in the parents vector, plus 1, is its node number
    print("Cluster %d occurred before the first branching event" % (n-1))

It should be easy enough to translate this to R and modify it for your purposes.
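If it helps as a starting point, here is a minimal sketch of the same trunk walk wrapped in a function that also returns the remaining (branch) clusters. It assumes the parents-vector convention used above: entry `i` holds the parent of node `i + 1`, and node 0 is the root. The function name and the toy tree are made up for illustration and are not part of Pairtree:

```python
import numpy as np

def trunk_and_branch_clusters(parents):
    """Split clusters into trunk and branch lists from a parents vector.

    Assumes parents[i] is the parent of node i + 1 and node 0 is the root.
    Returns 0-based cluster indices (node number minus 1).
    """
    parents = np.asarray(parents)
    trunk = []
    node = 0  # start at the root
    while True:
        # Node numbers of all children of the current node
        children = np.flatnonzero(parents == node) + 1
        if len(children) != 1:
            break  # a leaf (0 children) or a branching event (2+ children)
        node = int(children[0])
        trunk.append(node - 1)
    # Every cluster not on the trunk sits at or below the first branching event
    branch = [c for c in range(len(parents)) if c not in trunk]
    return trunk, branch

# Toy tree: 0 -> 1 -> 2, then node 2 splits into nodes 3 and 4
trunk, branch = trunk_and_branch_clusters([0, 1, 2, 2])
print("trunk clusters:", trunk)    # [0, 1]
print("branch clusters:", branch)  # [2, 3]
```

Here "trunk" means every cluster between the root and the first branching event, so a fully linear tree has no branch clusters at all. For real data you would pass `data["struct"][0]` in place of the toy list.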