Closed shaghayeghsoudi closed 1 year ago
Hi there --
Assuming you've run Pairtree and obtained an .npz file, this .npz file has the information for all of the tree structures found as well as the cluster information. This Pairtree documentation section may be helpful: https://github.com/morrislab/pairtree#description-of-the-resultsnpz-file-format
Here's a quick example in Python for how you could use the results in the .npz file to find all of the clusters between the root of the tree and the first branching event:

    import json
    import numpy as np

    data = np.load("path/to/npz")  # load the npz file
    tree_0 = data["struct"][0]  # parents vector for the best tree found: tree_0[i] is the parent of node i + 1
    clusters = json.loads(data["clusters.json"])  # extract the clustering information for later use

    n = 0  # starting from the root node
    while np.sum(tree_0 == n) == 1:  # while the current node has exactly one child
        # (tree_0 == n) is True at index i when node i + 1 is a child of node n
        n = np.flatnonzero(tree_0 == n)[0] + 1  # move to that single child
        print("Cluster %d occurred before the first branching event" % (n - 1))

It should be easy enough to translate this to R and modify it for your purposes.
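If you want every cluster labelled as trunk or branch in one pass, here is a minimal sketch of the same idea in plain Python. It assumes the parents-vector convention used above (entry `i` is the parent of node `i + 1`, and node 0 is the root). The function name `split_trunk_and_branches` and the toy parents vector are hypothetical and only for illustration; with real results you would pass something like `data["struct"][0].tolist()`.

```python
def split_trunk_and_branches(parents):
    """Split non-root nodes into trunk and branch nodes.

    parents[i] is assumed to be the parent of node i + 1; node 0 is the root.
    The trunk is the chain of nodes from the root down to (and including)
    the first node that has more than one child, or a leaf if the tree
    never branches.
    """
    # Build a parent -> list of children lookup.
    children = {}
    for idx, parent in enumerate(parents):
        children.setdefault(int(parent), []).append(idx + 1)

    # Walk down from the root while each node has exactly one child.
    trunk = []
    node = 0
    while len(children.get(node, [])) == 1:
        node = children[node][0]
        trunk.append(node)

    # Every other non-root node sits on a branch.
    trunk_set = set(trunk)
    branches = [n for n in range(1, len(parents) + 1) if n not in trunk_set]
    return trunk, branches


# Toy tree (hypothetical): 0 -> 1 -> 2, then node 2 splits into 3 and 4, and 4 -> 5.
example_parents = [0, 1, 2, 2, 4]
trunk, branches = split_trunk_and_branches(example_parents)
print("trunk nodes:", trunk)
print("branch nodes:", branches)
```

As in the snippet above, node `n` corresponds to cluster index `n - 1`, so you can subtract 1 from each node to look up its variants in the clusters list and check which driver genes fall on the trunk versus a branch.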
I am trying to map driver genes onto the trees that Pairtree generated. I am parsing the JSON files into R, but using only the information and stats in those JSON files, is there any way to determine which clusters are located on the trunk and which are on branches?