Closed jsha129 closed 1 year ago
Hi there --
So I'm assuming you're using the example data provided in the pairtree/example folder. If this is the case and you are setting the parameter --seed=2022
, then I am not getting quite the same output you are for the eta
values. Either way, I think it's a pretty simple fix to prepare the pairtree outputs to be inputted into fishplot.
The first thing to note is your pop_freq
matrix has 4 populations, while the parents
variable only has 3 populations. Pairtree outputs this by design, as the "Node 0" in all of the trees outputted represents the population of cells from which all cancerous cell populations are descended from. A more detailed explanation of this can be found in the Pairtree STAR protocol paper, particularly the following sections: 13. Interpreting the results from summposterior, and 16. Interpreting the results from plottree.
However, the main issue is the pop_freq
matrix. I believe what fishplot is actually expecting are the phi
values and the confusion here is from the different use of terminology. If I'm wrong please correct me.
I believe the following python code should fix your problem:
phi
) and parents (struct
) from the pairtree output file (e.g., example.results.npz
)import os
import numpy as np
results_npz = np.load(os.path.join("path/to/pairtree/example/results, "example.results.npz"))
parents = results_npz["struct"][0] # assume you want the parents vector associated with the tree at index 0
phi = results_npz["phi"][0] # extract the tree constrained subclonal frequencies associated with the tree at index 0
parents = np.insert(parents + 1, 0, 0) # format for fishplot
This data can then be exported to R using a variety of different packages. From reading the fishplot documentation and looking at their paper, I believe these are the inputs the program is looking for. I was able to successfully generate plots using this approach.
Please let me know if this is helpful.
Ethan
Hi Ethan, Thank you for answering the question. Your solution works! I wasn't sure how to read npz file in R, so I ended up using data for the best tree (tree0) exported as json. To do this from bash/commandline:
$PTDIR/bin/plottree \
--runid ${patient} \
--tree-index 0\
--tree-json tree0.json \
--phi-orientation populations_as_rows \
$input.ssm \
$input_params.json \
pairtree.results.npz \
pairtree.results_tree0.html
Here's its implementation in R for the fishplot using the json data:
library(fishplot)
## get pop freq and parents data from pairtree export
## It was easier to read a json than npz file in R. Hence, data for tree0 (the best tree) was exported as a json.
library("rjson")
json_file <- "tree0.json"
json_data <- fromJSON(paste(readLines(json_file), collapse=""))
##! get population frequencies
frac.table <- Reduce(rbind, json_data$phi[1:length(json_data$phi)]) # not eta
frac.table <- 100 * round(frac.table, 2)
colnames(frac.table ) <- json_data$samples
rownames(frac.table ) <- seq(0, nrow(frac.table )-1)
print(frac.table)
## get the structure of the clones, aka parents
## need to insert 0 in the beginning since 0 is the founder clone for everything. 'parents + 1' because python is 0-index and R/fishplot is 1.
parents <- c(0, json_data$parents + 1)
print(parents)
#visualization. distribute samples at regular interval from 0 to 100.
timepoints <- seq(0, 100, 100/(ncol(frac.table) - 1) )
#create a fish object
fish = createFishObject(frac.table,parents,timepoints=timepoints)
#calculate the layout of the drawing
fish = layoutClones(fish)
#draw the plot, using the splining method (recommended)
#and providing both timepoints to label and a plot title
fishPlot(fish,shape="spline",title.btm="Patient X",
cex.title=0.5, vlines=timepoints,
vlab=colnames(frac.table))
Awesome, glad it worked out! Thanks for providing the code you used.
Hi there, thank you for developing this wonderful tool. It's FAST! We want to use pairtree output for fishplot package ('https://github.com/chrisamiller/fishplot'). I ran the example data using seed 2022 and extracted data for the tree 0 to get 'eta' and 'parents'. The output seems to be incompatible with fishplot. Could you please provide suggestions to overcome this? Many thanks