mdiaz09 closed this issue 5 months ago
Hi, apologies for the (very) delayed response. The state trees are generated according to the model first introduced in El-Kebir et al., Cell Systems 2016. You can find the full explanation in section B.5 of the supplemental information and in figure S2.
The algorithm enumerates a set of spanning trees on a grid graph (as shown in figure S2). Each point on the grid is a triple giving the number of maternal, paternal, and mutated copies, and each edge corresponds to an amplification, deletion, or mutation. A valid tree must represent all observed copy-number states and contain exactly one mutation edge. The graph (and the number of trees) is relatively small at low copy numbers but grows considerably as the maximum copy number increases. We therefore typically recommend excluding mutations in regions with total copy number greater than 6; in the majority of tumors these are a small minority of mutations.
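To give a feel for why the runtime climbs with copy number, here is a rough sketch of such a grid graph. It is not the generatestatetrees code: the function names grid_vertices and grid_edges are hypothetical, and the rules that mutated copies cannot exceed the total copy number and that each event changes one copy at a time are simplifying assumptions on my part. It only builds the vertices and labelled edges; it does not enumerate the spanning trees themselves.

```python
from itertools import product


def grid_vertices(max_total):
    """List (maternal, paternal, mutated) triples up to a maximum total copy number.

    Assumption (not from the paper): mutated copies never exceed maternal + paternal.
    """
    vertices = []
    for maternal, paternal in product(range(max_total + 1), repeat=2):
        total = maternal + paternal
        if total > max_total:
            continue
        for mutated in range(total + 1):
            vertices.append((maternal, paternal, mutated))
    return vertices


def grid_edges(vertices):
    """Label each single-step move as an amplification, deletion, or mutation.

    Assumption: one event changes exactly one maternal or paternal copy, and
    the mutated count changes by at most one in the same direction.
    """
    vertex_set = set(vertices)
    edges = []
    for maternal, paternal, mutated in vertices:
        source = (maternal, paternal, mutated)
        # Copy-number events on the maternal or paternal allele.
        for d_mat, d_pat in ((1, 0), (0, 1), (-1, 0), (0, -1)):
            gain = d_mat + d_pat > 0
            label = "amplification" if gain else "deletion"
            step = 1 if gain else -1
            for d_mut in (0, step):
                if d_mut != 0 and mutated == 0:
                    continue  # no mutated copy available to amplify or delete
                target = (maternal + d_mat, paternal + d_pat, mutated + d_mut)
                if target in vertex_set:
                    edges.append((source, target, label))
        # Mutation event: the first mutated copy appears, copy number unchanged.
        target = (maternal, paternal, 1)
        if mutated == 0 and target in vertex_set:
            edges.append((source, target, "mutation"))
    return edges


# Show how the graph grows with the maximum total copy number.
for max_total in range(1, 7):
    vertices = grid_vertices(max_total)
    edges = grid_edges(vertices)
    print(max_total, len(vertices), len(edges))
```

Under these assumptions the number of grid points grows roughly with the cube of the maximum copy number, and the number of spanning trees over those points grows far faster, which is consistent with the slowdown you are seeing.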
To answer your other question, generatestatetrees does not pull from any other files, only the cn_states file.
Hello,
I was wondering if you could give a quick overview of how the generatestatetrees function operates. The cn_states.txt file is created by vcf_2_decifer.py, but I wasn't sure whether generatestatetrees pulls in data from other files, as it takes a substantial amount of time.
Thank you