viq854 / lichee

Multi-sample cancer phylogeny reconstruction
http://viq854.github.io/lichee/

No valid tree found #5

Closed detorrentel closed 7 years ago

detorrentel commented 7 years ago

Hi,

I have WGS data with a large number of SNVs called. For example, one patient has 47,000 SNVs called across 4 samples, and another has 240,000 SNVs called across 12 samples. Since the SNVs were already called in my data, I am using the following input format (where S_i = 0 if the SNV is not called in that sample and S_i = VAF if it is called):

#chr    position    description    profile        Normal    S1    S2    S3    S4      
1       184306474                   00111          0         0   0.2   0.25  0.15    
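
A minimal sketch of how such a row could be produced from per-sample VAFs. The helper names `binary_profile` and `format_row` are hypothetical, and tab-separated output is an assumption; the column order follows the header above, and the profile has one character for the normal followed by one per sample.

```python
def binary_profile(normal_vaf, sample_vafs):
    """Return a presence string such as '00111' (normal first, then each sample)."""
    values = [normal_vaf] + list(sample_vafs)
    return "".join("1" if v > 0 else "0" for v in values)


def format_row(chrom, position, description, normal_vaf, sample_vafs):
    """Build one input line: chr, position, description, profile, Normal, S1..Sn."""
    profile = binary_profile(normal_vaf, sample_vafs)
    fields = [str(chrom), str(position), description, profile, f"{normal_vaf:g}"]
    fields += [f"{v:g}" for v in sample_vafs]
    # Assumption: fields are tab-separated.
    return "\t".join(fields)


# The example SNV from the table above: not called in Normal or S1.
row = format_row(1, 184306474, "", 0, [0, 0.2, 0.25, 0.15])
print(row)
```

An SNV that is not called in a sample gets VAF 0 and therefore a 0 in the profile, which is the convention described above.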

Unfortunately, after running it with: ./lichee -build -i Lichee_inputs_Patient_J.txt -sampleProfile -n 0 -showTree 1 I got "no valid tree was found". I am trying to change several of the options to see whether I can get a result, but I was wondering if you had any advice. My first thought would be to increase the error margin -e to 0.2, 0.3, etc. What is the maximum value you would advise for the error? My second thought would be to change -maxClusterDist to 0.3, 0.35, etc.

Is there anything else I could change?

Thank you for your help!

viq854 commented 7 years ago

I wouldn't suggest increasing the error margin by much (definitely not much beyond 0.2), so let's try a few other things first. How many nodes do you end up with in the network? Are there many nodes with only a few mutations? Since these could be outliers, I would suggest increasing -minClusterSize and/or -minPrivateClusterSize. If there are many clusters corresponding to the same binary profile, increasing -maxClusterDist to collapse them can help, especially if their centroid stddev is high. If this is low-coverage WGS data, it might also be helpful to first run a tool like PyClone to get CP values instead of VAFs (which will account for CNVs, LOH, and purity) and then feed those values in to find the tree (using the -cp flag).

detorrentel commented 7 years ago

Thank you for your answer. As LICHeE doesn't find any valid tree, I don't get any resulting txt file at the end. Is there a way to get the trees even if they are not valid, to see how many nodes they have, the stddev, etc.? I will look into increasing -minClusterSize and -minPrivateClusterSize and see what I get. Increasing -maxClusterDist helps, but not in every case. I am also trying to reduce my number of SSNVs by keeping only those with coverage >30, 50, 100, etc., to see what I get and whether it is consistent. I have never used PyClone, but I am afraid that with that many SNVs it will take forever to get something out of it.

viq854 commented 7 years ago

Oh, you should enable the verbose mode (-v); this will print information about the network (the nodes, centroids, stddev, etc.). You could also visualize the network with -net, but the graph can be quite complex (btw, the nodes are draggable), so it is best to debug with the printed network info. And I would definitely recommend filtering based on coverage (perhaps you could start with a more robust set with coverage >50).
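
The coverage filter could be sketched as below. This is only an illustration: the record layout (a dict with a `depths` list holding one read depth per sample) and the function name `filter_by_coverage` are hypothetical, and requiring the threshold in every sample (rather than, say, on average) is an assumption.

```python
def filter_by_coverage(snvs, min_coverage=50):
    """Keep SNVs whose read depth exceeds min_coverage in every sample."""
    return [s for s in snvs if all(d > min_coverage for d in s["depths"])]


# Toy records, not real data: one depth value per sample.
snvs = [
    {"chrom": "1", "position": 100, "depths": [80, 75, 90, 61]},
    {"chrom": "1", "position": 200, "depths": [80, 20, 90, 61]},
]
robust = filter_by_coverage(snvs, min_coverage=50)
print(len(robust), "of", len(snvs), "SNVs kept")
```

Re-running with several thresholds (30, 50, 100) and comparing the resulting networks is one way to check whether the clusters are stable.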

detorrentel commented 7 years ago

Ok thank you, I will try that!

detorrentel commented 7 years ago

I ran it with maxClusterDist=0.4, maxVAFValid=0.72 (the maximum purity of the 4 samples) and error=0.2. I still don't get any valid tree, but I enabled the verbose option as you suggested. This is what I get just before it tells me "no valid tree found":

--- PHYLOGENETIC CONSTRAINT GRAPH --- 
numNodes = 16, numEdges = 29
NODES: 
level = 6: 
Node 0: 
level = 5: 
level = 4: 
Node 13: group tag = 01111, Size: 76
VAF Mean: [ 0.43  0.38  0.42  0.43 ] 
       Stdev: [ 0.14  0.12  0.1  0.11 ]
level = 3: 
Node 1: group tag = 01110, Size: 15
VAF Mean: [ 0.37  0.42  0.46 ] 
       Stdev: [ 0.12  0.13  0.13 ]
Node 4: group tag = 01011, Size: 14
VAF Mean: [ 0.39  0.41  0.4 ] 
       Stdev: [ 0.15  0.11  0.13 ]
Node 9: group tag = 00111, Size: 27
VAF Mean: [ 0.37  0.41  0.45 ] 
       Stdev: [ 0.1  0.1  0.1 ]
Node 15: group tag = 01101, Size: 56
VAF Mean: [ 0.43  0.36  0.43 ] 
       Stdev: [ 0.11  0.11  0.11 ]
level = 2: 
Node 3: group tag = 01010, Size: 9
VAF Mean: [ 0.46  0.45 ] 
       Stdev: [ 0.09  0.17 ]
Node 6: group tag = 00101, Size: 43
VAF Mean: [ 0.34  0.38 ] 
       Stdev: [ 0.12  0.11 ]
Node 8: group tag = 00110, Size: 14
VAF Mean: [ 0.4  0.44 ] 
       Stdev: [ 0.09  0.1 ]
Node 11: group tag = 00011, Size: 20
VAF Mean: [ 0.44  0.46 ] 
       Stdev: [ 0.13  0.1 ]
Node 12: group tag = 01001, Size: 24
VAF Mean: [ 0.42  0.42 ] 
       Stdev: [ 0.09  0.11 ]
Node 14: group tag = 01100, Size: 80
VAF Mean: [ 0.39  0.31 ] 
       Stdev: [ 0.12  0.1 ]
level = 1: 
Node 2: group tag = 01000, Size: 30061
VAF Mean: [ 0.31 ] 
       Stdev: [ 0.15 ]
Node 5: group tag = 00010, Size: 2011
VAF Mean: [ 0.25 ] 
       Stdev: [ 0.12 ]
Node 7: group tag = 00100, Size: 12774
VAF Mean: [ 0.26 ] 
       Stdev: [ 0.13 ]
Node 10: group tag = 00001, Size: 1168
VAF Mean: [ 0.22 ] 
       Stdev: [ 0.11 ]
level = 0: 
EDGES: 
0 -> 13
1 -> 3
1 -> 8
1 -> 14
3 -> 2
3 -> 5
4 -> 3
4 -> 11
4 -> 12
6 -> 7
6 -> 10
8 -> 5
8 -> 7
9 -> 6
9 -> 8
9 -> 11
11 -> 5
11 -> 10
12 -> 2
12 -> 10
13 -> 1
13 -> 4
13 -> 9
13 -> 15
14 -> 2
14 -> 7
15 -> 6
15 -> 12
15 -> 14

Found 0 valid tree(s)
Adjusting the network...
Found 0 valid trees after network adjustments

Any suggestions? For now, this is with all the SSNVs; my next step is to keep only those with coverage >50, 100, etc. I don't think I have especially small clusters. Even though node 3 has a size of only 9, I don't feel it should be considered small. Should it?

I saw in the paper that the CP values could also come from ABSOLUTE, which is what I use to get the purity of my samples, so I do have those outputs. Just to be sure I am using the right value for CP: in the ABSOLUTE output *_ABS_MAF, I should take the value of cancer_cell_frac as the CP, right?

viq854 commented 7 years ago

Just to confirm: are you currently passing the VAFs or CP values to the program? If CPs, you should enable the -cp flag (the -maxVAFValid option will no longer apply in this case and the root CP will be set to 1).

In the output I see that the VAFs of the first couple of nodes (assuming these are VAFs and not CPs) already violate the sum-of-children phylogenetic constraint: the nodes with the profiles 01110, 01011, 00111, and 01101 must be siblings, but all of them have very high centroids (which the possible parents, 01111 and the root, cannot accommodate). If these are VAFs, it might be best to switch to CPs in order to correct for CNVs (which could account for this result). Could you send me the full log and the command used to run the program (email might be easiest)? Also, the current -maxClusterDist is very high and will probably cause all clusters to collapse; I recommend using the default for now, especially while debugging.
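
To make the violation concrete, here is a rough sketch that sums the centroids of the four level-3 nodes in each sample and compares the total with the centroid of node 13 (01111) plus the error margin of 0.2 used in the run above. It is a simplified reading of the sum-of-children constraint as described in this reply, not LICHeE's actual code; the function names are hypothetical, and the centroids are copied from the posted log.

```python
SAMPLES = ["S1", "S2", "S3", "S4"]


def centroid_by_sample(tag, means):
    """Map a group tag such as '01110' and its centroid list to {sample: mean VAF}."""
    # tag[0] is the normal; the remaining bits correspond to S1..S4.
    present = [s for s, bit in zip(SAMPLES, tag[1:]) if bit == "1"]
    return dict(zip(present, means))


# Node 13 (candidate parent) and nodes 1, 4, 9, 15 (siblings) from the log.
parent = centroid_by_sample("01111", [0.43, 0.38, 0.42, 0.43])
children = [
    centroid_by_sample("01110", [0.37, 0.42, 0.46]),
    centroid_by_sample("01011", [0.39, 0.41, 0.40]),
    centroid_by_sample("00111", [0.37, 0.41, 0.45]),
    centroid_by_sample("01101", [0.43, 0.36, 0.43]),
]


def violations(parent, children, error=0.2):
    """Return {sample: summed child VAF} wherever the sum exceeds parent + error."""
    out = {}
    for sample, parent_vaf in parent.items():
        total = sum(c.get(sample, 0.0) for c in children)
        if total > parent_vaf + error:
            out[sample] = round(total, 2)
    return out


for sample, total in violations(parent, children).items():
    print(f"{sample}: children sum to {total}, parent allows {parent[sample] + 0.2:.2f}")
```

In every sample the children sum to well over 1.0 while the parent centroid is around 0.4, so no reasonable value of -e would make this sibling arrangement valid.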

detorrentel commented 7 years ago

Thank you, yes no problem I'll send an email (to viq@stanford.edu) to continue the conversation.

Rashesh7 commented 6 years ago

Hi,

I am facing the same issue. Can you please elaborate on the solution you found, if any, for this?

Your support is much appreciated.

Thank you.

Ignatiocalvin commented 3 years ago

Hi,

I'd also like to know whether there were any solutions regarding this?

Thanks.