lima1 / PureCN

Copy number calling and variant classification using targeted short read sequencing
https://bioconductor.org/packages/devel/bioc/html/PureCN.html
Artistic License 2.0

Error in runAbsoluteCN: ploidy NA #74

Closed kdkorthauer closed 5 years ago

kdkorthauer commented 5 years ago

Hi Markus,

I'm getting the following error running runAbsoluteCN (only including seemingly relevant output):

INFO [2019-02-27 17:22:08] Testing local optimum 8/8 at purity 0.21 and total ploidy 2.00...
FATAL [2019-02-27 17:22:16] Could not calculate copy number log-likelihood for purity NA and total 

FATAL [2019-02-27 17:22:16] ploidy NA. 

FATAL [2019-02-27 17:22:16]  

FATAL [2019-02-27 17:22:16] This runtime error might be caused by invalid input data or parameters. 

FATAL [2019-02-27 17:22:16] Please report bug (PureCN 1.13.19). 

Error: Could not calculate copy number log-likelihood for purity NA and total 
ploidy NA. 

This runtime error might be caused by invalid input data or parameters. 
Please report bug (PureCN 1.13.19). 
In addition: Warning messages:
1: In max(px.rij.s) : no non-missing arguments to max; returning -Inf
2: In min(which(runif(n = 1, min = 0, max = sum(px.rij.s)) <= cumsum(px.rij.s))) :
  no non-missing arguments to min; returning Inf
Execution halted

Any ideas why this might be? Happy to give more details about the files/parameters I'm using that throw the error if needed.

Best, Keegan
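
For readers unfamiliar with the two trailing warnings: they are what base R emits when `max()` and `min()` are handed a zero-length vector. The sketch below reproduces them in plain R, independent of PureCN. The variable `px` is a hypothetical stand-in for `px.rij.s`; how that vector ends up empty inside PureCN is not established by this example.

```r
# Hypothetical stand-in for px.rij.s: a zero-length numeric vector.
px <- numeric(0)

# max() of an empty vector returns -Inf and warns
# "no non-missing arguments to max; returning -Inf".
best <- suppressWarnings(max(px))

# sum() of an empty vector is 0 and cumsum() is numeric(0), so the
# comparison below is logical(0), which() gives integer(0), and min()
# returns Inf with the matching warning.
draw <- runif(n = 1, min = 0, max = sum(px))
idx <- suppressWarnings(min(which(draw <= cumsum(px))))

print(c(best = best, idx = idx))
```

In other words, the warnings suggest that no candidate likelihoods were available at that step, which is consistent with the purity and ploidy being reported as NA.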

lima1 commented 5 years ago

Sorry you are having trouble getting this to work. I have not seen this error before either. Do you get the error without providing a VCF? If so, can you provide the files and parameters?

If not, can you share the complete log-file of PureCN.R (or output of runAbsoluteCN)? Remember to not share any VCFs publicly.

kdkorthauer commented 5 years ago

Unfortunately, I do not get the error without providing a VCF. Here is the complete output of runAbsoluteCN:

>     ret <- runAbsoluteCN(normal.coverage.file = ncov, 
+                          tumor.coverage.file = tcov, 
+                          vcf.file = vfile, 
+                          genome = "hg19", 
+                          sampleid = gsub(".vcf", "", vfile), 
+                          interval.file = interval.file, 
+                          normalDB = normalDB,
+                          args.filterVcf = list(snp.blacklist = snp.blacklist.file, 
+                                               stats.file = mutect.stats.file),
+                          post.optimize = FALSE,
+                          test.purity = seq(0.9,0.99,by=0.01))
INFO [2019-02-28 13:46:59] ------------------------------------------------------------
INFO [2019-02-28 13:46:59] PureCN 1.13.19
INFO [2019-02-28 13:46:59] ------------------------------------------------------------
INFO [2019-02-28 13:46:59] Arguments: -normal.coverage.file ../../../PREPROCESS/DNA/purecn/DFCI-5368-CL-01_coverage_poolnorm.txt -tumor.coverage.file ../../../PREPROCESS/DNA/purecn/DFCI-5368-CL-01_coverage_loessnorm.txt -vcf.file ../../../PREPROCESS/DNA/mutect-results/DFCI-5368-CL-01.vcf -genome hg19 -args.filterVcf ../../../PREPROCESS/DNA/annotation/hg19_simpleRepeats.bed,../../../PREPROCESS/DNA/mutect-results/DFCI-5368-CL-01_call_stats.txt -sampleid DFCI-5368-CL-01 -test.purity 0.9,0.91,0.92,0.93,0.94,0.95,0.96,0.97,0.98,0.99 -interval.file ../../../PREPROCESS/DNA/purecn/intervals.txt -post.optimize FALSE -normalDB <data>
INFO [2019-02-28 13:46:59] Loading coverage files...
INFO [2019-02-28 13:47:06] Mean target coverages: 252X (tumor) 249X (normal).
INFO [2019-02-28 13:47:07] Mean coverages: chrX: 302.72, chrY: 2.36, chr1-22: 239.86.
INFO [2019-02-28 13:47:07] Mean coverages: chrX: 149.72, chrY: 13.74, chr1-22: 245.51.
WARN [2019-02-28 13:47:07] Sex tumor/normal mismatch: tumor = 
WARN [2019-02-28 13:47:12] tumor.coverage.file and interval.file do not align.
INFO [2019-02-28 13:47:14] Removing 761 intervals with missing log.ratio.
INFO [2019-02-28 13:47:14] Removing 4485 intervals excluded in normalDB.
INFO [2019-02-28 13:47:14] normalDB provided. Setting minimum coverage for segmentation to 0.0015X.
INFO [2019-02-28 13:47:14] Removing 20 low coverage (< 0.0015X) intervals.
INFO [2019-02-28 13:47:14] Using 283164 intervals (264321 on-target, 18843 off-target).
INFO [2019-02-28 13:47:14] Ratio of mean on-target vs. off-target read counts: 0.11
INFO [2019-02-28 13:47:14] Mean off-target bin size: 113756
INFO [2019-02-28 13:47:15] AT/GC dropout: 1.05 (tumor), 1.02 (normal). 
INFO [2019-02-28 13:47:15] Loading VCF...
INFO [2019-02-28 13:47:39] Found 2133661 variants in VCF file.
INFO [2019-02-28 13:47:46] 1875634 (87.9%) variants annotated as likely germline (DB INFO flag).
INFO [2019-02-28 13:47:59] DFCI-5368-CL-01 is tumor in VCF file.
INFO [2019-02-28 13:48:06] 1032 homozygous and 5769 heterozygous variants on chrX.
INFO [2019-02-28 13:48:06] Sex from VCF: F (Fisher's p-value: 0.002, odds-ratio: 0.90).
WARN [2019-02-28 13:49:01] MuTect stats file and VCF file do not align perfectly. Will remove 0 unmatched variants.
INFO [2019-02-28 13:49:16] Removing 673498 MuTect calls due to blacklisted failure reasons.
INFO [2019-02-28 13:49:22] Removing 1378984 non heterozygous (in matched normal) germline SNPs.
INFO [2019-02-28 13:49:26] Initial testing for significant sample cross-contamination: unlikely
INFO [2019-02-28 13:49:27] Removing 14274 variants with AF < 0.030 or AF >= 1.000 or less than 4 supporting reads or depth < 15.
INFO [2019-02-28 13:49:29] Removing 0 blacklisted variants.
INFO [2019-02-28 13:49:30] Removing 1294 low quality variants with BQ < 25.
INFO [2019-02-28 13:49:30] Total size of targeted genomic region: 31.09Mb (54.51Mb with 50bp padding).
INFO [2019-02-28 13:49:30] 7.9% of targets contain variants.
INFO [2019-02-28 13:49:30] Removing 42988 variants outside intervals.
INFO [2019-02-28 13:49:30] Found SOMATIC annotation in VCF.
INFO [2019-02-28 13:49:30] Setting somatic prior probabilities for somatic variants to 0.999000 or to 0.000100 otherwise.
INFO [2019-02-28 13:49:30] Found SOMATIC annotation in VCF. Setting mapping bias to 0.975.
INFO [2019-02-28 13:49:30] Excluding 90 novel or poor quality variants from segmentation.
INFO [2019-02-28 13:49:30] Sample sex: F
INFO [2019-02-28 13:49:30] Segmenting data...
INFO [2019-02-28 13:49:31] Loading pre-computed boundaries for DNAcopy...
INFO [2019-02-28 13:49:31] Setting undo.SD parameter to 0.750000.
INFO [2019-02-28 13:50:12] Setting undo.SD parameter to 1.125000.
Setting multi-figure configuration
INFO [2019-02-28 13:51:27] Setting prune.hclust.h parameter to 0.150000.
INFO [2019-02-28 13:51:28] Found 968 segments with median size of 0.21Mb.
INFO [2019-02-28 13:51:28] Removing 50 variants outside segments.
INFO [2019-02-28 13:51:28] Using 22573 variants.
INFO [2019-02-28 13:51:30] Mean standard deviation of log-ratios: 0.16
INFO [2019-02-28 13:51:30] 2D-grid search of purity and ploidy...
INFO [2019-02-28 13:52:16] Local optima: 0.9/3.8, 0.97/2, 0.93/5.8, 0.93/4.8, 0.97/3, 0.97/1,
0.9/1.2, 0.21/2
INFO [2019-02-28 13:52:16] Testing local optimum 1/8 at purity 0.90 and total ploidy 3.80...
INFO [2019-02-28 13:53:15] Testing local optimum 2/8 at purity 0.97 and total ploidy 2.00...
INFO [2019-02-28 13:53:59] Testing local optimum 3/8 at purity 0.93 and total ploidy 5.80...
INFO [2019-02-28 13:54:09] Recalibrating log-ratios...
INFO [2019-02-28 13:54:09] Testing local optimum 3/8 at purity 0.93 and total ploidy 5.80...
INFO [2019-02-28 13:54:19] Recalibrating log-ratios...
INFO [2019-02-28 13:54:19] Testing local optimum 3/8 at purity 0.93 and total ploidy 5.80...
INFO [2019-02-28 13:55:27] Recalibrating log-ratios...
INFO [2019-02-28 13:55:27] Testing local optimum 3/8 at purity 0.93 and total ploidy 5.80...
INFO [2019-02-28 13:56:17] Testing local optimum 4/8 at purity 0.93 and total ploidy 4.80...
INFO [2019-02-28 13:57:16] Testing local optimum 5/8 at purity 0.97 and total ploidy 3.00...
INFO [2019-02-28 13:58:28] Testing local optimum 6/8 at purity 0.97 and total ploidy 1.00...
INFO [2019-02-28 13:59:22] Testing local optimum 7/8 at purity 0.90 and total ploidy 1.20...
INFO [2019-02-28 14:00:44] Testing local optimum 8/8 at purity 0.21 and total ploidy 2.00...
FATAL [2019-02-28 14:00:52] Could not calculate copy number log-likelihood for purity NA and total 

FATAL [2019-02-28 14:00:52] ploidy NA. 

FATAL [2019-02-28 14:00:52]  

FATAL [2019-02-28 14:00:52] This runtime error might be caused by invalid input data or parameters. 

FATAL [2019-02-28 14:00:52] Please report bug (PureCN 1.13.19). 

Error: Could not calculate copy number log-likelihood for purity NA and total 
ploidy NA. 

This runtime error might be caused by invalid input data or parameters. 
Please report bug (PureCN 1.13.19). 
In addition: Warning messages:
1: In max(px.rij.s) : no non-missing arguments to max; returning -Inf
2: In min(which(runif(n = 1, min = 0, max = sum(px.rij.s)) <= cumsum(px.rij.s))) :
  no non-missing arguments to min; returning Inf

More info on the inputs:

Note that I also set the test purity to the recommended setting for cell lines (0.90-0.99). If I change only that parameter to the default (0.15-0.90) then it runs without error.
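
One detail that stands out in the log is that the eighth local optimum (purity 0.21) lies outside the requested `test.purity` range of 0.90 to 0.99, and that is the optimum at which the run fails. The thread does not confirm that this is the root cause, so the following is only a sketch of how such a mismatch could be spotted. The optima are copied by hand from the log, and the data frame and its column names are illustrative, not part of the PureCN API.

```r
# Local optima copied from the log above, as "purity/ploidy" pairs.
optima <- c("0.9/3.8", "0.97/2", "0.93/5.8", "0.93/4.8",
            "0.97/3", "0.97/1", "0.9/1.2", "0.21/2")

# Purity grid passed to runAbsoluteCN() in the failing call.
test.purity <- seq(0.9, 0.99, by = 0.01)

parts <- do.call(rbind, strsplit(optima, "/", fixed = TRUE))
candidates <- data.frame(purity = as.numeric(parts[, 1]),
                         ploidy = as.numeric(parts[, 2]))

# Flag optima whose purity is outside the requested range
# (small tolerance to avoid floating-point edge effects).
tol <- 1e-8
candidates$in.range <- candidates$purity >= min(test.purity) - tol &
                       candidates$purity <= max(test.purity) + tol

print(candidates[!candidates$in.range, ])
```

With the values from this log, only the 0.21/2 candidate is flagged.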

Let me know if there's any more info that would be helpful to diagnose.

lima1 commented 5 years ago

Thanks, I think this should be fixed now. Never had cell lines with matched normals, so this bug went undiscovered for years.

Your log shows one warning, "tumor.coverage.file and interval.file do not align"; otherwise it looks good.

If you need help with the ploidy selection, feel free to send screenshots by email. The default parameters aren't optimized for cell lines, and increased heterogeneity in high-purity samples can often lead to a wrong high-ploidy solution.

A tip for manual curation: look at all chromosome arms with balanced SNPs (allelic fractions around 0.5). A real high-ploidy solution often has multiple balanced segments with different copy numbers (total copy number 2 / minor copy number 1, 4/2, 6/3), whereas a wrong solution typically has only one (4/2).
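
The curation tip above can be sketched as a small check on a segment table. This is an illustration under stated assumptions, not PureCN's own method: the data frame `segs`, its columns (`arm`, `C` for total copy number, `M` for minor copy number), and all values in it are made up for the example.

```r
# Hypothetical segment table: C = total copy number, M = minor copy number,
# arm = chromosome arm. Values are invented for illustration.
segs <- data.frame(
    arm = c("1p", "1q", "3p", "5q", "8q", "17p"),
    C   = c(2, 4, 6, 3, 4, 2),
    M   = c(1, 2, 3, 1, 2, 0)
)

# A segment is balanced when both alleles have the same copy number,
# i.e. the minor copy number is exactly half of the total.
balanced <- segs[segs$C > 0 & segs$M * 2 == segs$C, ]

# Distinct balanced states, written as "total/minor".
states <- sort(unique(paste(balanced$C, balanced$M, sep = "/")))
print(states)

# Rule of thumb from the comment above: several distinct balanced states
# support a high-ploidy solution; a single state (such as only 4/2) is suspicious.
multiple.states <- length(states) > 1
```

In this toy table the balanced segments cover three different states, which is the pattern described for a plausible high-ploidy solution. A table in which every balanced segment is 4/2 would instead point toward a diploid solution that was wrongly doubled.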

kdkorthauer commented 5 years ago

Thank you for addressing that so quickly. It seems to be working fine now with the higher test.purity setting.

Yes, I guess it's not a common situation - here we have individual cell lines derived from tumor tissue, along with their matched normal tissue.

I'm not sure I completely understand your suggestion regarding looking at balanced SNPs and minor copy numbers. But I have seen high ploidy solutions for some of my cell line and tumor samples. Once I finish the current run with the higher test.purity settings for my cell lines, I'll send you an email with follow-up questions.

I appreciate all the help!

Best, Keegan