Closed by kdkorthauer 5 years ago.
Sorry you are having trouble getting this to work. I have not seen this error before either. Do you get the error without providing a VCF? If so, can you provide the files and parameters?
If not, can you share the complete log file of PureCN.R (or the output of runAbsoluteCN)? Remember not to share any VCFs publicly.
Unfortunately, I do not get the error without providing a VCF. Here is the complete output of runAbsoluteCN:
> ret <- runAbsoluteCN(normal.coverage.file = ncov,
+ tumor.coverage.file = tcov,
+ vcf.file = vfile,
+ genome = "hg19",
+ sampleid = gsub(".vcf", "", vfile),
+ interval.file = interval.file,
+ normalDB = normalDB,
+ args.filterVcf = list(snp.blacklist = snp.blacklist.file,
+ stats.file = mutect.stats.file),
+ post.optimize = FALSE,
+ test.purity = seq(0.9,0.99,by=0.01))
INFO [2019-02-28 13:46:59] ------------------------------------------------------------
INFO [2019-02-28 13:46:59] PureCN 1.13.19
INFO [2019-02-28 13:46:59] ------------------------------------------------------------
INFO [2019-02-28 13:46:59] Arguments: -normal.coverage.file ../../../PREPROCESS/DNA/purecn/DFCI-5368-CL-01_coverage_poolnorm.txt -tumor.coverage.file ../../../PREPROCESS/DNA/purecn/DFCI-5368-CL-01_coverage_loessnorm.txt -vcf.file ../../../PREPROCESS/DNA/mutect-results/DFCI-5368-CL-01.vcf -genome hg19 -args.filterVcf ../../../PREPROCESS/DNA/annotation/hg19_simpleRepeats.bed,../../../PREPROCESS/DNA/mutect-results/DFCI-5368-CL-01_call_stats.txt -sampleid DFCI-5368-CL-01 -test.purity 0.9,0.91,0.92,0.93,0.94,0.95,0.96,0.97,0.98,0.99 -interval.file ../../../PREPROCESS/DNA/purecn/intervals.txt -post.optimize FALSE -normalDB <data>
INFO [2019-02-28 13:46:59] Loading coverage files...
INFO [2019-02-28 13:47:06] Mean target coverages: 252X (tumor) 249X (normal).
INFO [2019-02-28 13:47:07] Mean coverages: chrX: 302.72, chrY: 2.36, chr1-22: 239.86.
INFO [2019-02-28 13:47:07] Mean coverages: chrX: 149.72, chrY: 13.74, chr1-22: 245.51.
WARN [2019-02-28 13:47:07] Sex tumor/normal mismatch: tumor =
WARN [2019-02-28 13:47:12] tumor.coverage.file and interval.file do not align.
INFO [2019-02-28 13:47:14] Removing 761 intervals with missing log.ratio.
INFO [2019-02-28 13:47:14] Removing 4485 intervals excluded in normalDB.
INFO [2019-02-28 13:47:14] normalDB provided. Setting minimum coverage for segmentation to 0.0015X.
INFO [2019-02-28 13:47:14] Removing 20 low coverage (< 0.0015X) intervals.
INFO [2019-02-28 13:47:14] Using 283164 intervals (264321 on-target, 18843 off-target).
INFO [2019-02-28 13:47:14] Ratio of mean on-target vs. off-target read counts: 0.11
INFO [2019-02-28 13:47:14] Mean off-target bin size: 113756
INFO [2019-02-28 13:47:15] AT/GC dropout: 1.05 (tumor), 1.02 (normal).
INFO [2019-02-28 13:47:15] Loading VCF...
INFO [2019-02-28 13:47:39] Found 2133661 variants in VCF file.
INFO [2019-02-28 13:47:46] 1875634 (87.9%) variants annotated as likely germline (DB INFO flag).
INFO [2019-02-28 13:47:59] DFCI-5368-CL-01 is tumor in VCF file.
INFO [2019-02-28 13:48:06] 1032 homozygous and 5769 heterozygous variants on chrX.
INFO [2019-02-28 13:48:06] Sex from VCF: F (Fisher's p-value: 0.002, odds-ratio: 0.90).
WARN [2019-02-28 13:49:01] MuTect stats file and VCF file do not align perfectly. Will remove 0 unmatched variants.
INFO [2019-02-28 13:49:16] Removing 673498 MuTect calls due to blacklisted failure reasons.
INFO [2019-02-28 13:49:22] Removing 1378984 non heterozygous (in matched normal) germline SNPs.
INFO [2019-02-28 13:49:26] Initial testing for significant sample cross-contamination: unlikely
INFO [2019-02-28 13:49:27] Removing 14274 variants with AF < 0.030 or AF >= 1.000 or less than 4 supporting reads or depth < 15.
INFO [2019-02-28 13:49:29] Removing 0 blacklisted variants.
INFO [2019-02-28 13:49:30] Removing 1294 low quality variants with BQ < 25.
INFO [2019-02-28 13:49:30] Total size of targeted genomic region: 31.09Mb (54.51Mb with 50bp padding).
INFO [2019-02-28 13:49:30] 7.9% of targets contain variants.
INFO [2019-02-28 13:49:30] Removing 42988 variants outside intervals.
INFO [2019-02-28 13:49:30] Found SOMATIC annotation in VCF.
INFO [2019-02-28 13:49:30] Setting somatic prior probabilities for somatic variants to 0.999000 or to 0.000100 otherwise.
INFO [2019-02-28 13:49:30] Found SOMATIC annotation in VCF. Setting mapping bias to 0.975.
INFO [2019-02-28 13:49:30] Excluding 90 novel or poor quality variants from segmentation.
INFO [2019-02-28 13:49:30] Sample sex: F
INFO [2019-02-28 13:49:30] Segmenting data...
INFO [2019-02-28 13:49:31] Loading pre-computed boundaries for DNAcopy...
INFO [2019-02-28 13:49:31] Setting undo.SD parameter to 0.750000.
INFO [2019-02-28 13:50:12] Setting undo.SD parameter to 1.125000.
Setting multi-figure configuration
INFO [2019-02-28 13:51:27] Setting prune.hclust.h parameter to 0.150000.
INFO [2019-02-28 13:51:28] Found 968 segments with median size of 0.21Mb.
INFO [2019-02-28 13:51:28] Removing 50 variants outside segments.
INFO [2019-02-28 13:51:28] Using 22573 variants.
INFO [2019-02-28 13:51:30] Mean standard deviation of log-ratios: 0.16
INFO [2019-02-28 13:51:30] 2D-grid search of purity and ploidy...
INFO [2019-02-28 13:52:16] Local optima: 0.9/3.8, 0.97/2, 0.93/5.8, 0.93/4.8, 0.97/3, 0.97/1, 0.9/1.2, 0.21/2
INFO [2019-02-28 13:52:16] Testing local optimum 1/8 at purity 0.90 and total ploidy 3.80...
INFO [2019-02-28 13:53:15] Testing local optimum 2/8 at purity 0.97 and total ploidy 2.00...
INFO [2019-02-28 13:53:59] Testing local optimum 3/8 at purity 0.93 and total ploidy 5.80...
INFO [2019-02-28 13:54:09] Recalibrating log-ratios...
INFO [2019-02-28 13:54:09] Testing local optimum 3/8 at purity 0.93 and total ploidy 5.80...
INFO [2019-02-28 13:54:19] Recalibrating log-ratios...
INFO [2019-02-28 13:54:19] Testing local optimum 3/8 at purity 0.93 and total ploidy 5.80...
INFO [2019-02-28 13:55:27] Recalibrating log-ratios...
INFO [2019-02-28 13:55:27] Testing local optimum 3/8 at purity 0.93 and total ploidy 5.80...
INFO [2019-02-28 13:56:17] Testing local optimum 4/8 at purity 0.93 and total ploidy 4.80...
INFO [2019-02-28 13:57:16] Testing local optimum 5/8 at purity 0.97 and total ploidy 3.00...
INFO [2019-02-28 13:58:28] Testing local optimum 6/8 at purity 0.97 and total ploidy 1.00...
INFO [2019-02-28 13:59:22] Testing local optimum 7/8 at purity 0.90 and total ploidy 1.20...
INFO [2019-02-28 14:00:44] Testing local optimum 8/8 at purity 0.21 and total ploidy 2.00...
FATAL [2019-02-28 14:00:52] Could not calculate copy number log-likelihood for purity NA and total
FATAL [2019-02-28 14:00:52] ploidy NA.
FATAL [2019-02-28 14:00:52]
FATAL [2019-02-28 14:00:52] This runtime error might be caused by invalid input data or parameters.
FATAL [2019-02-28 14:00:52] Please report bug (PureCN 1.13.19).
Error: Could not calculate copy number log-likelihood for purity NA and total
ploidy NA.
This runtime error might be caused by invalid input data or parameters.
Please report bug (PureCN 1.13.19).
In addition: Warning messages:
1: In max(px.rij.s) : no non-missing arguments to max; returning -Inf
2: In min(which(runif(n = 1, min = 0, max = sum(px.rij.s)) <= cumsum(px.rij.s))) :
no non-missing arguments to min; returning Inf
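The two base-R warnings at the end of the output are consistent with the internal vector px.rij.s being empty when it was sampled: max() of a zero-length vector warns and returns -Inf, and the sampling expression then selects no index, so min() warns and returns Inf. The following minimal sketch in plain R (not PureCN code, and assuming px.rij.s was empty rather than, say, all NA, since max() without na.rm would return NA silently in that case) reproduces both warnings:

```r
# Hypothetical stand-in for the likelihood vector named in the warnings:
# assume it is empty at the failing step.
px.rij.s <- numeric(0)

# Evaluate an expression, muffle its warning, and return the value together
# with the warning text so both can be inspected.
catch_warning <- function(expr) {
  msg <- NULL
  value <- withCallingHandlers(expr, warning = function(w) {
    msg <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  })
  list(value = value, warning = msg)
}

# Warning 1: max() of a zero-length vector.
a <- catch_warning(max(px.rij.s))
# Warning 2: runif(1, 0, 0) returns 0, comparing it with an empty cumsum()
# gives logical(0), which() finds no index, and min() falls back to Inf.
b <- catch_warning(min(which(runif(n = 1, min = 0, max = sum(px.rij.s)) <= cumsum(px.rij.s))))

a$value    # -Inf
b$value    # Inf
a$warning  # "no non-missing arguments to max; returning -Inf"
```

In other words, the failing purity/ploidy candidate apparently left no copy-number states to sample from, which is why the log-likelihood could not be calculated.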
More info on the inputs:
- tcov is the output of correctCoverageBias.
- ncov is the output of calculateTangentNormal (run on tcov using normalDB).
- normalDB is the output of createNormalDatabase using 3 normal samples (all I have so far, but I will get a few more once they are done being sequenced).
- vfile is the VCF output from Mutect 1.1.7.
- snp.blacklist.file is generated as suggested in the vignette, using hg19_simpleRepeats.bed from UCSC.
- mutect.stats.file is the call_stats.txt output from Mutect 1.1.7.
- interval.file is the output of preprocessIntervals with the recommended mappability and reptiming files from the vignette, along with setting min.target.width to 100 as we discussed in the previous issue (#73).

Note that I also set test.purity to the recommended setting for cell lines (0.90-0.99). If I change only that parameter to the default (0.15-0.90), it runs without error.
Let me know if there's any more info that would be helpful to diagnose.
Thanks, I think this should be fixed now. I never had cell lines with matched normals before, so this bug went undiscovered for years.
You get the warning "tumor.coverage.file and interval.file do not align", but otherwise it looks good.
If you need help with the ploidy selection, feel free to send screenshots by email. The default parameters aren't optimized for cell lines, and in high-purity samples increased heterogeneity can often lead to a wrong high-ploidy solution.
A tip for manual curation: look at all chromosome arms with balanced SNPs (allelic fraction around 0.5). A real high-ploidy solution often has multiple balanced segments with different copy numbers (total copy number 2/minor copy number 1, 4/2, 6/3); wrong solutions have only one (4/2).
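The curation tip above can be sketched in a few lines of R. This is a minimal illustration, not a PureCN function: it assumes a hypothetical data frame segs with one row per chromosome arm and columns C (total copy number) and M (minor copy number) read off a candidate solution, and it lists the distinct total copy numbers among balanced segments (M = C/2).

```r
# Hypothetical per-arm segment table for one candidate purity/ploidy solution.
# C = total copy number, M = minor copy number (illustrative values only).
segs <- data.frame(
  arm = c("1p", "1q", "3p", "5q", "8q", "12p"),
  C   = c(2, 4, 4, 6, 3, 2),
  M   = c(1, 2, 2, 3, 1, 1)
)

# Return the distinct total copy numbers of balanced segments, i.e. segments
# whose minor allele carries half of the copies (SNP allelic fraction ~0.5).
balanced_levels <- function(segs) {
  is_balanced <- which(segs$C > 0 & 2 * segs$M == segs$C)
  sort(unique(segs$C[is_balanced]))
}

# Rule of thumb from the thread: more than one balanced level supports a real
# high-ploidy solution; a single level (e.g. only 4/2) is suspicious.
looks_plausible <- function(segs) length(balanced_levels(segs)) > 1

# A suspicious alternative where every balanced arm is 4/2.
suspicious <- data.frame(arm = c("1p", "3p", "12p"), C = c(4, 4, 4), M = c(2, 2, 2))

balanced_levels(segs)        # 2 4 6
looks_plausible(segs)        # TRUE
balanced_levels(suspicious)  # 4
looks_plausible(suspicious)  # FALSE
```

This is only a screening aid for manual review; the 8q segment (3/1) is ignored because it is not balanced, and the final call still depends on looking at the plots.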
Thank you for addressing that so quickly. It seems to be working fine now with the higher test.purity setting.
Yes, I guess it's not a common situation: here we have individual cell lines derived from tumor tissue, along with their matched normal tissue.
I'm not sure I completely understand your suggestion regarding looking at balanced SNPs and minor copy numbers. But I have seen some high-ploidy solutions for some of my cell line and tumor samples. Once I finish the current run with the higher test.purity settings for my cell lines, I'll send you an email with follow-up questions.
I appreciate all the help!
Best, Keegan
Hi Markus,
I'm getting the following error running runAbsoluteCN (only including seemingly relevant output):

Any ideas why this might be? Happy to give more details about the files/parameters I'm using that throw the error if needed.
Best, Keegan