Closed andrewrech closed 6 years ago
Hi Andrew, thanks again, never seen this one before.
Can you share your mapping bias database as well? Or was there none?
One thing that is potentially confusing in the documentation: even if you have a matched normal, I would recommend NOT providing the normal coverage. If you do, then it won't use the pool of normal denoising.
I am not using a mapping bias database. So I should be running with normalDB set, but normal.coverage.file
== NULL?
Thanks a lot
PureCN::runAbsoluteCN(
normal.coverage.file = best_nl,
tumor.coverage.file = comdt[, tcov %>% .[i]],
vcf.file = comdt[, vcff %>% .[i]],
normalDB = ndb_obj,
genome = "hg38",
args.setMappingBiasVcf =
list(
vcf = vcfr,
normal.panel.vcf.file = comdt[, npvcf %>% .[i]]
),
args.filterVcf =
list(
vcf = vcfr,
use.somatic.status = TRUE,
interval.padding = comdt[, ip %>% .[i]],
remove.off.target = TRUE,
target.granges = bed,
min.base.quality = comdt[, mbq %>% .[i]]
),
fun.segmentation = PureCN::segmentationCBS,
args.segmentation = list(target.weight.file = comdt[, tw %>% .[i]]),
post.optimize = TRUE,
gc.gene.file = comdt[, gc_file %>% .[i]],
plot.cnv = FALSE,
verbose = TRUE,
test.purity = seq((comdt[, tp %>% .[i]]), 0.95, by = 0.01),
test.num.copy = 0:(comdt[, tc %>% .[i]]),
max.ploidy = comdt[, mp %>% .[i]],
speedup.heuristics = 1,
min.coverage = comdt[, mc %>% .[i]],
max.non.clonal = comdt[, mnc %>% .[i]],
max.pon = 5,
iterations = 60,
max.candidate.solutions = 50,
cosmic = comdt[, cosmic %>% .[i]],
log.file = comdt[, log_file %>% .[i]])
Ah, do you mean normal.panel.vcf.file
?
Yes, ideally use the createMappingBiasVcf function to parse this normal.panel.vcf.file into a better R data structure. See the Quick vignette.
Give it a try without the matched normal. And any reason you don't use off-target reads?
Is this amplicon data? With such a small panel, you really need off-target reads.
setMappingBiasVcf?
Ah, calculateMappingBiasVcf: https://github.com/lima1/PureCN/blob/master/R/calculateMappingBiasVcf.R
New in 1.10.0. You can use the resulting RDS instead of the VCF. Faster parsing and better imputation of mapping bias of novel mutations not in the database
Not amplicon data. This is simulated from WES over these intervals, hence no off target reads. After this testing, I will include on real world. Thanks.
I will use calculateMappingBiasVcf
and pass as normal.panel.vcf.file
. And to clarify, you think removing the normal coverage file is related to this bug or is that a general suggestion for improvement?
Thank you
Ah, sorry for the confusion. Looks like you use calculateTangentNormal (formerly findBestNormal) to create a normal, right? Not the matched normal? Then you are all good. I'll try to reproduce, likely unrelated.
correct :-)
great. thanks.
I found the docs quite clear.
<tangential code removed for clarity>
Sure that's the sample with the crash? That minimal example seems to work.
ndb <-readRDS( "~/Downloads/95c65fd1-f28c-4ed0-9c3e-b39b14991512_Homo_sapiens_assembly38_FFPEv2_Hg38_normalDB.RDS")
tumor <- readCoverageFile("~/Downloads/e886eb16-dae4-4329-8b65-c6d018017150_Homo_sapiens_assembly38_FFPEv2_Hg38_10c3a27f-484a-4367-b288-2ae95ad75143_TN_M_R_xE_T_dna_rg_br_loess.txt")
normal <- calculateTangentNormal(normalDB=ndb, tumor.coverage.file=tumor)
vcf.file <- "~/Downloads/e2c99141-95e0-40ac-93b7-f2016b4b7437_12-211_T.sorted_sortn_dna_md_rg_br.vcf"
runAbsoluteCN(normal.coverage.file=normal,
tumor=tumor,
normalDB=ndb,
vcf=vcf.file,
args.setMappingBiasVcf=list(normal.panel.vcf.file="~/Downloads/97c3d3aa-c8fe-4b14-b941-058c5df43cfa_artfdet_comb5.vcf.gz"),
genome="hg38",
max.pon=5,
args.segmentation = list(target.weight.file = "~/Downloads/622da328-9e65-4c7d-a0c8-918bed0648f2_Homo_sapiens_assembly38_FFPEv2_Hg38_tw.txt"))
I cannot reproduce your code on my system. Tomorrow I will boot a fresh EC2 instance and reinstall packages, perhaps something strange and low level is occurring.
Thanks for trying and for the advice.
Pretty sure a bug, but looking at the code I cannot figure out when this happens. The sample id is different from your logfile:
INFO [2018-05-08 23:41:25] ------------------------------------------------------------
INFO [2018-05-08 23:41:25] PureCN 1.11.1
INFO [2018-05-08 23:41:25] ------------------------------------------------------------
INFO [2018-05-08 23:41:25] Arguments: -vcf.file ~/Downloads/e2c99141-95e0-40ac-93b7-f2016b4b7437_12-211_T.sorted_sortn_dna_md_rg_br.vcf -genome hg38 -args.setMappingBiasVcf ~/Downloads/97c3d3aa-c8fe-4b14-b941-058c5df43cfa_artfdet_comb5.vcf.gz -args.segmentation ~/Downloads/622da328-9e65-4c7d-a0c8-918bed0648f2_Homo_sapiens_assembly38_FFPEv2_Hg38_tw.txt -max.pon 5 -normal.coverage.file <data> -tumor.coverage.file <data> -normalDB <data>
INFO [2018-05-08 23:41:25] Loading coverage files...
INFO [2018-05-08 23:41:25] Mean target coverages: 397X (tumor) 396X (normal).
WARN [2018-05-08 23:41:25] Allosome coverage missing, cannot determine sex.
WARN [2018-05-08 23:41:25] Allosome coverage missing, cannot determine sex.
INFO [2018-05-08 23:41:25] Removing 14 intervals with missing log.ratio.
INFO [2018-05-08 23:41:25] Removing 15 targets excluded in normalDB.
INFO [2018-05-08 23:41:25] normalDB provided. Setting minimum coverage for segmentation to 0.0015X.
INFO [2018-05-08 23:41:25] Removing 232 low coverage (< 0.0015X) targets.
INFO [2018-05-08 23:41:25] Using 2574 intervals (2574 on-target, 0 off-target).
INFO [2018-05-08 23:41:25] No off-target intervals. If this is hybrid-capture data, consider adding them.
INFO [2018-05-08 23:41:25] No interval.file provided. Cannot check if data was GC-normalized. Was it?
INFO [2018-05-08 23:41:25] Loading VCF...
INFO [2018-05-08 23:41:46] Found 1166483 variants in VCF file.
INFO [2018-05-08 23:41:47] 444116 (38.1%) variants annotated as likely germline (DB INFO flag).
INFO [2018-05-08 23:41:50] 12-211_T.sorted_sortn_dna_md is tumor in VCF file.
INFO [2018-05-08 23:41:53] 524 homozygous and 3430 heterozygous variants on chrX.
INFO [2018-05-08 23:41:53] Sex from VCF: F (Fisher's p-value: < 0.0001, odds-ratio: 0.78).
INFO [2018-05-08 23:41:57] Removing 1004228 non heterozygous (in matched normal) germline SNPs.
INFO [2018-05-08 23:42:06] Initial testing for significant sample cross-contamination: unlikely
INFO [2018-05-08 23:42:06] Removing 112876 variants with AF < 0.030 or AF >= 1.000 or less than 3 supporting reads or depth < 15.
INFO [2018-05-08 23:42:06] Removing 25595 low quality variants with BQ < 25.
INFO [2018-05-08 23:42:07] Total size of targeted genomic region: 0.45Mb (0.69Mb with 50bp padding).
INFO [2018-05-08 23:42:07] 4.0% of targets contain variants.
INFO [2018-05-08 23:42:07] Removing 23683 variants outside intervals.
INFO [2018-05-08 23:42:07] Found SOMATIC annotation in VCF.
INFO [2018-05-08 23:42:07] Setting somatic prior probabilities for somatic variants to 0.999000 or to 0.000100 otherwise.
INFO [2018-05-08 23:42:07] Found SOMATIC annotation in VCF. Setting mapping bias to 0.956.
INFO [2018-05-08 23:42:07] Scanning ~/Downloads/97c3d3aa-c8fe-4b14-b941-058c5df43cfa_artfdet_comb5.vcf.gz...
INFO [2018-05-08 23:42:09] Imputing mapping bias for 1 variants...
INFO [2018-05-08 23:42:09] Sample sex: ?
INFO [2018-05-08 23:42:09] Segmenting data...
INFO [2018-05-08 23:42:09] Target weights found, will use weighted CBS.
INFO [2018-05-08 23:42:09] Loading pre-computed boundaries for DNAcopy...
INFO [2018-05-08 23:42:09] Setting undo.SD parameter to 0.500000.
Setting multi-figure configuration
INFO [2018-05-08 23:42:11] Setting prune.hclust.h parameter to 0.100000.
INFO [2018-05-08 23:42:11] Found 26 segments with median size of 62.80Mb.
INFO [2018-05-08 23:42:11] Using 101 variants.
INFO [2018-05-08 23:42:11] Mean standard deviation of log-ratios: 0.15
INFO [2018-05-08 23:42:11] 2D-grid search of purity and ploidy...
INFO [2018-05-08 23:42:15] Local optima: 0.15/2.6, 0.15/2.2, 0.19/2.6, 0.15/1.9, 0.17/2, 0.58/4.6,
0.72/5.2, 0.27/2, 0.78/5, 0.75/4.2, 0.62/3.2, 0.35/1.6, 0.62/2.6,
0.52/2, 0.82/2.8, 0.78/2, 0.58/1.4, 0.78/1.2, 0.95/3, 0.95/1
INFO [2018-05-08 23:42:15] Testing local optimum 1/20 at purity 0.15 and total ploidy 2.60...
INFO [2018-05-08 23:42:15] Fitting variants for purity 0.15, tumor ploidy 5.99 and contamination 0.01.
INFO [2018-05-08 23:42:16] Optimized purity: 0.15
INFO [2018-05-08 23:42:16] Testing local optimum 2/20 at purity 0.15 and total ploidy 2.20...
INFO [2018-05-08 23:42:17] Fitting variants for purity 0.15, tumor ploidy 3.06 and contamination 0.01.
INFO [2018-05-08 23:42:17] Optimized purity: 0.15
INFO [2018-05-08 23:42:17] Testing local optimum 3/20 at purity 0.19 and total ploidy 2.60...
INFO [2018-05-08 23:42:18] Fitting variants for purity 0.19, tumor ploidy 5.12 and contamination 0.01.
INFO [2018-05-08 23:42:19] Optimized purity: 0.19
INFO [2018-05-08 23:42:19] Testing local optimum 4/20 at purity 0.15 and total ploidy 1.90...
INFO [2018-05-08 23:42:19] Recalibrating log-ratios...
INFO [2018-05-08 23:42:19] Testing local optimum 4/20 at purity 0.15 and total ploidy 1.90...
INFO [2018-05-08 23:42:19] Recalibrating log-ratios...
INFO [2018-05-08 23:42:19] Testing local optimum 4/20 at purity 0.15 and total ploidy 1.90...
INFO [2018-05-08 23:42:19] Recalibrating log-ratios...
INFO [2018-05-08 23:42:19] Testing local optimum 4/20 at purity 0.15 and total ploidy 1.90...
INFO [2018-05-08 23:42:19] Testing local optimum 5/20 at purity 0.17 and total ploidy 2.00...
INFO [2018-05-08 23:42:19] Fitting variants for purity 0.15, tumor ploidy 2.11 and contamination 0.01.
INFO [2018-05-08 23:42:20] Optimized purity: 0.15
INFO [2018-05-08 23:42:20] Testing local optimum 6/20 at purity 0.58 and total ploidy 4.60...
INFO [2018-05-08 23:42:20] Recalibrating log-ratios...
INFO [2018-05-08 23:42:20] Testing local optimum 6/20 at purity 0.58 and total ploidy 4.60...
INFO [2018-05-08 23:42:20] Recalibrating log-ratios...
INFO [2018-05-08 23:42:20] Testing local optimum 6/20 at purity 0.58 and total ploidy 4.60...
INFO [2018-05-08 23:42:21] Recalibrating log-ratios...
INFO [2018-05-08 23:42:21] Testing local optimum 6/20 at purity 0.58 and total ploidy 4.60...
INFO [2018-05-08 23:42:21] Testing local optimum 7/20 at purity 0.72 and total ploidy 5.20...
INFO [2018-05-08 23:42:21] Recalibrating log-ratios...
INFO [2018-05-08 23:42:21] Testing local optimum 7/20 at purity 0.72 and total ploidy 5.20...
INFO [2018-05-08 23:42:21] Recalibrating log-ratios...
INFO [2018-05-08 23:42:21] Testing local optimum 7/20 at purity 0.72 and total ploidy 5.20...
INFO [2018-05-08 23:42:21] Recalibrating log-ratios...
INFO [2018-05-08 23:42:21] Testing local optimum 7/20 at purity 0.72 and total ploidy 5.20...
INFO [2018-05-08 23:42:21] Testing local optimum 8/20 at purity 0.27 and total ploidy 2.00...
INFO [2018-05-08 23:42:22] Fitting variants for purity 0.19, tumor ploidy 1.42 and contamination 0.01.
INFO [2018-05-08 23:42:22] Optimized purity: 0.19
INFO [2018-05-08 23:42:22] Testing local optimum 9/20 at purity 0.78 and total ploidy 5.00...
INFO [2018-05-08 23:42:23] Fitting variants for purity 0.77, tumor ploidy 5.80 and contamination 0.01.
INFO [2018-05-08 23:42:24] Optimized purity: 0.77
INFO [2018-05-08 23:42:24] Testing local optimum 10/20 at purity 0.75 and total ploidy 4.20...
INFO [2018-05-08 23:42:24] Fitting variants for purity 0.71, tumor ploidy 4.87 and contamination 0.01.
INFO [2018-05-08 23:42:25] Optimized purity: 0.71
INFO [2018-05-08 23:42:25] Testing local optimum 11/20 at purity 0.62 and total ploidy 3.20...
INFO [2018-05-08 23:42:25] Fitting variants for purity 0.55, tumor ploidy 3.87 and contamination 0.01.
INFO [2018-05-08 23:42:26] Optimized purity: 0.55
INFO [2018-05-08 23:42:26] Testing local optimum 12/20 at purity 0.35 and total ploidy 1.60...
INFO [2018-05-08 23:42:26] Recalibrating log-ratios...
INFO [2018-05-08 23:42:26] Testing local optimum 12/20 at purity 0.35 and total ploidy 1.60...
INFO [2018-05-08 23:42:26] Recalibrating log-ratios...
INFO [2018-05-08 23:42:26] Testing local optimum 12/20 at purity 0.35 and total ploidy 1.60...
INFO [2018-05-08 23:42:27] Recalibrating log-ratios...
INFO [2018-05-08 23:42:27] Testing local optimum 12/20 at purity 0.35 and total ploidy 1.60...
INFO [2018-05-08 23:42:27] Testing local optimum 13/20 at purity 0.62 and total ploidy 2.60...
INFO [2018-05-08 23:42:27] Fitting variants for purity 0.66, tumor ploidy 3.00 and contamination 0.01.
INFO [2018-05-08 23:42:28] Optimized purity: 0.66
INFO [2018-05-08 23:42:28] Testing local optimum 14/20 at purity 0.52 and total ploidy 2.00...
INFO [2018-05-08 23:42:28] Fitting variants for purity 0.32, tumor ploidy 1.88 and contamination 0.01.
INFO [2018-05-08 23:42:29] Optimized purity: 0.32
INFO [2018-05-08 23:42:29] Testing local optimum 15/20 at purity 0.82 and total ploidy 2.80...
INFO [2018-05-08 23:42:29] Fitting variants for purity 0.86, tumor ploidy 3.00 and contamination 0.01.
INFO [2018-05-08 23:42:30] Optimized purity: 0.86
INFO [2018-05-08 23:42:30] Testing local optimum 16/20 at purity 0.78 and total ploidy 2.00...
INFO [2018-05-08 23:42:31] Fitting variants for purity 0.59, tumor ploidy 2.00 and contamination 0.01.
INFO [2018-05-08 23:42:31] Optimized purity: 0.59
INFO [2018-05-08 23:42:31] Testing local optimum 17/20 at purity 0.58 and total ploidy 1.40...
INFO [2018-05-08 23:42:32] Fitting variants for purity 0.56, tumor ploidy 1.00 and contamination 0.01.
INFO [2018-05-08 23:42:33] Optimized purity: 0.56
INFO [2018-05-08 23:42:33] Testing local optimum 18/20 at purity 0.78 and total ploidy 1.20...
INFO [2018-05-08 23:42:33] Fitting variants for purity 0.75, tumor ploidy 1.00 and contamination 0.01.
INFO [2018-05-08 23:42:34] Optimized purity: 0.75
INFO [2018-05-08 23:42:34] Testing local optimum 19/20 at purity 0.95 and total ploidy 3.00...
INFO [2018-05-08 23:42:34] Fitting variants for purity 0.87, tumor ploidy 3.00 and contamination 0.01.
INFO [2018-05-08 23:42:35] Optimized purity: 0.87
INFO [2018-05-08 23:42:35] Testing local optimum 20/20 at purity 0.95 and total ploidy 1.00...
INFO [2018-05-08 23:42:36] Fitting variants for purity 0.91, tumor ploidy 1.00 and contamination 0.01.
INFO [2018-05-08 23:42:36] Optimized purity: 0.91
INFO [2018-05-08 23:42:36] Done.
INFO [2018-05-08 23:42:36] ------------------------------------------------------------
Thanks. A uuid is added to the S3 upload, it is the same sample 10c3a27f-484a-4367-b288-2ae95ad7
...
But the numbers don’t align either, like coverage, number of variants in the Vcf.
Got it - you are right, of course. Same file but different parameters. I have made some progress on identifying the inputs that are causing this error. I will add a solid MRE asap.
Thanks
I am not able to consistently reproduce this bug. Samples with intervals totalling less than ~3 Mb fail about 10% of the time, randomly. It seems this is likely related to my system or a core Bioc package.
You can set a seed with set.seed(123) for example. This should make it reproducible.
It's likely a corner case that happens due to the extreme downsampling. Feel free to reopen when you can reproduce.
I think this might happen when there are no variants on a chromosome to impute the mapping bias. Try using the calculateMappingBiasVcf function to generate the RDS file. That should fix it.
See #29 for hardware and session info.
Note that this file is aggressively downsampled for testing purposes, which may be contributing to the error. More than 50 other identically pipelined and processed samples completed successfully. This is one of three files that generated errors, all with the same message as indicated below.
Thank you very much for your help, as always.
ERROR
LOGGING