lima1 / PureCN

Copy number calling and variant classification using targeted short read sequencing
https://bioconductor.org/packages/devel/bioc/html/PureCN.html
Artistic License 2.0
127 stars 32 forks source link

Error when fitting variants (!sum(mapping.bias.ok)) #30

Closed andrewrech closed 6 years ago

andrewrech commented 6 years ago

See #29 for hardware and session info.

Note that this file is aggressively downsampled for testing purposes, which may be contributing to the error. More than 50 other identically pipelined and processed samples completed successfully. This is one of three files that generated errors, all with the same message as indicated below.

Thank you very much for your help, as always.

ERROR

Error in if (!sum(mapping.bias.ok)) { :
  missing value where TRUE/FALSE needed

LOGGING

WARN [2018-05-08 16:16:44] gc.gene.file was renamed to interval.file.
INFO [2018-05-08 16:16:44] ------------------------------------------------------------
INFO [2018-05-08 16:16:44] PureCN 1.11.1
INFO [2018-05-08 16:16:44] ------------------------------------------------------------
INFO [2018-05-08 16:16:53] Arguments: -tumor.coverage.file Homo_sapiens_assembly38_FFPEv2_Hg38_10c3a27f-484a-4367-b288-2ae95ad75143_TN_M_R_xE_T_dna_rg_br_loess.txt -vcf.file 12-211_T.sorted_sortn_dna_md_rg_br.vcf -genome hg38 -args.segmentation Homo_sapiens_assembly38_FFPEv2_Hg38_tw.txt -max.candidate.solutions 50 -max.non.clonal 0.1 -max.pon 5 -iterations 60 -gc.gene.file Homo_sapiens_assembly38_FFPEv2_Hg38_gc_file.txt -plot.cnv FALSE -cosmic.vcf.file /mnt/ephm/refs//CosmicCodingMuts.vcf.gz -post.optimize TRUE -speedup.heuristics 1 -log.file H_38_FFPE2_H38_2635-69-4552-98-1040040_TN_M_R_E_T_mc-15.ip-50.mbq-25.tp-015.tc-7.mp-6.mnc-01.pcn.log -verbose TRUE -normal.coverage.file <data> -normalDB <data> -args.filterVcf <data> -args.setMappingBiasVcf <data> -fun.segmentation <data> -max.ploidy <data> -test.num.copy <data> -test.purity <data> -min.coverage <data>
INFO [2018-05-08 16:16:53] Loading coverage files...
INFO [2018-05-08 16:16:53] Mean target coverages: 174X (tumor) 172X (normal).
WARN [2018-05-08 16:16:53] Allosome coverage missing, cannot determine sex.
WARN [2018-05-08 16:16:53] Allosome coverage missing, cannot determine sex.
INFO [2018-05-08 16:16:54] Removing 13 intervals with missing log.ratio.
INFO [2018-05-08 16:16:54] Removing 16 targets excluded in normalDB.
INFO [2018-05-08 16:16:54] normalDB provided. Setting minimum coverage for segmentation to 0.0015X.
INFO [2018-05-08 16:16:54] Removing 232 low coverage (< 0.0015X) targets.
INFO [2018-05-08 16:16:54] Using 2574 intervals (2574 on-target, 0 off-target).
INFO [2018-05-08 16:16:54] No off-target intervals. If this is hybrid-capture data, consider adding them.
INFO [2018-05-08 16:16:54] AT/GC dropout: 1.01 (tumor), 1.01 (normal).
INFO [2018-05-08 16:16:54] Loading VCF...
INFO [2018-05-08 16:17:51] Found 3140875 variants in VCF file.
INFO [2018-05-08 16:17:58] 740586 (23.6%) variants annotated as likely germline (DB INFO flag).
INFO [2018-05-08 16:18:07] 36747998-e48f-436f-a372-827f882cf91b_TN_M_R_xE_T_dna is tumor in VCF file.
INFO [2018-05-08 16:18:19] 928 homozygous and 10440 heterozygous variants on chrX.
INFO [2018-05-08 16:18:19] Sex from VCF: F (Fisher's p-value: < 0.0001, odds-ratio: 1.93).
WARN [2018-05-08 16:18:19] Duplicated arguments in filterVcf
INFO [2018-05-08 16:18:31] Removing 2966776 non heterozygous (in matched normal) germline SNPs.
INFO [2018-05-08 16:18:45] Initial testing for significant sample cross-contamination: maybe
INFO [2018-05-08 16:18:46] Removing 109928 variants with AF < 0.030 or AF >= 1.000 or less than 3 supporting reads or depth < 15.
INFO [2018-05-08 16:18:46] Removing 33977 low quality variants with BQ < 25.
INFO [2018-05-08 16:18:46] Total size of targeted genomic region: 0.45Mb (0.69Mb with 50bp padding).
INFO [2018-05-08 16:18:47] 8.3% of targets contain variants.
INFO [2018-05-08 16:18:47] Removing 29971 variants outside intervals.
INFO [2018-05-08 16:18:47] Reading COSMIC VCF...
INFO [2018-05-08 16:18:49] Found SOMATIC annotation in VCF.
INFO [2018-05-08 16:18:49] Setting somatic prior probabilities for somatic variants to 0.999000 or to 0.000100 otherwise.
INFO [2018-05-08 16:18:49] Found SOMATIC annotation in VCF. Setting mapping bias to 1.021.
INFO [2018-05-08 16:18:49] Scanning artfdet_comb5.vcf.gz...
INFO [2018-05-08 16:18:52] Imputing mapping bias for 97 variants...
INFO [2018-05-08 16:18:55] Sample sex: ?
INFO [2018-05-08 16:18:55] Segmenting data...
INFO [2018-05-08 16:18:55] Target weights found, will use weighted CBS.
INFO [2018-05-08 16:18:55] Loading pre-computed boundaries for DNAcopy...
INFO [2018-05-08 16:18:55] Setting undo.SD parameter to 0.500000.
INFO [2018-05-08 16:18:55] Setting prune.hclust.h parameter to 0.100000.
INFO [2018-05-08 16:18:55] Found 25 segments with median size of 68.28Mb.
INFO [2018-05-08 16:18:55] Using 223 variants.
INFO [2018-05-08 16:18:55] Mean standard deviation of log-ratios: 0.17
INFO [2018-05-08 16:18:55] 2D-grid search of purity and ploidy...
INFO [2018-05-08 16:19:00] Local optima: 0.17/2.8, 0.15/2.4, 0.17/2, 0.29/3.2, 0.19/1.8, 0.48/4.4, 0.45/3.8, 0.31/2.6, 0.33/3, 0.68/5.4, 0.55/4.2, 0.82/6, 0.38/2.4, 0.42/2.8, 0.48/2, 0.78/2.8,
0.58/2.6, 0.52/3, 0.42/1.6, 0.62/1.4, 0.82/1.2, 0.95/3, 0.95/1
INFO [2018-05-08 16:19:00] Testing local optimum 1/23 at purity 0.17 and total ploidy 2.80...
INFO [2018-05-08 16:19:00] Recalibrating log-ratios...
INFO [2018-05-08 16:19:00] Testing local optimum 1/23 at purity 0.17 and total ploidy 2.80...
INFO [2018-05-08 16:19:01] Recalibrating log-ratios...
INFO [2018-05-08 16:19:01] Testing local optimum 1/23 at purity 0.17 and total ploidy 2.80...
INFO [2018-05-08 16:19:01] Recalibrating log-ratios...
INFO [2018-05-08 16:19:01] Testing local optimum 1/23 at purity 0.17 and total ploidy 2.80...
INFO [2018-05-08 16:19:01] Testing local optimum 2/23 at purity 0.15 and total ploidy 2.40...
INFO [2018-05-08 16:19:01] Fitting variants for purity 0.15, tumor ploidy 4.62 and contamination 0.01.
lima1 commented 6 years ago

Hi Andrew, thanks again, never seen this one before.

Can you share your mapping bias database as well? Or was there none?

One thing that is potentially confusing in the documentation: even if you have a matched normal, I would recommend NOT providing the normal coverage. If you do, then it won't use the pool of normal denoising.

andrewrech commented 6 years ago

I am not using a mapping bias database. So I should be running with normalDB set, but normal.coverage.file == NULL?

Thanks a lot

PureCN::runAbsoluteCN(
normal.coverage.file = best_nl,
tumor.coverage.file = comdt[, tcov %>% .[i]],
vcf.file = comdt[, vcff %>% .[i]],
normalDB = ndb_obj,
genome = "hg38",
args.setMappingBiasVcf =
list(
vcf = vcfr,
normal.panel.vcf.file = comdt[, npvcf %>% .[i]]
),
args.filterVcf =
list(
vcf = vcfr,
use.somatic.status = TRUE,
interval.padding = comdt[, ip %>% .[i]],
remove.off.target = TRUE,
target.granges = bed,
min.base.quality = comdt[, mbq %>% .[i]]
),
fun.segmentation = PureCN::segmentationCBS,
args.segmentation = list(target.weight.file = comdt[, tw %>% .[i]]),
post.optimize = TRUE,
gc.gene.file = comdt[, gc_file %>% .[i]],
plot.cnv = FALSE,
verbose = TRUE,
test.purity = seq((comdt[, tp %>% .[i]]), 0.95, by = 0.01),
test.num.copy = 0:(comdt[, tc %>% .[i]]),
max.ploidy = comdt[, mp %>% .[i]],
speedup.heuristics = 1,
min.coverage = comdt[, mc %>% .[i]],
max.non.clonal = comdt[, mnc %>% .[i]],
max.pon = 5,
iterations = 60,
max.candidate.solutions = 50,
cosmic = comdt[, cosmic %>% .[i]],
log.file = comdt[, log_file %>% .[i]])
andrewrech commented 6 years ago

Ah, do you mean normal.panel.vcf.file?

andrewrech commented 6 years ago
lima1 commented 6 years ago

Yes, ideally use the createMappingBiasVcf function to parse this normal.panel.vcf.file into a better R data structure. See the Quick vignette.

Give it a try without the matched normal. And any reason you don't use off-target reads?

Is this amplicon data? With such a small panel, you really need off-target reads.

andrewrech commented 6 years ago

setMappingBiasVcf?

lima1 commented 6 years ago

Ah, calculateMappingBiasVcf: https://github.com/lima1/PureCN/blob/master/R/calculateMappingBiasVcf.R

New in 1.10.0. You can use the resulting RDS instead of the VCF. Faster parsing and better imputation of mapping bias of novel mutations not in the database

andrewrech commented 6 years ago

Not amplicon data. This is simulated from WES over these intervals, hence no off target reads. After this testing, I will include on real world. Thanks.

I will use calculateMappingBiasVcf and pass as normal.panel.vcf.file. And to clarify, you think removing the normal coverage file is related to this bug or is that a general suggestion for improvement?

Thank you

lima1 commented 6 years ago

Ah, sorry for the confusion. Looks like you use calculateTangentNormal (formerly findBestNormal) to create a normal, right? Not the matched normal? Then you are all good. I'll try to reproduce, likely unrelated.

andrewrech commented 6 years ago

correct :-)

great. thanks.

I found the docs quite clear.

andrewrech commented 6 years ago

<tangential code removed for clarity>

lima1 commented 6 years ago

Sure that's the sample with the crash? That minimal example seems to work.

ndb <-readRDS( "~/Downloads/95c65fd1-f28c-4ed0-9c3e-b39b14991512_Homo_sapiens_assembly38_FFPEv2_Hg38_normalDB.RDS")
tumor <- readCoverageFile("~/Downloads/e886eb16-dae4-4329-8b65-c6d018017150_Homo_sapiens_assembly38_FFPEv2_Hg38_10c3a27f-484a-4367-b288-2ae95ad75143_TN_M_R_xE_T_dna_rg_br_loess.txt")
normal <- calculateTangentNormal(normalDB=ndb, tumor.coverage.file=tumor)
vcf.file <- "~/Downloads/e2c99141-95e0-40ac-93b7-f2016b4b7437_12-211_T.sorted_sortn_dna_md_rg_br.vcf"
runAbsoluteCN(normal.coverage.file=normal, 
  tumor=tumor,  
  normalDB=ndb, 
  vcf=vcf.file, 
  args.setMappingBiasVcf=list(normal.panel.vcf.file="~/Downloads/97c3d3aa-c8fe-4b14-b941-058c5df43cfa_artfdet_comb5.vcf.gz"), 
  genome="hg38", 
  max.pon=5, 
  args.segmentation = list(target.weight.file = "~/Downloads/622da328-9e65-4c7d-a0c8-918bed0648f2_Homo_sapiens_assembly38_FFPEv2_Hg38_tw.txt"))
andrewrech commented 6 years ago

I cannot reproduce your code on my system. Tomorrow I will boot a fresh EC2 instance and reinstall packages, perhaps something strange and low level is occurring.

Thanks for trying and for the advice.

lima1 commented 6 years ago

Pretty sure a bug, but looking at the code I cannot figure out when this happens. The sample id is different from your logfile:

INFO [2018-05-08 23:41:25] ------------------------------------------------------------
INFO [2018-05-08 23:41:25] PureCN 1.11.1
INFO [2018-05-08 23:41:25] ------------------------------------------------------------
INFO [2018-05-08 23:41:25] Arguments: -vcf.file ~/Downloads/e2c99141-95e0-40ac-93b7-f2016b4b7437_12-211_T.sorted_sortn_dna_md_rg_br.vcf -genome hg38 -args.setMappingBiasVcf ~/Downloads/97c3d3aa-c8fe-4b14-b941-058c5df43cfa_artfdet_comb5.vcf.gz -args.segmentation ~/Downloads/622da328-9e65-4c7d-a0c8-918bed0648f2_Homo_sapiens_assembly38_FFPEv2_Hg38_tw.txt -max.pon 5 -normal.coverage.file <data> -tumor.coverage.file <data> -normalDB <data>
INFO [2018-05-08 23:41:25] Loading coverage files...
INFO [2018-05-08 23:41:25] Mean target coverages: 397X (tumor) 396X (normal).
WARN [2018-05-08 23:41:25] Allosome coverage missing, cannot determine sex.
WARN [2018-05-08 23:41:25] Allosome coverage missing, cannot determine sex.
INFO [2018-05-08 23:41:25] Removing 14 intervals with missing log.ratio.
INFO [2018-05-08 23:41:25] Removing 15 targets excluded in normalDB.
INFO [2018-05-08 23:41:25] normalDB provided. Setting minimum coverage for segmentation to 0.0015X.
INFO [2018-05-08 23:41:25] Removing 232 low coverage (< 0.0015X) targets.
INFO [2018-05-08 23:41:25] Using 2574 intervals (2574 on-target, 0 off-target).
INFO [2018-05-08 23:41:25] No off-target intervals. If this is hybrid-capture data, consider adding them.
INFO [2018-05-08 23:41:25] No interval.file provided. Cannot check if data was GC-normalized. Was it?
INFO [2018-05-08 23:41:25] Loading VCF...
INFO [2018-05-08 23:41:46] Found 1166483 variants in VCF file.
INFO [2018-05-08 23:41:47] 444116 (38.1%) variants annotated as likely germline (DB INFO flag).
INFO [2018-05-08 23:41:50] 12-211_T.sorted_sortn_dna_md is tumor in VCF file.
INFO [2018-05-08 23:41:53] 524 homozygous and 3430 heterozygous variants on chrX.
INFO [2018-05-08 23:41:53] Sex from VCF: F (Fisher's p-value: < 0.0001, odds-ratio: 0.78).
INFO [2018-05-08 23:41:57] Removing 1004228 non heterozygous (in matched normal) germline SNPs.
INFO [2018-05-08 23:42:06] Initial testing for significant sample cross-contamination: unlikely
INFO [2018-05-08 23:42:06] Removing 112876 variants with AF < 0.030 or AF >= 1.000 or less than 3 supporting reads or depth < 15.
INFO [2018-05-08 23:42:06] Removing 25595 low quality variants with BQ < 25.
INFO [2018-05-08 23:42:07] Total size of targeted genomic region: 0.45Mb (0.69Mb with 50bp padding).
INFO [2018-05-08 23:42:07] 4.0% of targets contain variants.
INFO [2018-05-08 23:42:07] Removing 23683 variants outside intervals.
INFO [2018-05-08 23:42:07] Found SOMATIC annotation in VCF.
INFO [2018-05-08 23:42:07] Setting somatic prior probabilities for somatic variants to 0.999000 or to 0.000100 otherwise.
INFO [2018-05-08 23:42:07] Found SOMATIC annotation in VCF. Setting mapping bias to 0.956.
INFO [2018-05-08 23:42:07] Scanning ~/Downloads/97c3d3aa-c8fe-4b14-b941-058c5df43cfa_artfdet_comb5.vcf.gz...
INFO [2018-05-08 23:42:09] Imputing mapping bias for 1 variants...
INFO [2018-05-08 23:42:09] Sample sex: ?
INFO [2018-05-08 23:42:09] Segmenting data...
INFO [2018-05-08 23:42:09] Target weights found, will use weighted CBS.
INFO [2018-05-08 23:42:09] Loading pre-computed boundaries for DNAcopy...
INFO [2018-05-08 23:42:09] Setting undo.SD parameter to 0.500000.
Setting multi-figure configuration
INFO [2018-05-08 23:42:11] Setting prune.hclust.h parameter to 0.100000.
INFO [2018-05-08 23:42:11] Found 26 segments with median size of 62.80Mb.
INFO [2018-05-08 23:42:11] Using 101 variants.
INFO [2018-05-08 23:42:11] Mean standard deviation of log-ratios: 0.15
INFO [2018-05-08 23:42:11] 2D-grid search of purity and ploidy...
INFO [2018-05-08 23:42:15] Local optima: 0.15/2.6, 0.15/2.2, 0.19/2.6, 0.15/1.9, 0.17/2, 0.58/4.6,
0.72/5.2, 0.27/2, 0.78/5, 0.75/4.2, 0.62/3.2, 0.35/1.6, 0.62/2.6,
0.52/2, 0.82/2.8, 0.78/2, 0.58/1.4, 0.78/1.2, 0.95/3, 0.95/1
INFO [2018-05-08 23:42:15] Testing local optimum 1/20 at purity 0.15 and total ploidy 2.60...
INFO [2018-05-08 23:42:15] Fitting variants for purity 0.15, tumor ploidy 5.99 and contamination 0.01.
INFO [2018-05-08 23:42:16] Optimized purity: 0.15
INFO [2018-05-08 23:42:16] Testing local optimum 2/20 at purity 0.15 and total ploidy 2.20...
INFO [2018-05-08 23:42:17] Fitting variants for purity 0.15, tumor ploidy 3.06 and contamination 0.01.
INFO [2018-05-08 23:42:17] Optimized purity: 0.15
INFO [2018-05-08 23:42:17] Testing local optimum 3/20 at purity 0.19 and total ploidy 2.60...
INFO [2018-05-08 23:42:18] Fitting variants for purity 0.19, tumor ploidy 5.12 and contamination 0.01.
INFO [2018-05-08 23:42:19] Optimized purity: 0.19
INFO [2018-05-08 23:42:19] Testing local optimum 4/20 at purity 0.15 and total ploidy 1.90...
INFO [2018-05-08 23:42:19] Recalibrating log-ratios...
INFO [2018-05-08 23:42:19] Testing local optimum 4/20 at purity 0.15 and total ploidy 1.90...
INFO [2018-05-08 23:42:19] Recalibrating log-ratios...
INFO [2018-05-08 23:42:19] Testing local optimum 4/20 at purity 0.15 and total ploidy 1.90...
INFO [2018-05-08 23:42:19] Recalibrating log-ratios...
INFO [2018-05-08 23:42:19] Testing local optimum 4/20 at purity 0.15 and total ploidy 1.90...
INFO [2018-05-08 23:42:19] Testing local optimum 5/20 at purity 0.17 and total ploidy 2.00...
INFO [2018-05-08 23:42:19] Fitting variants for purity 0.15, tumor ploidy 2.11 and contamination 0.01.
INFO [2018-05-08 23:42:20] Optimized purity: 0.15
INFO [2018-05-08 23:42:20] Testing local optimum 6/20 at purity 0.58 and total ploidy 4.60...
INFO [2018-05-08 23:42:20] Recalibrating log-ratios...
INFO [2018-05-08 23:42:20] Testing local optimum 6/20 at purity 0.58 and total ploidy 4.60...
INFO [2018-05-08 23:42:20] Recalibrating log-ratios...
INFO [2018-05-08 23:42:20] Testing local optimum 6/20 at purity 0.58 and total ploidy 4.60...
INFO [2018-05-08 23:42:21] Recalibrating log-ratios...
INFO [2018-05-08 23:42:21] Testing local optimum 6/20 at purity 0.58 and total ploidy 4.60...
INFO [2018-05-08 23:42:21] Testing local optimum 7/20 at purity 0.72 and total ploidy 5.20...
INFO [2018-05-08 23:42:21] Recalibrating log-ratios...
INFO [2018-05-08 23:42:21] Testing local optimum 7/20 at purity 0.72 and total ploidy 5.20...
INFO [2018-05-08 23:42:21] Recalibrating log-ratios...
INFO [2018-05-08 23:42:21] Testing local optimum 7/20 at purity 0.72 and total ploidy 5.20...
INFO [2018-05-08 23:42:21] Recalibrating log-ratios...
INFO [2018-05-08 23:42:21] Testing local optimum 7/20 at purity 0.72 and total ploidy 5.20...
INFO [2018-05-08 23:42:21] Testing local optimum 8/20 at purity 0.27 and total ploidy 2.00...
INFO [2018-05-08 23:42:22] Fitting variants for purity 0.19, tumor ploidy 1.42 and contamination 0.01.
INFO [2018-05-08 23:42:22] Optimized purity: 0.19
INFO [2018-05-08 23:42:22] Testing local optimum 9/20 at purity 0.78 and total ploidy 5.00...
INFO [2018-05-08 23:42:23] Fitting variants for purity 0.77, tumor ploidy 5.80 and contamination 0.01.
INFO [2018-05-08 23:42:24] Optimized purity: 0.77
INFO [2018-05-08 23:42:24] Testing local optimum 10/20 at purity 0.75 and total ploidy 4.20...
INFO [2018-05-08 23:42:24] Fitting variants for purity 0.71, tumor ploidy 4.87 and contamination 0.01.
INFO [2018-05-08 23:42:25] Optimized purity: 0.71
INFO [2018-05-08 23:42:25] Testing local optimum 11/20 at purity 0.62 and total ploidy 3.20...
INFO [2018-05-08 23:42:25] Fitting variants for purity 0.55, tumor ploidy 3.87 and contamination 0.01.
INFO [2018-05-08 23:42:26] Optimized purity: 0.55
INFO [2018-05-08 23:42:26] Testing local optimum 12/20 at purity 0.35 and total ploidy 1.60...
INFO [2018-05-08 23:42:26] Recalibrating log-ratios...
INFO [2018-05-08 23:42:26] Testing local optimum 12/20 at purity 0.35 and total ploidy 1.60...
INFO [2018-05-08 23:42:26] Recalibrating log-ratios...
INFO [2018-05-08 23:42:26] Testing local optimum 12/20 at purity 0.35 and total ploidy 1.60...
INFO [2018-05-08 23:42:27] Recalibrating log-ratios...
INFO [2018-05-08 23:42:27] Testing local optimum 12/20 at purity 0.35 and total ploidy 1.60...
INFO [2018-05-08 23:42:27] Testing local optimum 13/20 at purity 0.62 and total ploidy 2.60...
INFO [2018-05-08 23:42:27] Fitting variants for purity 0.66, tumor ploidy 3.00 and contamination 0.01.
INFO [2018-05-08 23:42:28] Optimized purity: 0.66
INFO [2018-05-08 23:42:28] Testing local optimum 14/20 at purity 0.52 and total ploidy 2.00...
INFO [2018-05-08 23:42:28] Fitting variants for purity 0.32, tumor ploidy 1.88 and contamination 0.01.
INFO [2018-05-08 23:42:29] Optimized purity: 0.32
INFO [2018-05-08 23:42:29] Testing local optimum 15/20 at purity 0.82 and total ploidy 2.80...
INFO [2018-05-08 23:42:29] Fitting variants for purity 0.86, tumor ploidy 3.00 and contamination 0.01.
INFO [2018-05-08 23:42:30] Optimized purity: 0.86
INFO [2018-05-08 23:42:30] Testing local optimum 16/20 at purity 0.78 and total ploidy 2.00...
INFO [2018-05-08 23:42:31] Fitting variants for purity 0.59, tumor ploidy 2.00 and contamination 0.01.
INFO [2018-05-08 23:42:31] Optimized purity: 0.59
INFO [2018-05-08 23:42:31] Testing local optimum 17/20 at purity 0.58 and total ploidy 1.40...
INFO [2018-05-08 23:42:32] Fitting variants for purity 0.56, tumor ploidy 1.00 and contamination 0.01.
INFO [2018-05-08 23:42:33] Optimized purity: 0.56
INFO [2018-05-08 23:42:33] Testing local optimum 18/20 at purity 0.78 and total ploidy 1.20...
INFO [2018-05-08 23:42:33] Fitting variants for purity 0.75, tumor ploidy 1.00 and contamination 0.01.
INFO [2018-05-08 23:42:34] Optimized purity: 0.75
INFO [2018-05-08 23:42:34] Testing local optimum 19/20 at purity 0.95 and total ploidy 3.00...
INFO [2018-05-08 23:42:34] Fitting variants for purity 0.87, tumor ploidy 3.00 and contamination 0.01.
INFO [2018-05-08 23:42:35] Optimized purity: 0.87
INFO [2018-05-08 23:42:35] Testing local optimum 20/20 at purity 0.95 and total ploidy 1.00...
INFO [2018-05-08 23:42:36] Fitting variants for purity 0.91, tumor ploidy 1.00 and contamination 0.01.
INFO [2018-05-08 23:42:36] Optimized purity: 0.91
INFO [2018-05-08 23:42:36] Done.
INFO [2018-05-08 23:42:36] ------------------------------------------------------------
andrewrech commented 6 years ago

Thanks. A uuid is added to the S3 upload, it is the same sample 10c3a27f-484a-4367-b288-2ae95ad7...

lima1 commented 6 years ago

But the numbers don’t align either, like coverage, number of variants in the Vcf.

andrewrech commented 6 years ago

Got it - you are right, of course. Same file but different parameters. I have made some progress on identifying the inputs that are causing this error. I will add a solid MRE asap.

Thanks

andrewrech commented 6 years ago

I am not able to consistently reproduce this bug. Samples with intervals totalling less than ~3 Mb fail about 10% of the time, randomly. It seems this is likely related to my system or a core Bioc package.

lima1 commented 6 years ago

You can set a seed with set.seed(123) for example. This should make it reproducible.

It's likely a corner case that happens due to the extreme downsampling. Feel free to reopen when you can reproduce.

lima1 commented 6 years ago

I think this might happen when there are no variants on a chromosome to impute the mapping bias. Try using the calculateMappingBiasVcf function to generate the RDS file. That should fix it.