lima1 / PureCN

Copy number calling and variant classification using targeted short read sequencing
https://bioconductor.org/packages/devel/bioc/html/PureCN.html
Artistic License 2.0
127 stars 32 forks source link

Error in !info(vcf)[[DB.info.flag]] : invalid argument type #108

Closed stets43 closed 4 years ago

stets43 commented 4 years ago

Hi, I am running PureCN on cell line whole exome sequencing data with a pool of process matched normals. I am running into some issues with the .vcf files.

SKLMS1.VCF <- readVcf('SKLMSI_S3_L007_P0001.MuTect.vcf') SKLMS1.CN <- runAbsoluteCN(normal.coverage.file = pool, tumor.coverage.file = 'SKLMS1.Coverage', vcf.file = SKLMS1.VCF, sex = "?", genome = "hg38", sampleid = "SKLMS1", interval.file = 'LMSIntervalsGC.txt', normalDB = normalDB, model.homozygous = TRUE,

log.ratio = NULL,

                       #seg.file = NULL,
                       plot.cnv = FALSE,
                       fun.filterVcf = filterVcfMuTect(vcf = SKLMS1.VCF, use.somatic.status = FALSE, DB.info.flag = "SOMATIC"),
                       args.segmentation=list(interval.weight.file=interval.weight.file),
                       args.filterVcf = list(vcf = SKLMS1.VCF, use.somatic.status = FALSE, DB.info.flag = "SOMATIC"),
                       DB.info.flag = "SOMATIC",
                       post.optimize = FALSE,
                       test.purity = seq(0.9,0.99,by=0.01), 
                       verbose=TRUE) 

INFO [2019-10-14 17:05:04] ------------------------------------------------------------ INFO [2019-10-14 17:05:04] PureCN 1.14.3 INFO [2019-10-14 17:05:04] ------------------------------------------------------------ INFO [2019-10-14 17:05:05] Loading coverage files... INFO [2019-10-14 17:05:05] Mean target coverages: 67X (tumor) 69X (normal). INFO [2019-10-14 17:05:05] Mean coverages: chrX: 47.49, chrY: 24.12, chr1-22: 64.12. INFO [2019-10-14 17:05:05] Mean coverages: chrX: 55.36, chrY: 35.18, chr1-22: 68.66. INFO [2019-10-14 17:05:06] Removing 24 intervals with missing log.ratio. INFO [2019-10-14 17:05:06] Removing 16 low/high GC targets. INFO [2019-10-14 17:05:06] Removing 129 intervals excluded in normalDB. INFO [2019-10-14 17:05:06] normalDB provided. Setting minimum coverage for segmentation to 0.0015X. INFO [2019-10-14 17:05:06] Removing 567 low coverage (< 0.0015X) intervals. INFO [2019-10-14 17:05:06] Using 7779 intervals (7779 on-target, 0 off-target). INFO [2019-10-14 17:05:06] No off-target intervals. If this is hybrid-capture data, consider adding them. INFO [2019-10-14 17:05:06] AT/GC dropout: 1.10 (tumor), 1.08 (normal). INFO [2019-10-14 17:05:06] Loading VCF... INFO [2019-10-14 17:05:06] Found 1265 variants in VCF file. INFO [2019-10-14 17:05:06] 1147 (90.7%) variants annotated as likely germline (SOMATIC INFO flag). INFO [2019-10-14 17:05:06] SKLMSI_S3 is tumor in VCF file. INFO [2019-10-14 17:05:06] 10 homozygous and 2 heterozygous variants on chrX. INFO [2019-10-14 17:05:06] Sex from VCF: NA (Fisher's p-value: 0.001, odds-ratio: 9.04). WARN [2019-10-14 17:05:06] Duplicated arguments in filterVcf Error in !info(vcf)[[DB.info.flag]] : invalid argument type

It looks as though the DB.info.flag I assigned [["SOMATIC"]] is not being passed successfully to !info(vcf). Is there a way to fix this?

Thanks!

lima1 commented 4 years ago

Hi,

does the VCF contain a matched normal? If yes, then the SOMATIC flag indicates somatic mutations (TRUE when somatic). If this is a tumor-only VCF, the DB flag is all you need. This should be TRUE when in germline databases.

If this is whole exome, then you have a very small number of baits (7779). Are you sure you are using the correct interval file for this assay? I would recommend following the Quick vignette, at least to generate the interval.file. It performs a couple of additional sanity checks.

Markus

stets43 commented 4 years ago

This should be a tumor-only vcf, however the DB column is entirely comprised of FALSE values, hence why I switched the DB flag.

The small number of targets on the .bed file also caused me some concern. I inherited these files from a colleague and wonder whether they used a specific gene panel rather than the whole exome. I will go back on make sure that the .bed files are accurate and that the .vcf files were generated properly.

Thanks for the prompt response!

lima1 commented 4 years ago

I think the DB.info.flag in fun.filterVcf and args.filterVcf are also unnecessary and probably causes the "Duplicated arguments in filterVcf" warning. You can also remove the DB flag and instead use bcftools to annotate variants with dbSNP id (see FAQ).

But again, I would follow the tutorial at https://bioconductor.org/packages/devel/bioc/vignettes/PureCN/inst/doc/Quick.html.

(this will require the latest GitHub version, which will become the new stable in 2 weeks; install it with BiocManager::install("lima1/PureCN")).

lima1 commented 4 years ago

Closing now. Feel free to reopen if necessary.