lima1 / PureCN

Copy number calling and variant classification using targeted short read sequencing
https://bioconductor.org/packages/devel/bioc/html/PureCN.html
Artistic License 2.0

Potential VCF issue - char '/' not in lookup table #307

Closed emmabailey94 closed 1 year ago

emmabailey94 commented 1 year ago

Hi,

I'm having a problem running PureCN that seems to be caused by the VCF file, which I've attached. It is a VarScan VCF, but I can't figure out what's wrong with it. The error says that char '/' is not in a lookup table, but I can't identify what triggers this in the VCF. I'm using the latest conda version of PureCN.

INFO [2023-07-10 17:34:48] Arguments: -normal.coverage.file  -tumor.coverage.file IS001_13_cnvkit_results/IS001-13_R1_001.sorted.cnr -log.ratio  -seg.file IS001_13_cnvkit_results/IS001_13_cnvkit.seg -vcf.file Varscan/IS001_13_normal-tumour.snp.vcf -normalDB  -genome hg38 -sex ? -args.setPriorVcf 6 -args.setMappingBiasVcf NULL -args.filterIntervals 100,0.05 -args.segmentation 0.005,NULL, -sampleid IS001_13 -min.ploidy 1.4 -max.ploidy 6 -max.non.clonal 0.2 -max.homozygous.loss 0.05,1e+07 -log.ratio.calibration 0.1 -model.homozygous FALSE -error 0.001 -interval.file  -min.logr.sdev 0.15 -max.segments 300 -plot.cnv TRUE -vcf.field.prefix PureCN. -cosmic.vcf.file  -DB.info.flag DB -POPAF.info.field POP_AF -Cosmic.CNT.info.field Cosmic.CNT -model beta -post.optimize TRUE -BPPARAM  -log.file IS001_13_pureCN/IS001_13.log -args.filterVcf <data> -fun.segmentation <data> -test.num.copy <data> -test.purity <data> -speedup.heuristics <data>
INFO [2023-07-10 17:34:48] Loading coverage files...
INFO [2023-07-10 17:34:52] Found log2-ratio in tumor coverage data.
INFO [2023-07-10 17:34:52] Mean coverages: chrX: 178.55, chrY: 166.11, chr1-22: 314.94.
INFO [2023-07-10 17:34:52] Mean coverages: chrX: 178.55, chrY: 166.11, chr1-22: 314.94.
INFO [2023-07-10 17:34:53] Using 220214 intervals (189647 on-target, 30567 off-target).
INFO [2023-07-10 17:34:53] Ratio of mean on-target vs. off-target read counts: NaN
INFO [2023-07-10 17:34:53] Mean off-target bin size: 69216
INFO [2023-07-10 17:34:53] Loading VCF...
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'flesh' in selecting a method for function 'relist': key 47 (char '/') not in lookup table
Calls: runAbsoluteCN ... make_XStringSet_from_strings -> .Call2 -> .handleSimpleError -> h

Any advice would be really appreciated!

Thanks, Emma

IS001_13_normal-tumour.snp.txt

lima1 commented 1 year ago

Hi Emma,

That error message comes from the VariantAnnotation::readVcf function. You can validate a VCF with, for example, gatk ValidateVariants -V ~/IS001_13_normal-tumour.snp.vcf. The issue here is lines with multiple alt alleles separated by a slash. Since there is nothing I can fix on my end, I would recommend removing those tri-allelic sites upstream of PureCN and maybe reaching out to the developer of the variant caller.
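The removal step could be scripted rather than done by hand. The following is only a minimal sketch, not part of PureCN or VarScan: it assumes a plain-text, tab-delimited VCF in which the offending records are exactly those whose ALT column (the fifth field) contains a '/', whereas the VCF specification separates multiple ALT alleles with commas. The function names drop_slash_alt_records and filter_vcf are hypothetical.

```python
from typing import Iterable, Iterator


def drop_slash_alt_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield VCF lines, skipping records whose ALT column contains '/'.

    Assumption: the VCF is uncompressed and tab-delimited, and a '/' in
    ALT only occurs on the malformed multi-allelic records to be removed.
    """
    for line in lines:
        # Header and meta-information lines are passed through untouched.
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 5 (index 4) is ALT; drop the record if it contains a slash.
        if len(fields) > 4 and "/" in fields[4]:
            continue
        yield line


def filter_vcf(src_path: str, dst_path: str) -> None:
    """Write a copy of src_path to dst_path without slash-separated ALT records."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(drop_slash_alt_records(src))
```

Calling filter_vcf with an input path and an output path of your choice writes a filtered copy and leaves the original file unchanged, so the dropped sites can still be inspected later or reported to the variant caller's developer.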

Best, Markus

emmabailey94 commented 1 year ago

Ah yes thank you! After manually removing these rows from the vcf, PureCN runs perfectly. Thanks for the help :)