jpuntomarcos / CNVfilteR

R package to remove false positives of CNV calling tools by using SNV calls

Error in loadSNPsFromVCF(vcf.file = vcfFile, verbose = verbose, vcf.source = vcf.source, : dims [product 208] do not match the length of object [210] #10

Closed AndreaG5 closed 1 year ago

AndreaG5 commented 1 year ago

Hi, I'm opening this issue because something odd is happening. When I call the function "loadVCFs", it fails with the error in the title, but only for a couple of samples. I understand it is a dimension mismatch, but I don't see why most samples load correctly while a few do not. Also, when I open one of the failing VCFs its dimension is 212, so I don't understand where [210] or [208] come from. Here is my code:

cnvs_gr <- loadCNVcalls(cnvs.file = f, chr.column = "chromosome",
                        start.column = "start", end.column = "end",
                        cnv.column = "type", #sample.column = "Sample",
                        genome = "hg19", gene.column = "gene",
                        ignore.unexpected.rows = TRUE, sep = ",",
                        sample.name = sample_name)

vcf_path = paste(input_folder, "/", sample_name, "/", sep = "")
# list.files() takes a regular expression, not a glob, so the dot is escaped
vcf_files = list.files(vcf_path, pattern = "\\.vcf", full.names = TRUE, recursive = TRUE)
index = grep("Non-Filtered", vcf_files)
vcf = vcf_files[index]

vcfs <- loadVCFs(vcf, cnvs.gr = cnvs_gr, ref.support.field = "RO", alt.support.field = "AO", min.total.depth = 20, vcf.source = "Ion")

I'd appreciate any help. I can provide the .bam and .vcf files to reproduce it.

Thanks!!!

jpuntomarcos commented 1 year ago

Hi @AndreaG5

To find out what is going on, would you be able to provide an example of the cnvs.file (f in your code) and the VCFs needed to reproduce the error? Thanks.

AndreaG5 commented 1 year ago

Hi @jpuntomarcos

The issue comes from the "loadVCFs" function, so the cnvs.file (f) itself works. Anyway, here is the link to a repository containing file 001 (the one that triggers the error) and file 002 (the one that loads without any error):

https://github.com/AndreaG5/Issues.git

Thanks

jpuntomarcos commented 1 year ago

Hi Andrea,

I found the problem. The VCF file you used contains multiallelic sites, for example: chr3 193372598 .;. TTA T,TTT
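As a rough illustration of how such a site can lead to this kind of error, here is a minimal base R sketch. It is not CNVfilteR's actual code, and the toy values and variable names are made up. The assumption is that a multiallelic record carries one per-allele value for each ALT allele, so flattening a per-allele field yields more values than there are records, and reshaping by record count then fails.

```r
# Toy alt-support strings, one per record (made-up values).
# The third record is multiallelic: it has two ALT alleles, hence two values.
alt_support <- c("12", "30", "7,5", "18")

# Splitting on commas gives one value per ALT allele, not one per record.
values <- as.integer(unlist(strsplit(alt_support, ",", fixed = TRUE)))
n_records <- length(alt_support)
n_values <- length(values)

# Forcing the values into a one-row-per-record shape raises a dims error,
# because the number of values no longer matches the number of records.
msg <- tryCatch({
  dim(values) <- c(n_records, 1L)
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)
```

With only biallelic records the two counts agree and the reshape succeeds, which would be consistent with most samples loading fine.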

Currently, CNVfilteR does not support multiallelic sites. As an easy workaround, you can split multiallelic sites by running bcftools norm -N -m -both yourSample.vcf > splited.vcf
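To check which VCFs are affected before loading them, something like the following base R sketch may help. It assumes an uncompressed, tab-separated VCF; the helper name count_multiallelic is hypothetical and not part of CNVfilteR. It counts records whose ALT column lists more than one allele.

```r
# Count records whose ALT column (5th tab-separated field) has several alleles.
# count_multiallelic is a hypothetical helper, not a CNVfilteR function.
count_multiallelic <- function(vcf_path) {
  lines <- readLines(vcf_path)
  # Drop header lines (starting with "#") and empty lines.
  records <- lines[!startsWith(lines, "#") & nzchar(lines)]
  if (length(records) == 0L) return(0L)
  alt <- vapply(strsplit(records, "\t", fixed = TRUE),
                function(fields) fields[5], character(1))
  # A comma in ALT means more than one alternate allele at that site.
  sum(grepl(",", alt, fixed = TRUE))
}

# Self-contained demo on a tiny temporary VCF with made-up records.
demo_vcf <- tempfile(fileext = ".vcf")
writeLines(c(
  "##fileformat=VCFv4.1",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
  "chr1\t100\t.\tA\tG\t50\tPASS\t.",
  "chr3\t193372598\t.\tTTA\tT,TTT\t50\tPASS\t."
), demo_vcf)
n_multi <- count_multiallelic(demo_vcf)
print(n_multi)
```

A count of zero after the bcftools step would suggest the split worked for that file.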

I will update the documentation to clarify. Also, I plan to update CNVfilteR in the future in order to support multiallelic sites.

AndreaG5 commented 1 year ago

Hi @jpuntomarcos ,

I thought about it... Thank you so much for the quick reply! Best!