jpuntomarcos / CNVfilteR

R package to remove false positives of CNV calling tools by using SNV calls

Issue with loadVCFs(): no contents being loaded #12

Closed: PengZhangJHU closed this issue 1 year ago

PengZhangJHU commented 1 year ago

Hi

I was able to run loadVCFs(), but no content from the VCF seems to be loaded: the result shows '0 ranges and 8 metadata columns', whereas it worked well with the VCF provided by CNVfilteR. I don't see any obvious differences between the two VCFs. Any suggestions on how to make it work? Thank you!


experiment sample

vcfs.test <- loadVCFs(vcf.files.test, cnvs.gr = cnvs.test.gr, genome = "hg19", vcf.source = "UnifiedGenotyper", min.total.depth = 3)
Scanning file /tmp/RtmpfXTMYu/230400-1163279691.vcf.gz...
UnifiedGenotyper was found as source in the VCF metadata, AD will be used as allele support field in a list format: ref allele, alt allele.

head(vcfs.test)
$230400-1163279691
GRanges object with 0 ranges and 8 metadata columns:
   seqnames ranges strand | ref alt ref.support alt.support alt.freq total.depth indel type
  -------
  seqinfo: 24 sequences from hg19 genome

Example vcfs provided by CNVfilteR with 13 ranges

head(vcfs)
$sample1
GRanges object with 13 ranges and 8 metadata columns:
                  seqnames    ranges strand |  ref  alt ref.support
   2:48026019_G/C     chr2  48026019      * |    G    C         516
   2:48027019_G/C     chr2  48027019      * |    G    C        1528
   2:48027182_G/A     chr2  48027182      * |    G    A        1506
   2:48027434_A/T     chr2  48027434      * |    A    T        1593
   2:48027763_G/A     chr2  48027763      * |    G    A         900
              ...      ...       ...    ... .  ...  ...         ...
  13:32968591_G/A    chr13  32968591      * |    G    A           9
  13:32968607_A/G    chr13  32968607      * |    A    G          14
  17:41244435_T/C    chr17  41244435      * |    T    C        1476
  17:41244714_C/G    chr17  41244714      * |    C    G        1651
  17:41251931_G/A    chr17  41251931      * |    G    A         330
                  alt.support  alt.freq total.depth indel type
   2:48026019_G/C         521   50.2411        1037 FALSE   ht
   2:48027019_G/C         964   38.6838        2492 FALSE   ht
   2:48027182_G/A         971   39.2006        2477 FALSE   ht
   2:48027434_A/T        1462   47.8560        3055 FALSE   ht
   2:48027763_G/A         863   48.9507        1763 FALSE   ht
              ...         ...       ...         ...   ...  ...
  13:32968591_G/A          13   59.0909          22 FALSE   ht
  13:32968607_A/G          25   64.1026          39 FALSE   ht
  17:41244435_T/C        1381   48.3374        2857 FALSE   ht
  17:41244714_C/G        1658   50.1058        3309 FALSE   ht
  17:41251931_G/A         296   47.2843         626 FALSE   ht
  -------
  seqinfo: 21 sequences from hg19 genome; no seqlengths

jpuntomarcos commented 1 year ago

Hi @PengZhangJHU

Do the regions passed in the cnvs.gr parameter overlap the variants in the VCF? Remember that "only those variants in regions affected by CNVs will be loaded".

Also, in order to reproduce the issue, could you please provide a fake VCF and a CNV file? Thanks :)
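
For readers who want to try the overlap check suggested above before opening an issue, here is a minimal base R sketch. It does not use CNVfilteR itself: the data frames `cnvs` and `variants`, their column names, the coordinates, and the helper `variantInCnv()` are all hypothetical and only illustrate the idea of testing whether each variant position falls inside a CNV region on the same chromosome.

```r
# Hypothetical CNV regions (not taken from the issue).
cnvs <- data.frame(
  chr   = c("chr2", "chr17"),
  start = c(48025000, 41243000),
  end   = c(48028000, 41252000),
  stringsAsFactors = FALSE
)

# Hypothetical variant positions (not taken from the issue).
variants <- data.frame(
  chr = c("chr2", "chr2", "chr13"),
  pos = c(48026019, 49000000, 32968591),
  stringsAsFactors = FALSE
)

# Illustrative helper: TRUE when a variant lies inside at least one CNV
# region on the same chromosome.
variantInCnv <- function(variants, cnvs) {
  vapply(seq_len(nrow(variants)), function(i) {
    any(cnvs$chr == variants$chr[i] &
        cnvs$start <= variants$pos[i] &
        cnvs$end >= variants$pos[i])
  }, logical(1))
}

variants$in.cnv <- variantInCnv(variants, cnvs)
print(variants)

# If no variant overlaps a CNV region, an empty result is expected.
if (!any(variants$in.cnv)) {
  message("No variant overlaps any CNV region.")
}
```

With GRanges objects, `GenomicRanges::countOverlaps()` or `subsetByOverlaps()` would express the same check more idiomatically. Note that chromosome naming (for example "chr2" versus "2") must also agree for an overlap to be detected.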

PengZhangJHU commented 1 year ago

Thanks for your comments. I just realized the sample names are slightly different between the VCF and the CNV file. It works now after I made the sample names consistent.
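
The fix above suggests a quick sanity check before loading: compare the sample names on both sides. The following is a rough base R sketch, not part of CNVfilteR. The helper `compareSampleNames()` and the sample names are invented for illustration; the actual names that differed are not given in this thread. In practice the two vectors would come from the VCF files and from the sample column of the CNV calls.

```r
# Illustrative helper: report which sample names appear on only one side.
compareSampleNames <- function(vcf.samples, cnv.samples) {
  list(
    shared      = intersect(vcf.samples, cnv.samples),
    only.in.vcf = setdiff(vcf.samples, cnv.samples),
    only.in.cnv = setdiff(cnv.samples, vcf.samples)
  )
}

# Hypothetical names with a small mismatch ("sampleA_S1" vs "sampleA").
vcf.samples <- c("sampleA_S1")
cnv.samples <- c("sampleA", "sampleB")

res <- compareSampleNames(vcf.samples, cnv.samples)
print(res)

# A VCF sample with no counterpart among the CNV calls is worth flagging,
# since it may explain a result with 0 ranges.
if (length(res$only.in.vcf) > 0) {
  warning("VCF samples without matching CNV calls: ",
          paste(res$only.in.vcf, collapse = ", "))
}
```

Exact string comparison is deliberate here: differences in case, separators, or suffixes all count as a mismatch and are listed rather than silently reconciled.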