jpuntomarcos / CNVfilteR

R package to remove false positives of CNV calling tools by using SNV calls
5 stars 1 forks source link

plotAllCnvs don't work with hg38 #4

Closed georgiiprovisor closed 3 years ago

georgiiprovisor commented 3 years ago

plotAllCNVs(cnvs060126NORMAL)

Ошибка в mergeNamedAtomicVectors(genome(x), genome(y), what = c("sequence", : sequences chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY have incompatible genomes:

georgiiprovisor commented 3 years ago

Oh, it totally not work with hg38, i think. SNPs not matching with CNV when I choose hg38, but with hg19 all looks fine Is it possible to run CNVFilteR on samples with hg38 reference?

jpuntomarcos commented 3 years ago

Hi @georgiiprovisor

Thanks for reporting the issue. You are right, the function plotAllCNVs() only works for the "hg19" genome. Let me do some checks and I will upload an update to make it work on any genome version.

georgiiprovisor commented 3 years ago

Thank you! You will update only plotAllCNVs function or all functions?

jpuntomarcos commented 3 years ago

After finishing the checks, I will update any function that did not work with the hg38 genome. I expect to do so during next week.

jpuntomarcos commented 3 years ago

Hi @georgiiprovisor

I have aligned the AK1-genome sample against the HG38 reference. After multiple checks, the only function that didn't work was plotAllCNVs(). I will update it soon so CNVs will be plotted as expected: image

So, all the other functions worked for the HG38 genome. Please, remember to set genome = "hg38" in the loadCNVcalls() and loadVCFs() functions because "hg19" is the default value.

georgiiprovisor commented 3 years ago

Hello! I run this command cnvs060126NORMAL <- loadCNVcalls(cnvs.file = '/home/georgii/Nsc/CNV/CNvfilter_tests/060126_normal.tsv', chr.column = "chromosome", start.column = "start", end.column = "end", cnv.column = "CNV", deletion = "Del", duplication = "Dup", genome = "hg38", sample.name = '06_0126_normal', ignore.unexpected.rows = TRUE)

vcfs_060126_normal <- loadVCFs('/home/georgii/Nsc/CNV/CNvfilter_tests/06-0126_normal_2nd_ed_recal.vcf', cnvs.gr = cnvs060126NORMAL, vcf.source = "HaplotypeCaller", genome = "hg38")

results06_0126 <- filterCNVs(cnvs060126NORMAL, vcfs_060126_normal)

But in results06_0126[["cnvs"]]@elementMetadata no SNPs connected with CNV DataFrame with 230 rows and 10 columns cnv sample cnv.id filter n.total.variants n.hm.variants n.ht.discard.CNV n.ht.confirm.CNV ht.pct score

1 deletion 06_0126_normal 1 2 deletion 06_0126_normal 2 3 deletion 06_0126_normal 3 4 deletion 06_0126_normal 4 5 deletion 06_0126_normal 5 ... ... ... ... ... ... ... ... ... ... ... 226 deletion 06_0126_normal 226 227 deletion 06_0126_normal 227 228 deletion 06_0126_normal 228 229 duplication 06_0126_normal 229 230 deletion 06_0126_normal 230 But in VCF-file i get SNPs that should overlap with CNV
jpuntomarcos commented 3 years ago

Please, could you provide an example of a CNVs file and a VCF file for which the function does not work as expected? In my HG38 example SNPs overlap correctly with the CNVs, but I wonder if I am missing something that I cannot see with my example.

(By the way, remember that the loadVCFs() function will discard SNVs with a min total depth < 10, by default).

Thanks :)

georgiiprovisor commented 3 years ago

060126_normal.tsv.gz 06-0126_normal_2nd_ed_recal__.vcf.gz

I attach full files (vcf and tsv with CNV). Few SNPs with DP > 10 should overlap 1-st Del from tsv file (for example chr1:14653 ; chr1:873251 ; chr1:931131)

Thank you for answering :)

jpuntomarcos commented 3 years ago

Hi,

I discovered what is happening. In your loadCNVcalls() call you indicated that the sample.name is 06_0126_normal. However, when loading the VCF (loadVCFs()) CNVfilteR looks into the the VCF and extract the sample name: TCGA-06-0126-10A-01D-1490-08 is the one that appears in that file. So, if you indicate the correct sample name (TCGA-06-0126-10A-01D-1490-08) when loading the CNVs, it should work. I will modify the code to show a Warning message when the expected sample name was not found. This is the code I used:

cnvs.gr <- loadCNVcalls(cnvs.file = cnvsPath, chr.column = "chromosome", start.column = "start", end.column = "end", cnv.column = "CNV", deletion = "Del", duplication = "Dup", genome = "hg38", sample.name = 'TCGA-06-0126-10A-01D-1490-08', ignore.unexpected.rows = TRUE) vcf <- loadVCFs(vcfPath, cnvs.gr = cnvs.gr, vcf.source = "HaplotypeCaller", genome = "hg38") res <- filterCNVs(cnvs.gr, vcf, verbose = T)

And if you plot first CNV by doing plotVariantsForCNV(res, "1"), you should obtain:

image

Remember that you can explore the results in res$cnvs and res$variantsForEachCNV.

Hope this helps.

José Marcos.

georgiiprovisor commented 3 years ago

Oh, it was so stupid fail from my side. Sorry :) Thanks a lot!

jpuntomarcos commented 3 years ago

Hi, CNVfilteR pacakge has been updated to version 1.5.1. Among other fixes, plotAllCNVs() allows the genome parameter.