knausb / vcfR

Tools to work with variant call format files

Read depth can not be plotted from VCF file #173

Closed AG-Run closed 4 years ago

AG-Run commented 4 years ago

Dear Knausb

Cordial greetings. I'm trying to plot my VCF file with the following script:

vcf <- read.vcfR("file.vcf", verbose = FALSE)
chrom <- create.chromR(name = "tig000129l_plot", vcf = vcf, verbose = TRUE)
chrom <- masker(chrom, min_QUAL = 0, min_DP = 1, max_DP = 650, min_MQ = 20, max_MQ = 60.5)
chrom <- proc.chromR(chrom, verbose = TRUE)
chromoqc(chrom, dp.alpha = 22)

My VCF file looks like this:

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample
tig000129l 506 . GC GCC 5 . TYPE=INDEL GT:PL:GQ:DP:ADP:ACN 0/1:42,12,81:5:3:2,1:1,1
tig000129l 521 . TG TGG 5 . TYPE=INDEL GT:PL:GQ:DP:ADP:ACN 0/1:42,12,81:5:3:2,1:1,1

I don't know why the DP (read depth) is not getting plotted.

Also, I don't know how the program can infer mapping qualities from the VCF file. As far as I know, variants do not have an MQ associated with them, only alignments do (this plot is also empty in my graph).

Thanks

knausb commented 4 years ago

Hi @AG-Run, the chromR plot gets its depth (DP) data from the INFO column, where DP is a summary over all samples in your VCF file. Your variant caller did not report this information, so there is nothing to plot. Early in the development of vcfR I thought this would be useful information. Later I realized that some variant callers report depth on a per-sample basis, as yours appears to have done. I prefer this perspective because it can help identify individual samples that may have high or low coverage. I have examples below.

https://knausb.github.io/vcfR_documentation/sequence_coverage.html
https://knausb.github.io/vcfR_documentation/depth_plot.html
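To make the per-sample perspective concrete, here is a minimal base R sketch (not taken from the linked pages) that pulls DP out of the FORMAT and sample columns shown in the question. The two records are copied from the issue; the variable names are illustrative assumptions.

```r
# FORMAT string and per-variant sample strings, copied from the issue.
format_keys <- "GT:PL:GQ:DP:ADP:ACN"
sample_vals <- c("0/1:42,12,81:5:3:2,1:1,1",
                 "0/1:42,12,81:5:3:2,1:1,1")

# Find the position of DP within the colon-delimited FORMAT keys.
keys <- strsplit(format_keys, ":", fixed = TRUE)[[1]]
dp_index <- match("DP", keys)

# Split each sample string the same way and keep the DP field as a number.
dp <- vapply(strsplit(sample_vals, ":", fixed = TRUE),
             function(fields) as.numeric(fields[dp_index]),
             numeric(1))

print(dp)
```

With a real file, `extract.gt(vcf, element = "DP", as.numeric = TRUE)` from vcfR should return the same information as a variants-by-samples matrix, which can then be summarized or plotted per sample.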

At the time I created that page I used cowplot to get multi-panel graphics. These days I use ggpubr instead. If you really want depth in the chromR object, you can use rowSums() on the matrix of individual sample depths and use the result to populate the INFO column.
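The rowSums() suggestion could be sketched roughly as follows. This is an untested illustration on toy data rather than a definitive recipe: `dp` and `info` stand in for what `extract.gt(vcf, element = "DP", as.numeric = TRUE)` and `vcf@fix[, "INFO"]` would hold, and `add_info_dp()` is a hypothetical helper, not part of vcfR.

```r
# Toy stand-ins (assumed shapes, not read from a real file):
# 'dp' mimics a variants-by-samples matrix of per-sample depth.
dp <- matrix(c(3, 3), ncol = 1,
             dimnames = list(c("tig000129l_506", "tig000129l_521"), "Sample"))
# 'info' mimics the INFO column: one string per variant.
info <- c("TYPE=INDEL", "TYPE=INDEL")

# Sum depth across samples for each variant; na.rm skips missing genotypes.
total_dp <- rowSums(dp, na.rm = TRUE)

# Hypothetical helper: append a DP key to each INFO string.
# An empty INFO field (NA or ".") is replaced by the DP key alone.
add_info_dp <- function(info, total_dp) {
  dp_field <- paste0("DP=", total_dp)
  ifelse(is.na(info) | info == ".", dp_field, paste(info, dp_field, sep = ";"))
}

new_info <- add_info_dp(info, total_dp)
print(new_info)
```

On a real vcfR object the result would presumably be written back with `vcf@fix[, "INFO"] <- new_info` before calling create.chromR(). Whether the chromR plot then picks up the depth is an assumption worth checking on your own data.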

The package vcfR does not infer mapping qualities. It simply tries to help you manage it if your variant caller reports that information in the VCF file. Because your VCF data lacks mapping quality there is nothing to report. Hopefully, the blank plot helps you discover that this information is not present in your data.
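A quick way to confirm what is missing is to test the INFO strings for the keys the plot expects. The sketch below uses base R only; `has_info_key()` is a hypothetical helper, and `info` is a toy copy of the INFO values from the question (with a real object it would be `vcf@fix[, "INFO"]`).

```r
# INFO values copied from the two records in the question.
info <- c("TYPE=INDEL", "TYPE=INDEL")

# Hypothetical helper: TRUE where an INFO string contains the given key.
# The key must sit at the start or after ';' and be followed by '=', ';' or the end,
# so that "MQ" does not match a longer key such as "MQ0".
# Assumes 'key' contains no regular-expression metacharacters.
has_info_key <- function(info, key) {
  grepl(paste0("(^|;)", key, "(=|;|$)"), info)
}

print(any(has_info_key(info, "DP")))    # FALSE: no site-level depth
print(any(has_info_key(info, "MQ")))    # FALSE: no mapping quality
print(any(has_info_key(info, "TYPE")))  # TRUE: the only key present
```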

Good luck! Brian