Closed splaisan closed 4 years ago
Hi @splaisan ,
I'm not sure I understand your goal here. The GATK is a dynamic project, so their recommendations change through time and I'm not sure where they currently are. We can build an example though.
Below is an example of working with the INFO data
library(vcfR)
#>
#> ***** *** vcfR *** *****
#> This is vcfR 1.8.0
#> browseVignettes('vcfR') # Documentation
#> citation('vcfR') # Citation
#> ***** ***** ***** *****
data("vcfR_test")
vcfR_test
#> ***** Object of Class vcfR *****
#> 3 samples
#> 1 CHROMs
#> 5 variants
#> Object size: 0 Mb
#> 0 percent missing data
#> ***** ***** *****
getINFO(vcfR_test)
#> [1] "NS=3;DP=14;AF=0.5;DB;H2" "NS=3;DP=11;AF=0.017"
#> [3] "NS=2;DP=10;AF=0.333,0.667;AA=T;DB" "NS=3;DP=13;AA=T"
#> [5] "NS=3;DP=9;AA=G"
vcf2 <- vcfR_test[as.numeric(extract.info(vcfR_test, element = "DP")) > 10, ]
vcf2
#> ***** Object of Class vcfR *****
#> 3 samples
#> 1 CHROMs
#> 3 variants
#> Object size: 0 Mb
#> 0 percent missing data
#> ***** ***** *****
vcf2@fix
#> CHROM POS ID REF ALT QUAL FILTER
#> [1,] "20" "14370" "rs6054257" "G" "A" "29" "PASS"
#> [2,] "20" "17330" NA "T" "A" "3" "q10"
#> [3,] "20" "1230237" NA "T" NA "47" "PASS"
#> INFO
#> [1,] "NS=3;DP=14;AF=0.5;DB;H2"
#> [2,] "NS=3;DP=11;AF=0.017"
#> [3,] "NS=3;DP=13;AA=T"
Created on 2019-12-09 by the reprex package (v0.3.0)
Personally, I don't work with the INFO column much. This is because these are summaries over all the samples. If you have a few low quality samples they will drive down these summaries. I prefer to work with the data associated with each individual genotype.
Please let me know if that gets you what you need. Or if we need to expand the example.
your command syntax worked out just fine, thanks a lot for your help
Thank you for reporting your outcome!
I would like to imitate the GATK genotyper filtering using vcfR but masker seems to be tailored to FORMAT fields. Is there a way to filter on INFO fields and if yes could you please share a demo command Thanks
here is the GATK hard filtering logic and the corresponding INFO fields are present
QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0 || SOR > 3.0 || QUAL < 30