Closed by jsstanley 5 years ago
Hi Jay,
Thanks for taking the time to create an example! In order to make it a reproducible example we need a data set we can share. And I think I can simplify things a bit.
library("vcfR")
#>
#> ***** *** vcfR *** *****
#> This is vcfR 1.8.0.9000
#> browseVignettes('vcfR') # Documentation
#> citation('vcfR') # Citation
#> ***** ***** ***** *****
# Load example data.
data(vcfR_test)
vcfR_test
#> ***** Object of Class vcfR *****
#> 3 samples
#> 1 CHROMs
#> 5 variants
#> Object size: 0 Mb
#> 0 percent missing data
#> ***** ***** *****
# Set FILTER to FAIL for variants 2 and 3 so the column holds only PASS/FAIL.
vcfR_test@fix[2:3, "FILTER"] <- "FAIL"
# Validate.
getFILTER(vcfR_test)
#> [1] "PASS" "FAIL" "FAIL" "PASS" "PASS"
PASSfilter <- vcfR_test@fix[, "FILTER"] %in% "PASS"
write.vcf(vcfR_test, mask = PASSfilter, file = "filtered.vcf.gz")
#> Warning in if (mask == FALSE) {: the condition has length > 1 and only the
#> first element will be used
#> Warning in if (mask == TRUE) {: the condition has length > 1 and only the
#> first element will be used
# A fix: subset the object with the logical vector first, then write it without a mask.
vcfR_test <- vcfR_test[PASSfilter, ]
vcfR_test
#> ***** Object of Class vcfR *****
#> 3 samples
#> 1 CHROMs
#> 3 variants
#> Object size: 0 Mb
#> 0 percent missing data
#> ***** ***** *****
write.vcf(vcfR_test, file = "filtered.vcf.gz")
Created on 2019-05-14 by the reprex package (v0.2.1)
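The reprex above builds PASSfilter with %in% rather than ==. A minimal base-R sketch of why that choice can matter, assuming a missing FILTER value shows up as NA in the fix matrix (the matrix below is a made-up stand-in, not vcfR data):

```r
# Stand-in for vcfR_test@fix: a character matrix with a FILTER column.
# The values are invented for illustration only.
fix <- cbind(
  POS    = c("111", "112", "113", "114", "115"),
  FILTER = c("PASS", "FAIL", NA, "PASS", "PASS")
)

eq_mask <- fix[, "FILTER"] == "PASS"    # NA where FILTER is missing
in_mask <- fix[, "FILTER"] %in% "PASS"  # FALSE where FILTER is missing

anyNA(eq_mask)  # TRUE: an NA in a logical index would yield an all-NA row
anyNA(in_mask)  # FALSE: safe to use directly as a row index

# Keep only the PASS rows; drop = FALSE keeps the result a matrix.
kept <- fix[in_mask, , drop = FALSE]
nrow(kept)
```

With ==, the NA would have to be handled separately before subsetting; %in% folds it into FALSE in one step.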
Our documentation does say a mask can be a "logical vector" but in the code we test if it is set to TRUE or FALSE as if it is a vector of length one. I can see now that this may be confusing.
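To illustrate the mismatch, here is a rough sketch of how a function could accept either a single value or a per-row logical vector by checking the length before branching. The helper rows_to_write() is hypothetical and is not vcfR code; treating a single TRUE/FALSE as applying to every row is an assumption made only for this sketch and may not match what write.vcf() intends.

```r
# Hypothetical helper, not part of vcfR: turn `mask` into the row indices
# that would be written.
rows_to_write <- function(mask, n_rows) {
  stopifnot(is.logical(mask), !anyNA(mask))
  if (length(mask) == 1L) {
    # Sketch-only assumption: a single TRUE/FALSE applies to every row.
    mask <- rep(mask, n_rows)
  }
  if (length(mask) != n_rows) {
    stop("`mask` needs length 1 or one value per variant", call. = FALSE)
  }
  which(mask)
}

pass <- c("PASS", "FAIL", "FAIL", "PASS", "PASS") %in% "PASS"
rows_to_write(pass, length(pass))  # indices of the PASS variants
rows_to_write(TRUE, 3L)            # every row under the sketch's assumption
```

Because the length is tested first, no length > 1 condition ever reaches if, which is what triggers the warnings shown in the reprex.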
Use of a mask sounded like a good idea to me early on in the development of this project. In practice, I never use it. And I suspect if others had used it this would have been reported by now. My vote is to deprecate the argument "mask" so we can get rid of it. Any other opinions out there?
Hi Brian,
Thanks for responding so quickly! The alternative code you provided seems to work well for me.
I don't have much experience working with the package but if the 'mask' argument is redundant and there are better ways of filtering, then I think it's probably worth deprecating.
Thanks again for your help!
Problems with subsetting using mask argument in write.vcf()
When using the write.vcf() function with a 'mask' argument to save a filtered version of a VCF file, I am getting the following error message:
I am trying to create a filtered VCF file with only variants that have PASS in the 'FILTER' column of the 'fix' part of the VCF file. The documentation for the write.vcf() function states I can use the 'mask' argument with a "logical vector indicating rows to use". Unfortunately, it seems that the function is only expecting me to provide a single Boolean value. I suspect this issue may result from the use of if (mask == TRUE) rather than ifelse (mask == TRUE), although I may be mistaken.
I could use the tidy format to filter out rows using vcfR2tidy() but I need a VCF file as an output for downstream analyses and unfortunately I can't locate an equivalent function to convert a tidy-style object back to a VCF object.
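On the if versus ifelse point, a small base-R sketch of how the two differ. It uses a made-up mask vector and does not call vcfR. The tryCatch() wrapper is there because the outcome depends on the R version: older releases warn, while R 4.2.0 and later raise an error.

```r
mask <- c(TRUE, FALSE, FALSE, TRUE, TRUE)

# `if` expects a single TRUE/FALSE. Given a longer condition, older R
# versions warn and use only the first element; R >= 4.2.0 stops with an error.
outcome <- tryCatch(
  {
    if (mask == TRUE) "ran"
  },
  warning = function(w) "warning",
  error   = function(e) "error"
)
outcome

# ifelse() is vectorised: it returns one value per element of `mask`.
labels <- ifelse(mask, "keep", "drop")
labels
```

Since ifelse() returns a vector instead of choosing a branch, it would not be a drop-in replacement for if inside the function; the code would still need to decide what to do with the per-row result.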
Here's a minimal reproducible example to illustrate the issue:
Load the vcfR package
Create example data
Filter the VCF file
Error message that is returned in the console following execution of above code
sessionInfo() output