Closed SarahEmel closed 5 years ago
If your subset is a good representation of the original dataset, the problem is that your VCF as individual's that are missing all of their genotypes. My code was not prepared to handle this and the code throws an error when it tries to do a standard SD with base R code (it doesn't accept NA, NaN , etc).
The new commit should fix that.
I suggest looking at the file that start with individuals.qc.stats
and highlight the bad samples.
Thank you Thierry! It took me a while to get everything working with my dataset, but I am able to run filter_rad now without errors. We had a lot of variation in reads per individual and there was one bad sample that had to be removed in addition to other filtering.
ok glad I could help! lots of variation in reads per individuals can be lowered by...
Prior to sequencing (the most important steps):
bioinformatically:
Hi Thierry,
I am attempting to use radiator::filter_rad to filter a populations.snps.vcf file from Stacks, but I am consistently getting an error in ''generating individual stats" and then I do not get the output files I've specified. Based on testing with different arguments and versions of the input file (including after filtering out some missing data in vcftools) and on reading other issues here, I suspect there might be a problem with my vcf header that I'm unaware of.
I appreciate any help you can offer. Thanks for making this package and I hope to be able to make it part of my workflow!
-Sarah
All of these lead to the same error: myfiltereddata <- filter_rad( data = "populations.snps.vcf", filter.short.ld = "random", filter.long.ld = 0.5, filter.hwe = TRUE, strata = "strata_locyear.txt", output = c("vcf","genind"), filename = "filter_rad_output")
myfiltereddata <- filter_rad( data = "populations.snps.vcf", filter.short.ld = "random", filter.long.ld = 0.5, filter.hwe = TRUE, strata = NULL, output = c("vcf","genind"), filename = "filter_rad_output")
myfiltereddata <- filter_rad( data = "populations.snps.vcf", strata = NULL, interactive.filter = TRUE)
myfiltereddata <- filter_rad( data = "populations.snps.vcftools.recode", strata = NULL, interactive.filter = TRUE)
myfiltereddata <- filter_rad( data = "RH_subset.vcf", strata = NULL, interactive.filter = TRUE)
the complete error message you're getting
Execution date@time: 20190820@0008 Folder created: filter_rad_20190820@0008 Function call and arguments stored in: radiator_filter_rad_args_20190820@0008.tsv File written: random.seed (563656) Filters parameters file generated: filters_parameters_20190820@0008.tsv
Reading VCF Data summary: number of samples: 94 number of markers: 185 done! timing: 0 sec
Generating individual stats... Error in if (stats::sd(id.info$COVERAGE_MEAN) != 0) { : missing value where TRUE/FALSE needed
Computation time, overall: 1 sec ############################# completed filter_rad ############################# RH_subset.vcf.zip
devtools::session_info()
############################# completed filter_rad #############################
─ Packages ────────────────────────────────────────────────────────────────────────────────────── package version date lib source
assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.0)
backports 1.1.4 2019-04-10 [1] CRAN (R 3.6.0)
Biobase 2.44.0 2019-05-02 [1] Bioconductor
BiocGenerics 0.30.0 2019-05-02 [1] Bioconductor
Biostrings 2.52.0 2019-05-02 [1] Bioconductor
bitops 1.0-6 2013-08-17 [1] CRAN (R 3.6.0)
boot 1.3-23 2019-07-05 [1] CRAN (R 3.6.0)
broom 0.5.2 2019-04-07 [1] CRAN (R 3.6.0)
callr 3.3.1 2019-07-18 [1] CRAN (R 3.6.0)
cli 1.1.0 2019-03-19 [1] CRAN (R 3.6.0)
crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.0)
data.table 1.12.2 2019-04-07 [1] CRAN (R 3.6.0)
desc 1.2.0 2018-05-01 [1] CRAN (R 3.6.0)
devtools 2.1.0 2019-07-06 [1] CRAN (R 3.6.0)
digest 0.6.20 2019-07-04 [1] CRAN (R 3.6.0)
dplyr 0.8.3 2019-07-04 [1] CRAN (R 3.6.0)
fs 1.3.1 2019-05-06 [1] CRAN (R 3.6.0)
gdsfmt 1.20.0 2019-05-02 [1] Bioconductor
generics 0.0.2 2018-11-29 [1] CRAN (R 3.6.0)
GenomeInfoDb 1.20.0 2019-05-02 [1] Bioconductor
GenomeInfoDbData 1.2.1 2019-08-02 [1] Bioconductor
GenomicRanges 1.36.0 2019-05-02 [1] Bioconductor
glue 1.3.1 2019-03-12 [1] CRAN (R 3.6.0)
GWASExactHW 1.01 2013-01-05 [1] CRAN (R 3.6.0)
hms 0.5.0 2019-07-09 [1] CRAN (R 3.6.0)
IRanges 2.18.1 2019-05-31 [1] Bioconductor
jomo 2.6-9 2019-07-29 [1] CRAN (R 3.6.0)
lattice 0.20-38 2018-11-04 [1] CRAN (R 3.6.0)
lme4 1.1-21 2019-03-05 [1] CRAN (R 3.6.0)
logistf 1.23 2018-07-19 [1] CRAN (R 3.6.0)
magrittr 1.5 2014-11-22 [1] CRAN (R 3.6.0)
MASS 7.3-51.4 2019-03-31 [1] CRAN (R 3.6.0)
Matrix 1.2-17 2019-03-22 [1] CRAN (R 3.6.0)
memoise 1.1.0 2017-04-21 [1] CRAN (R 3.6.0)
mgcv 1.8-28 2019-03-21 [1] CRAN (R 3.6.0)
mice 3.6.0 2019-07-10 [1] CRAN (R 3.6.0)
minqa 1.2.4 2014-10-09 [1] CRAN (R 3.6.0)
mitml 0.3-7 2019-01-07 [1] CRAN (R 3.6.0)
nlme 3.1-140 2019-05-12 [1] CRAN (R 3.6.0)
nloptr 1.2.1 2018-10-03 [1] CRAN (R 3.6.0)
nnet 7.3-12 2016-02-02 [1] CRAN (R 3.6.0)
pan 1.6 2018-06-29 [1] CRAN (R 3.6.0)
pillar 1.4.2 2019-06-29 [1] CRAN (R 3.6.0)
pkgbuild 1.0.3 2019-03-20 [1] CRAN (R 3.6.0)
pkgconfig 2.0.2 2018-08-16 [1] CRAN (R 3.6.0)
pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.6.0)
prettyunits 1.0.2 2015-07-13 [1] CRAN (R 3.6.0)
processx 3.4.1 2019-07-18 [1] CRAN (R 3.6.0)
ps 1.3.0 2018-12-21 [1] CRAN (R 3.6.0)
purrr 0.3.2 2019-03-15 [1] CRAN (R 3.6.0)
R6 2.4.0 2019-02-14 [1] CRAN (R 3.6.0)
radiator 1.1.2 2019-08-20 [1] Github (thierrygosselin/radiator@53a137d) Rcpp 1.0.2 2019-07-25 [1] CRAN (R 3.6.0)
RCurl 1.95-4.12 2019-03-04 [1] CRAN (R 3.6.0)
readr 1.3.1 2018-12-21 [1] CRAN (R 3.6.0)
remotes 2.1.0 2019-06-24 [1] CRAN (R 3.6.0)
rlang 0.4.0 2019-06-25 [1] CRAN (R 3.6.0)
rpart 4.1-15 2019-04-12 [1] CRAN (R 3.6.0)
rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.6.0)
rstudioapi 0.10 2019-03-19 [1] CRAN (R 3.6.0)
S4Vectors 0.22.0 2019-05-02 [1] Bioconductor
SeqArray 1.24.2 2019-07-12 [1] Bioconductor
SeqVarTools 1.22.0 2019-05-02 [1] Bioconductor
sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.0)
stringi 1.4.3 2019-03-12 [1] CRAN (R 3.6.0)
survival 2.44-1.1 2019-04-01 [1] CRAN (R 3.6.0)
testthat 2.2.1 2019-07-25 [1] CRAN (R 3.6.0)
tibble 2.1.3 2019-06-06 [1] CRAN (R 3.6.0)
tidyr 0.8.3 2019-03-01 [1] CRAN (R 3.6.0)
tidyselect 0.2.5 2018-10-11 [1] CRAN (R 3.6.0)
usethis 1.5.1 2019-07-04 [1] CRAN (R 3.6.0)
vctrs 0.2.0 2019-07-05 [1] CRAN (R 3.6.0)
withr 2.1.2 2018-03-15 [1] CRAN (R 3.6.0)
XVector 0.24.0 2019-05-02 [1] Bioconductor
yaml 2.2.0 2018-07-25 [1] CRAN (R 3.6.0)
zeallot 0.1.0 2018-01-28 [1] CRAN (R 3.6.0)
zlibbioc 1.30.0 2019-05-02 [1] Bioconductor
[1] /Library/Frameworks/R.framework/Versions/3.6/Resources/library
See attached subset