thierrygosselin / radiator

RADseq Data Exploration, Manipulation and Visualization using R
https://thierrygosselin.github.io/radiator/
GNU General Public License v3.0
59 stars 23 forks source link

radiator::filter_rad generating individual stats error #58

Closed SarahEmel closed 5 years ago

SarahEmel commented 5 years ago

Hi Thierry,

I am attempting to use radiator::filter_rad to filter a populations.snps.vcf file from Stacks, but I am consistently getting an error in ''generating individual stats" and then I do not get the output files I've specified. Based on testing with different arguments and versions of the input file (including after filtering out some missing data in vcftools) and on reading other issues here, I suspect there might be a problem with my vcf header that I'm unaware of.

I appreciate any help you can offer. Thanks for making this package and I hope to be able to make it part of my workflow!

-Sarah

All of these lead to the same error: myfiltereddata <- filter_rad( data = "populations.snps.vcf", filter.short.ld = "random", filter.long.ld = 0.5, filter.hwe = TRUE, strata = "strata_locyear.txt", output = c("vcf","genind"), filename = "filter_rad_output")

myfiltereddata <- filter_rad( data = "populations.snps.vcf", filter.short.ld = "random", filter.long.ld = 0.5, filter.hwe = TRUE, strata = NULL, output = c("vcf","genind"), filename = "filter_rad_output")

myfiltereddata <- filter_rad( data = "populations.snps.vcf", strata = NULL, interactive.filter = TRUE)

myfiltereddata <- filter_rad( data = "populations.snps.vcftools.recode", strata = NULL, interactive.filter = TRUE)

myfiltereddata <- filter_rad( data = "RH_subset.vcf", strata = NULL, interactive.filter = TRUE)

Execution date@time: 20190820@0008 Folder created: filter_rad_20190820@0008 Function call and arguments stored in: radiator_filter_rad_args_20190820@0008.tsv File written: random.seed (563656) Filters parameters file generated: filters_parameters_20190820@0008.tsv

Reading VCF Data summary: number of samples: 94 number of markers: 185 done! timing: 0 sec

Generating individual stats... Error in if (stats::sd(id.info$COVERAGE_MEAN) != 0) { : missing value where TRUE/FALSE needed

Computation time, overall: 1 sec ############################# completed filter_rad ############################# RH_subset.vcf.zip

############################# completed filter_rad #############################

devtools::session_info() ─ Session info ────────────────────────────────────────────────────────────────────────────────── setting value
version R version 3.6.0 (2019-04-26) os macOS High Sierra 10.13.6
system x86_64, darwin15.6.0
ui RStudio
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz America/New_York
date 2019-08-20

─ Packages ────────────────────────────────────────────────────────────────────────────────────── package version date lib source
assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.0)
backports 1.1.4 2019-04-10 [1] CRAN (R 3.6.0)
Biobase 2.44.0 2019-05-02 [1] Bioconductor
BiocGenerics 0.30.0 2019-05-02 [1] Bioconductor
Biostrings 2.52.0 2019-05-02 [1] Bioconductor
bitops 1.0-6 2013-08-17 [1] CRAN (R 3.6.0)
boot 1.3-23 2019-07-05 [1] CRAN (R 3.6.0)
broom 0.5.2 2019-04-07 [1] CRAN (R 3.6.0)
callr 3.3.1 2019-07-18 [1] CRAN (R 3.6.0)
cli 1.1.0 2019-03-19 [1] CRAN (R 3.6.0)
crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.0)
data.table 1.12.2 2019-04-07 [1] CRAN (R 3.6.0)
desc 1.2.0 2018-05-01 [1] CRAN (R 3.6.0)
devtools 2.1.0 2019-07-06 [1] CRAN (R 3.6.0)
digest 0.6.20 2019-07-04 [1] CRAN (R 3.6.0)
dplyr 0.8.3 2019-07-04 [1] CRAN (R 3.6.0)
fs 1.3.1 2019-05-06 [1] CRAN (R 3.6.0)
gdsfmt 1.20.0 2019-05-02 [1] Bioconductor
generics 0.0.2 2018-11-29 [1] CRAN (R 3.6.0)
GenomeInfoDb 1.20.0 2019-05-02 [1] Bioconductor
GenomeInfoDbData 1.2.1 2019-08-02 [1] Bioconductor
GenomicRanges 1.36.0 2019-05-02 [1] Bioconductor
glue 1.3.1 2019-03-12 [1] CRAN (R 3.6.0)
GWASExactHW 1.01 2013-01-05 [1] CRAN (R 3.6.0)
hms 0.5.0 2019-07-09 [1] CRAN (R 3.6.0)
IRanges 2.18.1 2019-05-31 [1] Bioconductor
jomo 2.6-9 2019-07-29 [1] CRAN (R 3.6.0)
lattice 0.20-38 2018-11-04 [1] CRAN (R 3.6.0)
lme4 1.1-21 2019-03-05 [1] CRAN (R 3.6.0)
logistf 1.23 2018-07-19 [1] CRAN (R 3.6.0)
magrittr 1.5 2014-11-22 [1] CRAN (R 3.6.0)
MASS 7.3-51.4 2019-03-31 [1] CRAN (R 3.6.0)
Matrix 1.2-17 2019-03-22 [1] CRAN (R 3.6.0)
memoise 1.1.0 2017-04-21 [1] CRAN (R 3.6.0)
mgcv 1.8-28 2019-03-21 [1] CRAN (R 3.6.0)
mice 3.6.0 2019-07-10 [1] CRAN (R 3.6.0)
minqa 1.2.4 2014-10-09 [1] CRAN (R 3.6.0)
mitml 0.3-7 2019-01-07 [1] CRAN (R 3.6.0)
nlme 3.1-140 2019-05-12 [1] CRAN (R 3.6.0)
nloptr 1.2.1 2018-10-03 [1] CRAN (R 3.6.0)
nnet 7.3-12 2016-02-02 [1] CRAN (R 3.6.0)
pan 1.6 2018-06-29 [1] CRAN (R 3.6.0)
pillar 1.4.2 2019-06-29 [1] CRAN (R 3.6.0)
pkgbuild 1.0.3 2019-03-20 [1] CRAN (R 3.6.0)
pkgconfig 2.0.2 2018-08-16 [1] CRAN (R 3.6.0)
pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.6.0)
prettyunits 1.0.2 2015-07-13 [1] CRAN (R 3.6.0)
processx 3.4.1 2019-07-18 [1] CRAN (R 3.6.0)
ps 1.3.0 2018-12-21 [1] CRAN (R 3.6.0)
purrr 0.3.2 2019-03-15 [1] CRAN (R 3.6.0)
R6 2.4.0 2019-02-14 [1] CRAN (R 3.6.0)
radiator
1.1.2 2019-08-20 [1] Github (thierrygosselin/radiator@53a137d) Rcpp 1.0.2 2019-07-25 [1] CRAN (R 3.6.0)
RCurl 1.95-4.12 2019-03-04 [1] CRAN (R 3.6.0)
readr 1.3.1 2018-12-21 [1] CRAN (R 3.6.0)
remotes 2.1.0 2019-06-24 [1] CRAN (R 3.6.0)
rlang 0.4.0 2019-06-25 [1] CRAN (R 3.6.0)
rpart 4.1-15 2019-04-12 [1] CRAN (R 3.6.0)
rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.6.0)
rstudioapi 0.10 2019-03-19 [1] CRAN (R 3.6.0)
S4Vectors 0.22.0 2019-05-02 [1] Bioconductor
SeqArray 1.24.2 2019-07-12 [1] Bioconductor
SeqVarTools 1.22.0 2019-05-02 [1] Bioconductor
sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.0)
stringi 1.4.3 2019-03-12 [1] CRAN (R 3.6.0)
survival 2.44-1.1 2019-04-01 [1] CRAN (R 3.6.0)
testthat 2.2.1 2019-07-25 [1] CRAN (R 3.6.0)
tibble 2.1.3 2019-06-06 [1] CRAN (R 3.6.0)
tidyr 0.8.3 2019-03-01 [1] CRAN (R 3.6.0)
tidyselect 0.2.5 2018-10-11 [1] CRAN (R 3.6.0)
usethis 1.5.1 2019-07-04 [1] CRAN (R 3.6.0)
vctrs 0.2.0 2019-07-05 [1] CRAN (R 3.6.0)
withr 2.1.2 2018-03-15 [1] CRAN (R 3.6.0)
XVector 0.24.0 2019-05-02 [1] Bioconductor
yaml 2.2.0 2018-07-25 [1] CRAN (R 3.6.0)
zeallot 0.1.0 2018-01-28 [1] CRAN (R 3.6.0)
zlibbioc 1.30.0 2019-05-02 [1] Bioconductor

[1] /Library/Frameworks/R.framework/Versions/3.6/Resources/library

See attached subset

thierrygosselin commented 5 years ago

If your subset is a good representation of the original dataset, the problem is that your VCF as individual's that are missing all of their genotypes. My code was not prepared to handle this and the code throws an error when it tries to do a standard SD with base R code (it doesn't accept NA, NaN , etc).

The new commit should fix that.

I suggest looking at the file that start with individuals.qc.stats and highlight the bad samples.

SarahEmel commented 5 years ago

Thank you Thierry! It took me a while to get everything working with my dataset, but I am able to run filter_rad now without errors. We had a lot of variation in reads per individual and there was one bad sample that had to be removed in addition to other filtering.

thierrygosselin commented 5 years ago

ok glad I could help! lots of variation in reads per individuals can be lowered by...

Prior to sequencing (the most important steps):

bioinformatically: