thierrygosselin / radiator

RADseq Data Exploration, Manipulation and Visualization using R
https://thierrygosselin.github.io/radiator/
GNU General Public License v3.0

filter.hwe errors out with object 'pop' not found; 1 population and >100 individuals #102

Closed RGCheek closed 3 years ago

RGCheek commented 3 years ago

Hi Thierry

I used radiator v. 1.1.6 previously, but the whitelist it produced didn't seem to have the correct indexing information when I tried using it in Stacks (to get measures of Ho, pi, etc.). The populations program reads in the correct number of loci, but there is a huge drop-off in variant sites. I'm now testing to see whether v. 1.1.8 produces a similar issue or if it's something else on my end.

filter_rad fails at the filter_hwe step with:

################################################################################
############################# radiator::filter_hwe #############################
################################################################################
Execution date@time: 20201202@1843
Interactive mode: on
Function call and arguments stored in: radiator_filter_hwe_args_20201202@1843.tsv
Calibrating REF/ALT alleles...
using tidy data frame of genotypes as input
skipping all filters
Summarizing data
File written: genotypes.summary.tsv
HWE analysis for pop: SCI
Error in eval_tidy(xs[[j]], mask) : object 'pop' not found

To Reproduce

radiator::filter_rad(
  data = "populations_123_indiv.snps.vcf",
  strata = "filtered_strata.issj.tsv",
  interactive.filter = T,
  output = "vcf",
  parallel.core = 1
)

sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] purrr_0.3.4 SeqArray_1.29.3 gdsfmt_1.24.1 radiator_1.1.8

loaded via a namespace (and not attached):
[1] tidyselect_1.1.0 listenv_0.8.0 lattice_0.20-41 colorspace_1.4-1 vctrs_0.3.5 generics_0.1.0
[7] stats4_4.0.2 viridisLite_0.3.0 rlang_0.4.9 HardyWeinberg_1.6.8 pillar_1.4.7 glue_1.4.2
[13] BiocGenerics_0.34.0 GenomeInfoDbData_1.2.3 lifecycle_0.2.0 plyr_1.8.6 zlibbioc_1.34.0 Biostrings_2.56.0
[19] progressr_0.6.0 munsell_0.5.0 gtable_0.3.0 future_1.20.1 codetools_0.2-16 labeling_0.4.2
[25] UpSetR_1.4.0 IRanges_2.22.2 GenomeInfoDb_1.24.2 parallel_4.0.2 furrr_0.2.1 broom_0.7.2
[31] Rcpp_1.0.5 readr_1.4.0 carrier_0.1.0 backports_1.2.0 scales_1.1.1 BiocManager_1.30.10
[37] S4Vectors_0.26.1 XVector_0.28.0 truncnorm_1.0-8 parallelly_1.21.0 farver_2.0.3 gridExtra_2.3
[43] ggplot2_3.3.2 hms_0.5.3 digest_0.6.27 stringi_1.5.3 dplyr_1.0.2 GenomicRanges_1.40.0
[49] grid_4.0.2 tools_4.0.2 bitops_1.0-6 magrittr_2.0.1 RCurl_1.98-1.2 Rsolnp_1.16
[55] tibble_3.0.4 mice_3.12.0 crayon_1.3.4 SNPRelate_1.22.0 tidyr_1.1.2 pkgconfig_2.0.3
[61] ellipsis_0.3.1 data.table_1.13.2 rstudioapi_0.13 globals_0.14.0 R6_2.5.0 compiler_4.0.2

thierrygosselin commented 3 years ago

Dear RGCheek, sorry about the bug you're experiencing with radiator. I'll have a look at this today.

Best, Thierry

thierrygosselin commented 3 years ago

You might be losing some markers because they are duplicated and on the same strand. Looking at your data, I see that you have some.

thierrygosselin commented 3 years ago

The problem is now fixed and will be in the next release today (v. 1.1.9).

thierrygosselin commented 3 years ago

Stacks keeps changing the way it uses the information for CHROM, LOCUS, POS and COL. radiator adapts to this by checking against the version, but I keep all the info inside the MARKERS column in the tidy dataset (separated by __).
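
As a rough illustration of how that MARKERS column could be unpacked when rebuilding a whitelist by hand, here is a minimal base R sketch. The marker values are made up, and the field order CHROM__LOCUS__POS is an assumption not confirmed in this thread, so check it against your own tidy dataset before relying on it.

```r
# Hypothetical tidy data: the MARKERS values below are invented for illustration.
# ASSUMPTION: fields are ordered CHROM__LOCUS__POS; verify against your own data.
tidy <- data.frame(
  MARKERS = c("un__1021__35", "un__1021__77", "chr2__340__12"),
  stringsAsFactors = FALSE
)

# Split on the literal "__" separator (fixed = TRUE avoids regex interpretation).
parts <- strsplit(tidy$MARKERS, split = "__", fixed = TRUE)

# Guard against unexpected layouts, e.g. a chromosome name that itself contains "__".
stopifnot(all(lengths(parts) == 3L))

# Bind the pieces into named columns and attach them to the original data.
fields <- do.call(rbind, parts)
colnames(fields) <- c("CHROM", "LOCUS", "POS")
tidy <- cbind(tidy, as.data.frame(fields, stringsAsFactors = FALSE))

# POS is numeric in a VCF, so store it as an integer.
tidy$POS <- as.integer(tidy$POS)
```

After this, tidy carries separate CHROM, LOCUS and POS columns next to MARKERS, which makes it easier to compare the locus IDs against what the populations program expects for a given Stacks version.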