thierrygosselin / radiator

RADseq Data Exploration, Manipulation and Visualization using R
https://thierrygosselin.github.io/radiator/
GNU General Public License v3.0

DArT Counts Error: filter_rad (MARKERS %in% markers) #186

Closed nsc-2024 closed 7 months ago

nsc-2024 commented 7 months ago

Hi Thierry, We are reviewing DArT Count data for Lake Sturgeon. We contacted you in January 2024 about "filter_rad" erroring at the "filter_individuals" stage. This was fixed in radiator 1.3.0.

With the latest push, we now get a similar error when running filter_rad:

Generating statistics
✔ Missing genotypes [141ms]
✔ Heterozygosity [472ms]
Error in `dplyr::filter()`:
ℹ In argument: `MARKERS %in% markers`.
Caused by error:
! object 'MARKERS' not found
Run `rlang::last_trace()` to see where the error occurred.
✖ Coverage ... [1.3s]

However, when we generate the gds and run "filter_individuals" immediately after, the function works. We have tried the following troubleshooting:

Data files have already been emailed to you and are in the OneDrive Folder (2 Thierry Shared Files)

Cheers, Tina

Here's the error:

DUPSRead_tidy <- radiator::read_dart(
  data = "Report_DAci24-8986_2_moreOrders_SNPcount_2_CSedit_working_all_dupsremoved.csv",
  strata = "(5)strata_dart_sturgeon_20240427_mod.tsv",
  tidy.dart = TRUE
) ## we have also done tidy.dart = FALSE

filtered_rad <- radiator::filter_rad(
  data = DUPSRead_tidy,
  strata = NULL,
  interactive.filter = TRUE,
  filter.hwe = TRUE,
  filter.common.markers = FALSE,
  verbose = TRUE
)

################################################################################
############################# radiator::filter_rad #############################
################################################################################
Execution date@time: 20240429@1342
Folder created: filter_rad_20240429@1342
Function call and arguments stored in: radiator_filter_rad_args_20240429@1342.tsv
File written: random.seed (542770)
Filters parameters file generated: filters_parameters_20240429@1342.tsv
################################################################################
#################### radiator::filter_dart_reproducibility #####################
################################################################################
Execution date@time: 20240429@1342
Function call and arguments stored in: radiator_filter_dart_reproducibility_args_20240429@1342.tsv

Interactive mode: on
2 steps to visualize and filter the data based on reproducibility:
Step 1. Visualization
Step 2. Choose the filtering threshold

File written: dart_reproducibility_stats.tsv
File written: dart_reproducibility_boxplot_20240429@1342.pdf
Generating helper table...
Files written: helper tables and plots

Step 2. Filtering markers based on markers reproducibility

Do you still want to blacklist markers? (y/n): n

Computation time, overall: 3 sec
#################### completed filter_dart_reproducibility #####################
################################################################################
######################### radiator::filter_monomorphic #########################
################################################################################
Execution date@time: 20240429@1342
Function call and arguments stored in: radiator_filter_monomorphic_args_20240429@1342.tsv
File written: whitelist.polymorphic.markers_20240429@1342.tsv
################################### RESULTS ####################################

Filter monomorphic markers
Number of individuals / strata / chrom / locus / SNP:
Before: 2194 / 26 / 1 / 26333 / 30769
Blacklisted: 0 / 0 / 0 / 0 / 0
After: 2194 / 26 / 1 / 26333 / 30769

Computation time, overall: 1 sec
######################### completed filter_monomorphic #########################
################################################################################
######################### radiator::filter_individuals #########################
################################################################################
Execution date@time: 20240429@1342
Function call and arguments stored in: radiator_filter_individuals_args_20240429@1342.tsv
Interactive mode: on

Step 1. Visualization
Step 2. Missingness
Step 3. Heterozygosity
Step 4. Coverage (if available)

Step 1. Visualization of samples QC

Generating statistics
✔ Missing genotypes [141ms]
✔ Heterozygosity [472ms]
Error in `dplyr::filter()`:
ℹ In argument: `MARKERS %in% markers`.
Caused by error:
! object 'MARKERS' not found
Run `rlang::last_trace()` to see where the error occurred.
✖ Coverage ... [1.3s]

Computation time, overall: 2 sec
######################### completed filter_individuals #########################

Computation time, overall: 5 sec
############################# completed filter_rad #############################

Here is the session info:

─ Session info ─────────────────────────────────────────────────────
setting value
version R version 4.4.0 (2024-04-24)
os macOS Sonoma 14.4.1
system aarch64, darwin20
ui RStudio
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz America/Winnipeg
date 2024-04-29
rstudio 2023.12.1+402 Ocean Storm (desktop)
pandoc NA

─ Packages ─────────────────────────────────────────────────────────
package version date (UTC) lib source
ade4 1.7-22 2023-02-06 [1] CRAN (R 4.4.0)
adegenet 2.1.10 2023-01-26 [1] CRAN (R 4.4.0)
amap 0.8-19 2022-10-28 [1] CRAN (R 4.4.0)
ape 5.8 2024-04-11 [1] CRAN (R 4.4.0)
backports 1.4.1 2021-12-13 [1] CRAN (R 4.4.0)
BiocGenerics 0.49.1 2024-04-22 [1] Bioconductor 3.19 (R 4.4.0)
BiocManager 1.30.22 2023-08-08 [1] CRAN (R 4.4.0)
Biostrings 2.70.3 2024-03-13 [1] Bioconductor 3.18 (R 4.4.0)
bit 4.0.5 2022-11-15 [1] CRAN (R 4.4.0)
bit64 4.0.5 2020-08-30 [1] CRAN (R 4.4.0)
bitops 1.0-7 2021-04-24 [1] CRAN (R 4.4.0)
boot 1.3-30 2024-02-26 [1] CRAN (R 4.4.0)
broom 1.0.5 2023-06-09 [1] CRAN (R 4.4.0)
cachem 1.0.8 2023-05-01 [1] CRAN (R 4.4.0)
carrier 0.1.1 2023-04-28 [1] CRAN (R 4.4.0)
cli 3.6.2 2023-12-11 [1] CRAN (R 4.4.0)
cluster 2.1.6 2023-12-01 [1] CRAN (R 4.4.0)
codetools 0.2-20 2024-03-31 [1] CRAN (R 4.4.0)
colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.4.0)
conflicted 1.2.0 2023-02-01 [1] CRAN (R 4.4.0)
crayon 1.5.2 2022-09-29 [1] CRAN (R 4.4.0)
data.table 1.15.4 2024-03-30 [1] CRAN (R 4.4.0)
devtools 2.4.5 2022-10-11 [1] CRAN (R 4.4.0)
digest 0.6.35 2024-03-11 [1] CRAN (R 4.4.0)
dplyr 1.1.4 2023-11-17 [1] CRAN (R 4.4.0)
ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.4.0)
fansi 1.0.6 2023-12-08 [1] CRAN (R 4.4.0)
farver 2.1.1 2022-07-06 [1] CRAN (R 4.4.0)
fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.4.0)
foreach 1.5.2 2022-02-02 [1] CRAN (R 4.4.0)
fs 1.6.4 2024-04-25 [1] CRAN (R 4.4.0)
fst 0.9.8 2022-02-08 [1] CRAN (R 4.4.0)
fstcore 0.9.18 2023-12-02 [1] CRAN (R 4.4.0)
future 1.33.2 2024-03-26 [1] CRAN (R 4.4.0)
gdsfmt 1.39.3 2024-04-22 [1] Bioconductor 3.19 (R 4.4.0)
generics 0.1.3 2022-07-05 [1] CRAN (R 4.4.0)
GenomeInfoDb 1.38.8 2024-03-15 [1] Bioconductor 3.18 (R 4.4.0)
GenomeInfoDbData 1.2.11 2024-04-27 [1] Bioconductor
GenomicRanges 1.54.1 2023-10-29 [1] Bioconductor
ggplot2 3.5.1 2024-04-23 [1] CRAN (R 4.4.0)
glmnet 4.1-8 2023-08-22 [1] CRAN (R 4.4.0)
globals 0.16.3 2024-03-08 [1] CRAN (R 4.4.0)
glue 1.7.0 2024-01-09 [1] CRAN (R 4.4.0)
gridExtra 2.3 2017-09-09 [1] CRAN (R 4.4.0)
gtable 0.3.5 2024-04-22 [1] CRAN (R 4.4.0)
gtools 3.9.5 2023-11-20 [1] CRAN (R 4.4.0)
HardyWeinberg 1.7.8 2024-04-06 [1] CRAN (R 4.4.0)
hms 1.1.3 2023-03-21 [1] CRAN (R 4.4.0)
htmltools 0.5.8.1 2024-04-04 [1] CRAN (R 4.4.0)
htmlwidgets 1.6.4 2023-12-06 [1] CRAN (R 4.4.0)
httpuv 1.6.15 2024-03-26 [1] CRAN (R 4.4.0)
igraph 2.0.3 2024-03-13 [1] CRAN (R 4.4.0)
IRanges 2.36.0 2023-10-24 [1] Bioconductor
iterators 1.0.14 2022-02-05 [1] CRAN (R 4.4.0)
jomo 2.7-6 2023-04-15 [1] CRAN (R 4.4.0)
labeling 0.4.3 2023-08-29 [1] CRAN (R 4.4.0)
later 1.3.2 2023-12-06 [1] CRAN (R 4.4.0)
lattice 0.22-6 2024-03-20 [1] CRAN (R 4.4.0)
lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.4.0)
listenv 0.9.1 2024-01-29 [1] CRAN (R 4.4.0)
lme4 1.1-35.3 2024-04-16 [1] CRAN (R 4.4.0)
magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.4.0)
MASS 7.3-60.2 2024-04-24 [1] local
Matrix 1.7-0 2024-03-22 [1] CRAN (R 4.4.0)
matrixStats 1.3.0 2024-04-11 [1] CRAN (R 4.4.0)
memoise 2.0.1 2021-11-26 [1] CRAN (R 4.4.0)
mgcv 1.9-1 2023-12-21 [1] CRAN (R 4.4.0)
mice 3.16.0 2023-06-05 [1] CRAN (R 4.4.0)
mime 0.12 2021-09-28 [1] CRAN (R 4.4.0)
miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 4.4.0)
minqa 1.2.6 2023-09-11 [1] CRAN (R 4.4.0)
mitml 0.4-5 2023-03-08 [1] CRAN (R 4.4.0)
munsell 0.5.1 2024-04-01 [1] CRAN (R 4.4.0)
nlme 3.1-164 2023-11-27 [1] CRAN (R 4.4.0)
nloptr 2.0.3 2022-05-26 [1] CRAN (R 4.4.0)
nnet 7.3-19 2023-05-03 [1] CRAN (R 4.4.0)
OutFLANK 0.2 2024-04-27 [1] Github (whitlock/OutFLANK@e502e82)
pan 1.9 2023-12-07 [1] CRAN (R 4.4.0)
parallelly 1.37.1 2024-02-29 [1] CRAN (R 4.4.0)
permute 0.9-7 2022-01-27 [1] CRAN (R 4.4.0)
pillar 1.9.0 2023-03-22 [1] CRAN (R 4.4.0)
pkgbuild 1.4.4 2024-03-17 [1] CRAN (R 4.4.0)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.4.0)
pkgload 1.3.4 2024-01-16 [1] CRAN (R 4.4.0)
plyr 1.8.9 2023-10-02 [1] CRAN (R 4.4.0)
profvis 0.3.8 2023-05-02 [1] CRAN (R 4.4.0)
promises 1.3.0 2024-04-05 [1] CRAN (R 4.4.0)
purrr 1.0.2 2023-08-10 [1] CRAN (R 4.4.0)
qvalue 2.35.0 2024-04-22 [1] Bioconductor 3.19 (R 4.4.0)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.4.0)
radiator 1.3.1 2024-04-27 [1] Github (thierrygosselin/radiator@cae66c8)
ragg 1.3.0 2024-03-13 [1] CRAN (R 4.4.0)
RColorBrewer 1.1-3 2022-04-03 [1] CRAN (R 4.4.0)
Rcpp 1.0.12 2024-01-09 [1] CRAN (R 4.4.0)
RCurl 1.98-1.14 2024-01-09 [1] CRAN (R 4.4.0)
readr 2.1.5 2024-01-10 [1] CRAN (R 4.4.0)
remotes 2.5.0 2024-03-17 [1] CRAN (R 4.4.0)
reshape2 1.4.4 2020-04-09 [1] CRAN (R 4.4.0)
rlang 1.1.3 2024-01-10 [1] CRAN (R 4.4.0)
rpart 4.1.23 2023-12-05 [1] CRAN (R 4.4.0)
Rsolnp 1.16 2015-12-28 [1] CRAN (R 4.4.0)
rstudioapi 0.16.0 2024-03-24 [1] CRAN (R 4.4.0)
S4Vectors 0.40.2 2023-11-23 [1] Bioconductor 3.18 (R 4.4.0)
scales 1.3.0 2023-11-28 [1] CRAN (R 4.4.0)
SeqArray 1.43.8 2024-04-27 [1] Github (zhengxwen/SeqArray@16bba1e)
seqinr 4.2-36 2023-12-08 [1] CRAN (R 4.4.0)
sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.4.0)
shape 1.4.6.1 2024-02-23 [1] CRAN (R 4.4.0)
shiny 1.8.1.9000 2024-04-27 [1] Github (rstudio/shiny@950c630)
SNPRelate 1.37.5 2024-04-22 [1] Bioconductor 3.19 (R 4.4.0)
stockR 1.0.76 2023-04-26 [1] CRAN (R 4.4.0)
stringi 1.8.3 2023-12-11 [1] CRAN (R 4.4.0)
stringr 1.5.1 2023-11-14 [1] CRAN (R 4.4.0)
survival 3.6-4 2024-04-24 [1] CRAN (R 4.4.0)
systemfonts 1.0.6 2024-03-07 [1] CRAN (R 4.4.0)
textshaping 0.3.7 2023-10-09 [1] CRAN (R 4.4.0)
tibble 3.2.1 2023-03-20 [1] CRAN (R 4.4.0)
tidyr 1.3.1 2024-01-24 [1] CRAN (R 4.4.0)
tidyselect 1.2.1 2024-03-11 [1] CRAN (R 4.4.0)
truncnorm 1.0-9 2023-03-20 [1] CRAN (R 4.4.0)
tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.4.0)
UpSetR 1.4.0 2019-05-22 [1] CRAN (R 4.4.0)
urlchecker 1.0.1 2021-11-30 [1] CRAN (R 4.4.0)
usethis * 2.2.3 2024-02-19 [1] CRAN (R 4.4.0)
utf8 1.2.4 2023-10-22 [1] CRAN (R 4.4.0)
vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.4.0)
vegan 2.6-4 2022-10-11 [1] CRAN (R 4.4.0)
viridisLite 0.4.2 2023-05-02 [1] CRAN (R 4.4.0)
vroom 1.6.5 2023-12-05 [1] CRAN (R 4.4.0)
withr 3.0.0 2024-01-16 [1] CRAN (R 4.4.0)
xtable 1.8-4 2019-04-21 [1] CRAN (R 4.4.0)
XVector 0.42.0 2023-10-24 [1] Bioconductor
zlibbioc 1.48.2 2024-03-13 [1] Bioconductor 3.18 (R 4.4.0)

thierrygosselin commented 7 months ago

Welcome to GitHub, Tina. I'll have a look at this later today.

Best,
Thierry

thierrygosselin commented 7 months ago

Update: I'm able to reproduce your error with the provided data. I'll have a fix.

thierrygosselin commented 7 months ago
test1 <- radiator::read_dart(
  data = "Report_DAci24-8986_2_moreOrders_SNPcount_2_CSedit_working_all_dupsremoved.csv",
  strata= "(5)strata_dart_sturgeon_20240427_mod.tsv"
  )

works

Reading DArT file...
Number of blacklisted samples: 239
DArT SNP format: alleles coverage in 2 Rows counts
fstcore package v0.9.18
(OpenMP detected, using 56 threads)
Generating genotypes and calibrating REF/ALT alleles...
Number of markers recalibrated based on counts of allele read depth: 5398
Generating GDS...
File written: radiator_20240430@1040.gds.rad

Number of chrom: 1
Number of locus: 28804
Number of SNPs: 33736
Number of strata: 27
Number of individuals: 2525

Number of ind/strata:
GFR = 417
RAL = 49
RAR = 49
NMW = 5
ENG = 50
CAR = 138
WHD = 19
TET = 56
BOU = 27
PPE = 20
SFC = 88
SFB = 38
SFA = 24
DSS = 92
NUM = 125
NUT = 125
PIG = 2
PFR = 815
DOR = 35
SEM = 24
LDB = 87
DSP = 57
ASS = 30
EBC = 20
CHR = 20
GUL = 69
STE = 44

Number of duplicate id: 0
thierrygosselin commented 7 months ago
test2 <- radiator::read_dart(
  data = "Report_DAci24-8986_2_moreOrders_SNPcount_2_CSedit_working_all_dupsremoved.csv",
  strata= "(5)strata_dart_sturgeon_20240427_mod.tsv",
  tidy.dart = TRUE
)

works

thierrygosselin commented 7 months ago
test3 <- radiator::filter_rad(
  data = "Report_DAci24-8986_2_moreOrders_SNPcount_2_CSedit_working_all_dupsremoved.csv",
  strata= "(5)strata_dart_sturgeon_20240427_mod.tsv"
)
Generating statistics
✔ Missing genotypes [3.1s]
✔ Heterozygosity [1.5s]
Error in `dplyr::filter()`:
ℹ In argument: `MARKERS %in% markers`.
Caused by error:
! object 'MARKERS' not found
Run `rlang::last_trace()` to see where the error occurred.
✖ Coverage ... [2.9s]
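
For readers unfamiliar with this class of error, the message above is what `dplyr::filter()` reports when the filter expression names a column that the data frame does not contain. The following is a minimal sketch on a toy data frame, not radiator's actual code; the column names `VARIANT_ID` and `READ_DEPTH`, the `markers` values, and the defensive check are all hypothetical and chosen only for illustration.

```r
# Toy data frame WITHOUT a MARKERS column (hypothetical; not radiator's data)
df <- data.frame(VARIANT_ID = 1:3, READ_DEPTH = c(10, 20, 30))
markers <- c("m1", "m2")  # hypothetical marker IDs

# Filtering on a missing column fails: `MARKERS` is neither a column of `df`
# nor an object in the calling environment, so evaluation raises an error.
res <- tryCatch(
  dplyr::filter(df, MARKERS %in% markers),
  error = function(e) e
)
is_error <- inherits(res, "error")

# A defensive check a caller could run before filtering (illustrative only)
has_markers <- "MARKERS" %in% names(df)
if (!has_markers) message("Column MARKERS is missing; skipping the filter")
```

This only illustrates the general mechanism; which internal table lacked the column in radiator 1.3.1 is not stated in this thread.
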
thierrygosselin commented 7 months ago

Version 1.3.2 should work with your dataset.
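
Before rerunning, it may help to confirm which radiator version is installed. The snippet below is a minimal sketch: the `min_version` value comes from this thread, the variable names are illustrative, and the suggested reinstall command assumes the package is installed from GitHub with `remotes`, as the session info above indicates.

```r
# Minimum version reported in this thread as containing the fix
min_version <- "1.3.2"

# Installed version as a string, or NA if radiator is not installed
installed <- if (requireNamespace("radiator", quietly = TRUE)) {
  as.character(utils::packageVersion("radiator"))
} else {
  NA_character_
}

# TRUE when radiator is missing or older than min_version
needs_update <- is.na(installed) ||
  utils::compareVersion(installed, min_version) < 0

if (needs_update) {
  message("radiator >= ", min_version, " not found; consider reinstalling with:")
  message('remotes::install_github("thierrygosselin/radiator")')
} else {
  message("radiator ", installed, " is installed")
}
```

Restarting the R session after reinstalling avoids running against the previously loaded version.
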

Test this:

data <- radiator::filter_rad(
  data = "Report_DAci24-8986_2_moreOrders_SNPcount_2_CSedit_working_all_dupsremoved.csv",
  strata= "(5)strata_dart_sturgeon_20240427_mod.tsv"
)

Getting these warnings at the end is normal. Part of it will be fixed, but it doesn't change the results; it's purely cosmetic.

############################# completed filter_rad #############################
Warning messages:
1: In ggplot2::scale_y_log10(labels = scales::number_format(), oob = scales::squish_infinite) :
  log-10 transformation introduced infinite values.
2: There was 1 warning in `dplyr::mutate()`.
ℹ In argument: `WHITELISTED_MARKERS = purrr::map_int(...)`.
Caused by warning:
! Using one column matrices in `filter()` was deprecated in dplyr
  1.1.0.
ℹ Please use one dimensional logical vectors instead.
ℹ The deprecated feature was likely used in the radiator package.
  Please report the issue at
  <https://github.com/thierrygosselin/radiator/issues>.
This warning is displayed once every 8 hours.
Call `lifecycle::last_lifecycle_warnings()` to see where this warning
was generated. 
3: Unknown or uninitialised column: `STRATA`. 
4: Unknown or uninitialised column: `STRATA`. 
5: Unknown or uninitialised column: `STRATA`.