thierrygosselin / radiator

RADseq Data Exploration, Manipulation and Visualization using R
https://thierrygosselin.github.io/radiator/
GNU General Public License v3.0

genomic_converter Error #162

Closed BTBIIT closed 1 year ago

BTBIIT commented 2 years ago

Hi thierrygosselin

I ran into an error while using your package and would like to ask about it. I got the following message when running the code below.

radiator::detect_genomic_format(data = "dp10q30mm100_biA.SNP.recode.vcf")
test <- genomic_converter(data = "dp10q30mm100_biA.SNP.recode.vcf", strata = NULL, output = "structure", filename = "test.txt", parallel.core = parallel::detectCores() -1, verbose = T)

Printed message:

Error in dplyr::mutate():
! Problem while computing MISSING_PROP = round(...).
Caused by error in .DynamicClusterCall():
! One of the nodes produced an error: Can not open file

This was strange, so I traced the error and got the following:

> rlang::last_error()
<error/dplyr:::mutate_error>
Error in dplyr::mutate():
! Problem while computing MISSING_PROP = round(...).
Caused by error in .DynamicClusterCall():
! One of the nodes produced an error: Can not open file 'C:\Users\pkh\Desktop\Radiator\04_radiator_genomic_converter_20220718@2038\radiator_20220718@2038.gds'

Backtrace:

  1. radiator::genomic_converter(...)
  14. SeqArray::seqMissing(gdsfile = gds, per.variant = FALSE, parallel = parallel.core)
  15. SeqArray::seqParallel(...)
  16. SeqArray::seqParallel(...)
  17. SeqArray:::.DynamicClusterCall(...)
  18. base::stop("One of the nodes produced an error: ", as.character(dv))

Run rlang::last_trace() to see the full context.

> rlang::last_trace()
<error/dplyr:::mutate_error>
Error in dplyr::mutate():
! Problem while computing MISSING_PROP = round(...).
Caused by error in .DynamicClusterCall():
! One of the nodes produced an error: Can not open file 'C:\Users\pkh\Desktop\Radiator\04_radiator_genomic_converter_20220718@2038\radiator_20220718@2038.gds'.

Backtrace: x

  1. +-radiator::genomic_converter(...)
  2. | -radiator::tidy_genomic_data(...)
  3. | -radiator::tidy_vcf(...)
  4. | -radiator::read_vcf(...)
  5. | +-... %$% info
  6. | -radiator::generate_id_stats(...)
  7. | -id.info %<>% ...
  8. +-base::with(., info)
  9. +-dplyr::mutate(...)
  10. +-dplyr:::mutate.data.frame(., MISSING_PROP = round(SeqArray::seqMissing(gdsfile = gds, per.variant = FALSE, parallel = parallel.core), digits))
  11. | -dplyr:::mutate_cols(.data, dplyr_quosures(...), caller_env = caller_env())
  12. | +-base::withCallingHandlers(...)
  13. | -mask$eval_all_mutate(quo)
  14. +-SeqArray::seqMissing(gdsfile = gds, per.variant = FALSE, parallel = parallel.core)
  15. | -SeqArray::seqParallel(...)
  16. | -SeqArray::seqParallel(...)
  17. | -SeqArray:::.DynamicClusterCall(...)
  18. | -base::stop("One of the nodes produced an error: ", as.character(dv))
  19. -base::.handleSimpleError(...)
  20. -dplyr (local) h(simpleError(msg, call))
  21. -rlang::abort(...)

I also tried another function, radiator::tidy_vcf, but that failed with an error as well.

data2 <- radiator::tidy_vcf(data = "dp10q30mm100_biA.SNP.recode.vcf",
+ filter.common.markers = FALSE,
+ verbose = TRUE)
Execution date@time: 20220718@2047
Folder created: tidy_vcf_20220718@2047

Reading VCF...

Data summary:
number of samples: 82
number of markers: 2383
Error in SeqArray::seqGetData(gdsfile = data, var.name = "$ref") : The GDS node "$ref" does not exist.
In addition: Warning message:
In if (is.stacks) { : the condition has length > 1 and only the first element will be used

Computation time, overall: 1 sec

I don't know why this is happening. I understood that strata was optional, but I would like to ask whether it is actually a required file. If not, could you check whether something is wrong with my code? Here is the info on my computer:

> sessionInfo()
R version 4.1.3 (2022-03-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=Korean_Korea.949 LC_CTYPE=Korean_Korea.949 LC_MONETARY=Korean_Korea.949 LC_NUMERIC=C
[5] LC_TIME=Korean_Korea.949

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] future_1.26.1 strataG_2.5.01 SeqArray_1.34.0 gdsfmt_1.30.0 radiator_1.2.2 adegenet_2.1.7 ade4_1.7-19

loaded via a namespace (and not attached):
[1] nlme_3.1-158 bitops_1.0-7 fs_1.5.2 usethis_2.1.6 bit64_4.0.5
[6] devtools_2.4.3 UpSetR_1.4.0 GenomeInfoDb_1.30.1 tools_4.1.3 utf8_1.2.2
[11] R6_2.5.1 vegan_2.6-2 DBI_1.1.3 BiocGenerics_0.40.0 mgcv_1.8-40
[16] colorspace_2.0-3 permute_0.9-7 withr_2.5.0 gridExtra_2.3 tidyselect_1.1.2
[21] prettyunits_1.1.1 processx_3.6.1 phangorn_2.9.0 bit_4.0.4 compiler_4.1.3
[26] cli_3.3.0 scales_1.2.0 readr_2.1.2 quadprog_1.5-8 callr_3.7.1
[31] stringr_1.4.0 digest_0.6.29 XVector_0.34.0 pkgconfig_2.0.3 htmltools_0.5.2
[36] parallelly_1.32.0 sessioninfo_1.2.2 fastmap_1.1.0 rlang_1.0.3 rstudioapi_0.13
[41] shiny_1.7.1 generics_0.1.3 vroom_1.5.7 dplyr_1.0.9 RCurl_1.98-1.7
[46] magrittr_2.0.3 GenomeInfoDbData_1.2.7 apex_1.0.4 Matrix_1.4-1 Rcpp_1.0.9
[51] munsell_0.5.0 S4Vectors_0.32.4 fansi_1.0.3 ape_5.6-2 lifecycle_1.0.1
[56] stringi_1.7.6 MASS_7.3-57 zlibbioc_1.40.0 pkgbuild_1.3.1 plyr_1.8.7
[61] grid_4.1.3 listenv_0.8.0 parallel_4.1.3 promises_1.2.0.1 crayon_1.5.1
[66] lattice_0.20-45 Biostrings_2.62.0 splines_4.1.3 hms_1.1.1 ps_1.7.1
[71] pillar_1.8.0 igraph_1.3.2 GenomicRanges_1.46.1 seqinr_4.2-16 reshape2_1.4.4
[76] codetools_0.2-18 stats4_4.1.3 pkgload_1.3.0 fastmatch_1.1-3 glue_1.6.2
[81] data.table_1.14.2 remotes_2.4.2 BiocManager_1.30.18 tzdb_0.3.0 vctrs_0.4.1
[86] httpuv_1.6.5 gtable_0.3.0 purrr_0.3.4 assertthat_0.2.1 cachem_1.0.6
[91] ggplot2_3.3.6 mime_0.12 xtable_1.8-4 later_1.3.0 tibble_3.1.7
[96] memoise_2.0.1 IRanges_2.28.0 cluster_2.1.3 globals_0.15.1 ellipsis_0.3.2

Thanks in advance for your help.

pdimens commented 2 years ago

I can confirm this behavior in converting vcf to fineradstructure

Generating individual stats...
Error in `dplyr::mutate()`:
! Problem while computing `MISSING_PROP = round(...)`.
Caused by error in `.DynamicClusterCall()`:
! One of the nodes produced an error: Can not open file 'C:\Users\pdime\Omega\USM PhD\Projects\Active\Blackfin Tuna\Analyses\thinned\inputfiles\01_radiator_genomic_converter_20220720@1703\radiator_20220720@1703.gds'. The process cannot access the file because it is being used by another process.
Run `rlang::last_error()` to see where the error occurred.
There were 12 warnings (use warnings() to see them)

pdimens commented 2 years ago

It's an issue with parallelization bugging out (on windows at least, not tested elsewhere).

Solution: add parallel.core = 1L to the genomic_converter call

genomic_converter(data,  parallel.core = 1L, args...)
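
As a rough sketch of that workaround applied to the call from the original report: the snippet below picks one worker on Windows and keeps the original `detectCores() - 1` setting elsewhere. `safe_cores()` is a hypothetical helper, not part of radiator, and the assumption that only Windows needs serial execution comes from this thread rather than from testing on other systems. The conversion itself only runs if radiator is installed and the VCF is in the working directory.

```r
# Hypothetical helper (not part of radiator): choose a worker count by platform.
safe_cores <- function(os = .Platform$OS.type) {
  if (identical(os, "windows")) {
    # Serial run. Assumption from this thread: parallel workers fail to open
    # the shared .gds file on Windows.
    1L
  } else {
    # Same setting as the original report, guarded against NA and 0.
    max(1L, parallel::detectCores() - 1L, na.rm = TRUE)
  }
}

vcf <- "dp10q30mm100_biA.SNP.recode.vcf"  # file name from the original report

# Only attempt the conversion when the package and the file are available.
if (requireNamespace("radiator", quietly = TRUE) && file.exists(vcf)) {
  test <- radiator::genomic_converter(
    data = vcf,
    strata = NULL,
    output = "structure",
    filename = "test.txt",
    parallel.core = safe_cores(),
    verbose = TRUE
  )
}
```

On Windows this passes `parallel.core = 1L`, which matches the fix above; it trades speed for avoiding the file-access error.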
thierrygosselin commented 1 year ago

Some Windows architectures do seem to have a problem with parallelization.

Sadly, it's beyond my interest to fix those, because I don't have Windows machines (physical or in the cloud) to run my tests on and, more importantly, I don't have the time to do that. Again, sorry.

thierrygosselin commented 1 year ago

@BTBIIT I will reopen the issue if you send me the VCF dp10q30mm100_biA.SNP.recode.vcf, or a subsample of it, so that I can check that it's not an issue with the code when running on Linux or macOS.

Thanks for reporting.