thierrygosselin / radiator

RADseq Data Exploration, Manipulation and Visualization using R
https://thierrygosselin.github.io/radiator/
GNU General Public License v3.0

read_vcf file access error with parallel.core = 1L argument #178

Closed ac-harris closed 6 months ago

ac-harris commented 1 year ago

Hi, Thierry--

I'm trying to import a VCF from Stacks v2.57 into radiator, and I'm running into an issue with the dreaded "The process cannot access the file because it is being used by another process" error when the heterozygosity is being calculated. I dug into old issues on github and tried running read_vcf with the parallel.core = 1L argument (and also parallel.core = 1 for good measure), but the function still gets hung up in the same spot with the same error each time. Not quite sure where to go from here.

For reproducibility, I included the function/args, the complete error message and traceback, the session info, and a subset of the vcf and strata. I uploaded the subsetted vcf as a txt file since github doesn't allow uploads of vcfs. I'd appreciate any insight you can provide!

Function: data <- read_vcf("populations.snps.vcf", strata = "strata_sex.txt", parallel.core = 1L)

Error:

Error in dplyr::mutate():
ℹ In argument: HET_OBS = round(markers_het(gds), 6).
Caused by error in .DynamicClusterCall():
! One of the nodes produced an error: Can not open file 'S:\Eagle Fish Genetics Lab\EFGL Genetic Projects\Burbot\2023 Burbot RADseq\Analyses\sex_marker\sexy_markers\read_vcf_20230414@1139\01_import_gds\radiator_20230414@1139.gds'. The process cannot access the file because it is being used by another process.
Run rlang::last_trace() to see where the error occurred.
✖ Heterozygosity [7.8s]

Traceback:

     ▆
  1. ├─radiator::read_vcf(...)
  2. │ └─radiator::generate_stats(...)
  3. │ └─m.info %<>% ...
  4. ├─dplyr::mutate(...)
  5. ├─dplyr:::mutate.data.frame(...)
  6. │ └─dplyr:::mutate_cols(.data, dplyr_quosures(...), by)
  7. │ ├─base::withCallingHandlers(...)
  8. │ └─dplyr:::mutate_col(dots[[i]], data, mask, new_columns)
  9. │ └─mask$eval_all_mutate(quo)
 10. │ └─dplyr (local) eval()
 11. ├─radiator::markers_het(gds)
 12. │ └─SeqArray::seqApply(...)
 13. │ └─SeqArray::seqParallel(...)
 14. │ └─SeqArray::seqParallel(...)
 15. │ └─SeqArray:::.DynamicClusterCall(...)
 16. │ └─base::stop("One of the nodes produced an error: ", as.character(dv))
 17. └─base::.handleSimpleError(...)
 18. └─dplyr (local) h(simpleError(msg, call))
 19. └─rlang::abort(message, class = error_class, parent = parent, call = error_call)

Session info:

─ Session info ─────────────────────────────────────────
 setting   value
 version   R version 4.2.2 (2022-10-31 ucrt)
 os        Windows 10 x64 (build 19044)
 system    x86_64, mingw32
 ui        RStudio
 language  (EN)
 collate   English_United States.utf8
 ctype     English_United States.utf8
 tz        America/Denver
 date      2023-04-14
 rstudio   2022.07.2+576 Spotted Wakerobin (desktop)
 pandoc    NA

─ Packages ─────────────────────────────────────────────
 package           version    date (UTC)  lib  source
 BiocGenerics      0.44.0     2022-11-01  [1]  Bioconductor
 Biostrings        2.66.0     2022-11-01  [1]  Bioconductor
 bit               4.0.5      2022-11-15  [1]  CRAN (R 4.2.3)
 bit64             4.0.5      2020-08-30  [1]  CRAN (R 4.2.2)
 bitops            1.0-7      2021-04-24  [1]  CRAN (R 4.2.0)
 cachem            1.0.7      2023-02-24  [1]  CRAN (R 4.2.3)
 callr             3.7.3      2022-11-02  [1]  CRAN (R 4.2.2)
 cli               3.6.1      2023-03-23  [1]  CRAN (R 4.2.3)
 colorspace        2.1-0      2023-01-23  [1]  CRAN (R 4.2.3)
 crayon            1.5.2      2022-09-29  [1]  CRAN (R 4.2.2)
 devtools          2.4.5      2022-10-11  [1]  CRAN (R 4.2.2)
 digest            0.6.31     2022-12-11  [1]  CRAN (R 4.2.3)
 dplyr             1.1.1      2023-03-22  [1]  CRAN (R 4.2.3)
 ellipsis          0.3.2      2021-04-29  [1]  CRAN (R 4.2.2)
 fansi             1.0.4      2023-01-22  [1]  CRAN (R 4.2.3)
 farver            2.1.1      2022-07-06  [1]  CRAN (R 4.2.2)
 fastmap           1.1.1      2023-02-24  [1]  CRAN (R 4.2.3)
 forcats           1.0.0      2023-01-29  [1]  CRAN (R 4.2.3)
 fs                1.6.1      2023-02-06  [1]  CRAN (R 4.2.3)
 gdsfmt            1.34.1     2023-03-31  [1]  Bioconductor
 generics          0.1.3      2022-07-05  [1]  CRAN (R 4.2.2)
 GenomeInfoDb      1.34.9     2023-02-02  [1]  Bioconductor
 GenomeInfoDbData  1.2.9      2023-04-13  [1]  Bioconductor
 GenomicRanges     1.50.2     2022-12-27  [1]  Bioconductor
 ggplot2           3.4.2      2023-04-03  [1]  CRAN (R 4.2.3)
 glue              1.6.2      2022-02-24  [1]  CRAN (R 4.2.2)
 gridExtra         2.3        2017-09-09  [1]  CRAN (R 4.2.2)
 gtable            0.3.3      2023-03-21  [1]  CRAN (R 4.2.3)
 hms               1.1.3      2023-03-21  [1]  CRAN (R 4.2.3)
 htmltools         0.5.5      2023-03-23  [1]  CRAN (R 4.2.3)
 htmlwidgets       1.6.2      2023-03-17  [1]  CRAN (R 4.2.3)
 httpuv            1.6.9      2023-02-14  [1]  CRAN (R 4.2.3)
 IRanges           2.32.0     2022-11-01  [1]  Bioconductor
 labeling          0.4.2      2020-10-20  [1]  CRAN (R 4.2.0)
 later             1.3.0      2021-08-18  [1]  CRAN (R 4.2.2)
 lifecycle         1.0.3      2022-10-07  [1]  CRAN (R 4.2.2)
 lubridate         1.9.2      2023-02-10  [1]  CRAN (R 4.2.3)
 magrittr          2.0.3      2022-03-30  [1]  CRAN (R 4.2.3)
 memoise           2.0.1      2021-11-26  [1]  CRAN (R 4.2.2)
 mime              0.12       2021-09-28  [1]  CRAN (R 4.2.0)
 miniUI            0.1.1.1    2018-05-18  [1]  CRAN (R 4.2.2)
 munsell           0.5.0      2018-06-12  [1]  CRAN (R 4.2.2)
 pillar            1.9.0      2023-03-22  [1]  CRAN (R 4.2.3)
 pkgbuild          1.4.0      2022-11-27  [1]  CRAN (R 4.2.3)
 pkgconfig         2.0.3      2019-09-22  [1]  CRAN (R 4.2.2)
 pkgload           1.3.2      2022-11-16  [1]  CRAN (R 4.2.3)
 plyr              1.8.8      2022-11-11  [1]  CRAN (R 4.2.2)
 prettyunits       1.1.1      2020-01-24  [1]  CRAN (R 4.2.2)
 processx          3.8.0      2022-10-26  [1]  CRAN (R 4.2.2)
 profvis           0.3.7      2020-11-02  [1]  CRAN (R 4.2.2)
 promises          1.2.0.1    2021-02-11  [1]  CRAN (R 4.2.2)
 ps                1.7.4      2023-04-02  [1]  CRAN (R 4.2.3)
 purrr             1.0.1      2023-01-10  [1]  CRAN (R 4.2.3)
 R6                2.5.1      2021-08-19  [1]  CRAN (R 4.2.2)
 radiator          1.2.8      2023-04-13  [1]  Github (thierrygosselin/radiator@d2442e5)
 Rcpp              1.0.10     2023-01-22  [1]  CRAN (R 4.2.3)
 RCurl             1.98-1.12  2023-03-27  [1]  CRAN (R 4.2.3)
 readr             2.1.4      2023-02-10  [1]  CRAN (R 4.2.3)
 remotes           2.4.2      2021-11-30  [1]  CRAN (R 4.2.2)
 rlang             1.1.0      2023-03-14  [1]  CRAN (R 4.2.3)
 rstudioapi        0.14       2022-08-22  [1]  CRAN (R 4.2.2)
 S4Vectors         0.36.2     2023-02-26  [1]  Bioconductor
 scales            1.2.1      2022-08-20  [1]  CRAN (R 4.2.2)
 SeqArray          1.38.0     2022-11-01  [1]  Bioconductor
 sessioninfo       1.2.2      2021-12-06  [1]  CRAN (R 4.2.2)
 shiny             1.7.4      2022-12-15  [1]  CRAN (R 4.2.3)
 stringi           1.7.12     2023-01-11  [1]  CRAN (R 4.2.2)
 stringr           1.5.0      2022-12-02  [1]  CRAN (R 4.2.3)
 tibble            3.2.1      2023-03-20  [1]  CRAN (R 4.2.3)
 tidyr             1.3.0      2023-01-24  [1]  CRAN (R 4.2.3)
 tidyselect        1.2.0      2022-10-10  [1]  CRAN (R 4.2.2)
 tidyverse         2.0.0      2023-02-22  [1]  CRAN (R 4.2.3)
 timechange        0.2.0      2023-01-11  [1]  CRAN (R 4.2.3)
 tzdb              0.3.0      2022-03-28  [1]  CRAN (R 4.2.2)
 UpSetR            1.4.0      2019-05-22  [1]  CRAN (R 4.2.2)
 urlchecker        1.0.1      2021-11-30  [1]  CRAN (R 4.2.2)
 usethis           2.1.6      2022-05-25  [1]  CRAN (R 4.2.2)
 utf8              1.2.3      2023-01-31  [1]  CRAN (R 4.2.3)
 vctrs             0.6.1      2023-03-22  [1]  CRAN (R 4.2.3)
 vroom             1.6.1      2023-01-22  [1]  CRAN (R 4.2.3)
 withr             2.5.0      2022-03-03  [1]  CRAN (R 4.2.2)
 xtable            1.8-4      2019-04-21  [1]  CRAN (R 4.2.2)
 XVector           0.38.0     2022-11-01  [1]  Bioconductor
 zlibbioc          1.44.0     2022-11-01  [1]  Bioconductor

 [1] C:/Users/aharris/AppData/Local/R/win-library/4.2
 [2] C:/Program Files/R/R-4.2.2/library

Attachments: strata_sex_sub.txt, populations.snps_sub.txt

sanfilippog commented 1 year ago

I'm having this same issue and have been unable to use radiator with SNP vcf files because I can't get past it. Any resolution for you yet?

thierrygosselin commented 6 months ago

Sorry for the long delay. I'm unable to reproduce your error. Try the new version, and if you still get the error, re-open the issue and send me a bigger file by email. Sorry about that.

greavess commented 5 months ago

Hi, I have the same error on Windows with radiator installed from GitHub today (April 8). I am not sure why the file is being accessed incorrectly; I suspect a bug in SeqArray. But I think I found why setting parallel.core = 1 does not help: I confirmed in Task Manager that more than one process is spawned even when parallel.core = 1.

In the extract_coverage function in gds.R, there is the following call:

SeqArray::seqApply(gdsfile = gds, var.name = "$dosage_alt", 
    FUN = function(x) sum(x == 1, na.rm = TRUE)/sum(!is.na(x)), 
    margin = "by.variant", as.is = "double", parallel = TRUE)

The SeqArray manual says for the parallel argument

parallel: FALSE (serial processing), TRUE (multicore processing), numeric value or other value; parallel is passed to the argument cl in seqParallel, see seqParallel for more details.

So I think parallel.core is not being passed to seqApply; with parallel = TRUE hard-coded in this call, the number of workers is left to autodetection instead.

I hope this helps, I am not sure how to test it.
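One way the suggestion above could be sketched, untested against radiator itself: map the user-facing parallel.core value onto the parallel argument of SeqArray::seqApply, so that parallel.core = 1 requests serial processing. The names resolve_parallel and het_by_variant are hypothetical and are not part of radiator or SeqArray; only the seqApply arguments quoted in the comment above are taken from the source.

```r
# Hypothetical helper (not in radiator or SeqArray): translate a core count
# into a value suitable for the `parallel` argument of SeqArray::seqApply().
resolve_parallel <- function(parallel.core = 1L) {
  stopifnot(length(parallel.core) == 1L)
  parallel.core <- as.integer(parallel.core)
  if (is.na(parallel.core) || parallel.core <= 1L) {
    return(FALSE)  # FALSE = serial processing, per the SeqArray manual
  }
  parallel.core    # numeric value = number of cores requested
}

# Hypothetical wrapper around the call quoted above. The only change is that
# `parallel` now follows parallel.core instead of being hard-coded to TRUE.
# Requires the SeqArray package and an open GDS file handle when called.
het_by_variant <- function(gds, parallel.core = 1L) {
  SeqArray::seqApply(
    gdsfile = gds,
    var.name = "$dosage_alt",
    FUN = function(x) sum(x == 1, na.rm = TRUE) / sum(!is.na(x)),
    margin = "by.variant",
    as.is = "double",
    parallel = resolve_parallel(parallel.core)
  )
}
```

If the file-lock error disappears when the call runs with parallel = FALSE, that would support the idea that the extra worker processes are what collide on the .gds file. This is an assumption to test, not a confirmed diagnosis.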