thierrygosselin / radiator

RADseq Data Exploration, Manipulation and Visualization using R
https://thierrygosselin.github.io/radiator/
GNU General Public License v3.0

tidy to HZAR missing POP_ID column issues #151

Status: Closed (emilyostrow closed this issue 1 year ago)

emilyostrow commented 2 years ago

Hi Thierry,

I am ultimately trying to convert a VCF file to the HZAR format. The VCF imports successfully, but I cannot get radiator to export the tidy dataset to HZAR. The problem involves a missing POP_ID column, and I am not sure at which point I am supposed to add it. I tried calling write_hzar directly on the tidyVCF object, and again after adding a dummy POP_ID column, and got a different error each time.

For the following code, the error was:

Calibrating REF/ALT alleles...
Error in write_hzar():
! Populations in distances file doesn't match populations in input file
Run rlang::last_error() to see where the error occurred.
Warning messages:
1: Unknown or uninitialised column: POP_ID.
2: Unknown or uninitialised column: POP_ID.

library(radiator)

# Import the filtered VCF as a tidy data frame
tidyVCF <- tidy_vcf(data = "OptimalFiltered.vcf")
hzar_filtered <- write_hzar(
  tidyVCF,
  distances = "distances2.txt",
  filename = NULL,
  parallel.core = parallel::detectCores() - 1
)
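
The first error says that the populations in the distances file and in the input data disagree. Below is a minimal sketch for comparing the two sets before calling write_hzar. It runs on toy data, and both the helper name `compare_populations` and the assumption that each table carries a POP_ID column are hypothetical illustrations, not radiator's documented file format.

```r
# Hypothetical helper: report populations present in one table but not the other.
# Assumes both tables carry a POP_ID column (an assumption, not radiator's spec).
compare_populations <- function(tidy_data, distances) {
  stopifnot("POP_ID" %in% names(tidy_data), "POP_ID" %in% names(distances))
  data_pops <- unique(as.character(tidy_data$POP_ID))
  dist_pops <- unique(as.character(distances$POP_ID))
  list(
    only_in_data = setdiff(data_pops, dist_pops),
    only_in_distances = setdiff(dist_pops, data_pops)
  )
}

# Toy data standing in for the tidy VCF and the distances file
toy_tidy <- data.frame(
  INDIVIDUALS = c("ind1", "ind2", "ind3"),
  POP_ID = c("north", "north", "south"),
  stringsAsFactors = FALSE
)
toy_distances <- data.frame(
  POP_ID = c("north", "east"),
  DISTANCE = c(0, 12.5),
  stringsAsFactors = FALSE
)

mismatch <- compare_populations(toy_tidy, toy_distances)
print(mismatch)
```

If the tidy data has no POP_ID column at all, as the warnings in this thread suggest, the `stopifnot()` check fails immediately, which points at the missing column before any population names are compared.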

For the code with distances = NULL, the error was:

Calibrating REF/ALT alleles...
Error in distinct_prepare():
! distinct() must use existing variables.
x POP_ID not found in .data.
Run rlang::last_error() to see where the error occurred.
Warning message:
Unknown or uninitialised column: POP_ID.

hzar_filtered <- write_hzar(
  tidyVCF,
  distances = NULL,
  filename = NULL,
  parallel.core = parallel::detectCores() - 1
)

For the code with the dummy POP_ID column, the error was:

Calibrating REF/ALT alleles...
Error in stop_vctrs():
! Names must be unique.
x These names are duplicated:

I am clearly doing something wrong with the POP_ID column, but I am not sure how to fix it. Any advice would be greatly appreciated. Thank you.
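
The errors above all revolve around the population column: it is either absent or, once added by hand, collides with an existing name. Below is a minimal sketch of attaching a population column from a two-column table of individuals and populations, then checking for duplicated column names. The helper `add_pop_id` and the column names INDIVIDUALS and STRATA are assumptions for illustration, and this thread does not confirm whether doing so resolves the write_hzar error.

```r
# Hypothetical helper: add a population column to a tidy data frame from a
# two-column strata table (INDIVIDUALS, STRATA). Column names are assumptions.
add_pop_id <- function(tidy_data, strata) {
  if ("POP_ID" %in% names(tidy_data)) {
    stop("POP_ID already exists; adding it again would duplicate the name")
  }
  missing_ids <- setdiff(unique(tidy_data$INDIVIDUALS), strata$INDIVIDUALS)
  if (length(missing_ids) > 0) {
    stop("No population for: ", paste(missing_ids, collapse = ", "))
  }
  # match() looks up each individual's population and keeps the row order
  tidy_data$POP_ID <- strata$STRATA[match(tidy_data$INDIVIDUALS, strata$INDIVIDUALS)]
  tidy_data
}

# Toy data standing in for the tidy VCF and a strata table
toy_tidy <- data.frame(
  MARKERS = c("m1", "m1", "m2", "m2"),
  INDIVIDUALS = c("ind1", "ind2", "ind1", "ind2"),
  GT = c("0/0", "0/1", "1/1", "0/1"),
  stringsAsFactors = FALSE
)
toy_strata <- data.frame(
  INDIVIDUALS = c("ind1", "ind2"),
  STRATA = c("north", "south"),
  stringsAsFactors = FALSE
)

with_pop <- add_pop_id(toy_tidy, toy_strata)

# Any name listed here would trigger a "Names must be unique" style error
duplicated_names <- names(with_pop)[duplicated(names(with_pop))]
print(with_pop)
```

Printing `names(tidyVCF)` on the real object before adding anything shows which population-like column, if any, the import already created.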

Session Info:

setting  value
version  R version 4.1.1 (2021-08-10)
os       macOS Big Sur 11.2.1
system   x86_64, darwin17.0
ui       RStudio
language (EN)
collate  en_US.UTF-8
ctype    en_US.UTF-8
tz       America/Chicago
date     2022-02-02
rstudio  1.4.1717 Juliet Rose (desktop)
pandoc   NA

─ Packages ─────────────────────────────────────────────────────────
package version date (UTC) lib source
ade4 1.7-18 2021-09-16 [1] CRAN (R 4.1.0)
adegenet 2.1.5 2021-10-09 [1] CRAN (R 4.1.0)
ape 5.6-1 2022-01-07 [1] CRAN (R 4.1.2)
assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.1.0)
BiocGenerics 0.40.0 2021-10-26 [1] Bioconductor
Biostrings 2.62.0 2021-10-26 [1] Bioconductor
bit 4.0.4 2020-08-04 [1] CRAN (R 4.1.0)
bit64 4.0.5 2020-08-30 [1] CRAN (R 4.1.0)
bitops 1.0-7 2021-04-24 [1] CRAN (R 4.1.0)
brio 1.1.3 2021-11-30 [1] CRAN (R 4.1.0)
cachem 1.0.6 2021-08-19 [1] CRAN (R 4.1.0)
callr 3.7.0 2021-04-20 [1] CRAN (R 4.1.0)
cli 3.1.1 2022-01-20 [1] CRAN (R 4.1.2)
cluster 2.1.2 2021-04-17 [1] CRAN (R 4.1.1)
coda 0.19-4 2020-09-30 [1] CRAN (R 4.1.0)
codetools 0.2-18 2020-11-04 [1] CRAN (R 4.1.1)
colorspace 2.0-2 2021-06-24 [1] CRAN (R 4.1.0)
combinat 0.0-8 2012-10-29 [1] CRAN (R 4.1.0)
crayon 1.4.2 2021-10-29 [1] CRAN (R 4.1.0)
data.table 1.14.2 2021-09-27 [1] CRAN (R 4.1.0)
DBI 1.1.2 2021-12-20 [1] CRAN (R 4.1.0)
desc 1.4.0 2021-09-28 [1] CRAN (R 4.1.0)
devtools 2.4.2 2021-06-07 [1] CRAN (R 4.1.0)
digest 0.6.29 2021-12-01 [1] CRAN (R 4.1.0)
dplyr 1.0.7 2021-06-18 [1] CRAN (R 4.1.0)
ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.1.0)
fansi 1.0.2 2022-01-14 [1] CRAN (R 4.1.2)
farver 2.1.0 2021-02-28 [1] CRAN (R 4.1.0)
fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.1.0)
foreach 1.5.1 2020-10-15 [1] CRAN (R 4.1.0)
fs 1.5.2 2021-12-08 [1] CRAN (R 4.1.0)
fst 0.9.4 2020-08-27 [1] CRAN (R 4.1.0)
gdata 2.18.0 2017-06-06 [1] CRAN (R 4.1.0)
gdsfmt 1.30.0 2021-10-26 [1] Bioconductor
generics 0.1.2 2022-01-31 [1] CRAN (R 4.1.2)
genetics 1.3.8.1.3 2021-03-01 [1] CRAN (R 4.1.0)
GenomeInfoDb 1.30.0 2021-10-26 [1] Bioconductor
GenomeInfoDbData 1.2.7 2021-11-02 [1] Bioconductor
GenomicRanges 1.46.0 2021-10-26 [1] Bioconductor
ggplot2 3.3.5 2021-06-25 [1] CRAN (R 4.1.0)
glue 1.6.1 2022-01-22 [1] CRAN (R 4.1.2)
gridExtra 2.3 2017-09-09 [1] CRAN (R 4.1.0)
gtable 0.3.0 2019-03-25 [1] CRAN (R 4.1.0)
gtools 3.9.2 2021-06-06 [1] CRAN (R 4.1.0)
hms 1.1.1 2021-09-26 [1] CRAN (R 4.1.0)
htmltools 0.5.2 2021-08-25 [1] CRAN (R 4.1.0)
httpuv 1.6.5 2022-01-05 [1] CRAN (R 4.1.2)
hzar 0.2-5 2013-09-23 [1] CRAN (R 4.1.0)
igraph 1.2.11 2022-01-04 [1] CRAN (R 4.1.2)
introgress 1.2.3 2012-10-29 [1] CRAN (R 4.1.0)
IRanges 2.28.0 2021-10-26 [1] Bioconductor
iterators 1.0.13 2020-10-15 [1] CRAN (R 4.1.0)
labeling 0.4.2 2020-10-20 [1] CRAN (R 4.1.0)
later 1.3.0 2021-08-18 [1] CRAN (R 4.1.0)
lattice 0.20-45 2021-09-22 [1] CRAN (R 4.1.0)
lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.1.0)
magrittr 2.0.2 2022-01-26 [1] CRAN (R 4.1.2)
MASS 7.3-54 2021-05-03 [1] CRAN (R 4.1.1)
Matrix 1.3-4 2021-06-01 [1] CRAN (R 4.1.1)
MatrixModels 0.5-0 2021-03-02 [1] CRAN (R 4.1.0)
mcmc 0.9-7 2020-03-21 [1] CRAN (R 4.1.0)
MCMCpack 1.6-0 2021-10-06 [1] CRAN (R 4.1.0)
memoise 2.0.0 2021-01-26 [1] CRAN (R 4.1.0)
memuse 4.2-1 2021-10-20 [1] CRAN (R 4.1.0)
mgcv 1.8-38 2021-10-06 [1] CRAN (R 4.1.0)
mime 0.12 2021-09-28 [1] CRAN (R 4.1.0)
munsell 0.5.0 2018-06-12 [1] CRAN (R 4.1.0)
mvtnorm 1.1-3 2021-10-08 [1] CRAN (R 4.1.0)
nlme 3.1-153 2021-09-07 [1] CRAN (R 4.1.0)
nnet 7.3-16 2021-05-03 [1] CRAN (R 4.1.1)
permute 0.9-7 2022-01-27 [1] CRAN (R 4.1.2)
pillar 1.7.0 2022-02-01 [1] CRAN (R 4.1.1)
pinfsc50 1.2.0 2020-06-03 [1] CRAN (R 4.1.0)
pkgbuild 1.2.0 2020-12-15 [1] CRAN (R 4.1.0)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.1.0)
pkgload 1.2.4 2021-11-30 [1] CRAN (R 4.1.0)
plyr 1.8.6 2020-03-03 [1] CRAN (R 4.1.0)
prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.1.0)
processx 3.5.2 2021-04-30 [1] CRAN (R 4.1.0)
promises 1.2.0.1 2021-02-11 [1] CRAN (R 4.1.0)
ps 1.6.0 2021-02-28 [1] CRAN (R 4.1.0)
purrr 0.3.4 2020-04-17 [1] CRAN (R 4.1.0)
quantreg 5.87 2022-01-26 [1] CRAN (R 4.1.2)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.1.0)
radiator 1.2.2 2022-02-03 [1] Github (thierrygosselin/radiator@6efdf14)
RColorBrewer 1.1-2 2014-12-07 [1] CRAN (R 4.1.0)
Rcpp 1.0.8 2022-01-13 [1] CRAN (R 4.1.2)
RCurl 1.98-1.5 2021-09-17 [1] CRAN (R 4.1.0)
readr 2.1.2 2022-01-30 [1] CRAN (R 4.1.2)
remotes 2.4.1 2021-09-29 [1] CRAN (R 4.1.0)
reshape2 1.4.4 2020-04-09 [1] CRAN (R 4.1.0)
rlang 1.0.0 2022-01-26 [1] CRAN (R 4.1.2)
rprojroot 2.0.2 2020-11-15 [1] CRAN (R 4.1.0)
rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.1.0)
S4Vectors 0.32.0 2021-10-26 [1] Bioconductor
scales 1.1.1 2020-05-11 [1] CRAN (R 4.1.0)
SeqArray 1.34.0 2021-10-26 [1] Bioconductor
seqinr 4.2-8 2021-06-09 [1] CRAN (R 4.1.0)
sessioninfo 1.2.0 2021-10-31 [1] CRAN (R 4.1.0)
shiny 1.7.1 2021-10-02 [1] CRAN (R 4.1.0)
SparseM 1.81 2021-02-18 [1] CRAN (R 4.1.0)
stringi 1.7.6 2021-11-29 [1] CRAN (R 4.1.0)
stringr 1.4.0 2019-02-10 [1] CRAN (R 4.1.0)
testthat 3.1.2 2022-01-20 [1] CRAN (R 4.1.2)
tibble 3.1.6 2021-11-07 [1] CRAN (R 4.1.0)
tidyr 1.1.4 2021-09-27 [1] CRAN (R 4.1.0)
tidyselect 1.1.1 2021-04-30 [1] CRAN (R 4.1.0)
tzdb 0.2.0 2021-10-27 [1] CRAN (R 4.1.0)
UpSetR 1.4.0 2019-05-22 [1] CRAN (R 4.1.0)
usethis 2.1.3 2021-10-27 [1] CRAN (R 4.1.0)
utf8 1.2.2 2021-07-24 [1] CRAN (R 4.1.0)
vcfR 1.12.0 2020-09-01 [1] CRAN (R 4.1.0)
vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.1.0)
vegan 2.5-7 2020-11-28 [1] CRAN (R 4.1.0)
viridisLite 0.4.0 2021-04-13 [1] CRAN (R 4.1.0)
vroom 1.5.7 2021-11-30 [1] CRAN (R 4.1.0)
withr 2.4.3 2021-11-30 [1] CRAN (R 4.1.0)
xtable 1.8-4 2019-04-21 [1] CRAN (R 4.1.0)
XVector 0.34.0 2021-10-26 [1] Bioconductor
zlibbioc 1.40.0 2021-10-26 [1] Bioconductor

[1] /Library/Frameworks/R.framework/Versions/4.1/Resources/library

thierrygosselin commented 1 year ago

Please re-open the issue, following the guidelines, if this is still relevant with the latest release. Otherwise, you can use the Stacks populations module to get an HZAR file from a VCF.
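
For the Stacks route, the populations module assigns samples to populations through a population map: a tab-separated text file with one sample name and its population per line, and no header. Below is a minimal sketch of writing such a file from R. The sample and population names are toy values, the output path is a temporary file chosen for illustration, and the Stacks manual remains the reference for the command-line options covering VCF input and HZAR output.

```r
# Toy sample-to-population assignments; replace with the real sample names,
# which must match the sample names in the VCF header
pop_map <- data.frame(
  sample = c("ind1", "ind2", "ind3"),
  population = c("north", "north", "south"),
  stringsAsFactors = FALSE
)

# Write a tab-separated file with no header and no quoting
popmap_path <- file.path(tempdir(), "popmap.tsv")
write.table(pop_map, popmap_path, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Read the file back to confirm its layout
written <- readLines(popmap_path)
print(written)
```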