thierrygosselin / radiator

RADseq Data Exploration, Manipulation and Visualization using R
GNU General Public License v3.0

ERROR with detect_duplicate_genome + tidy format obtained from genomic_converter #179

Closed GabryS3 closed 6 months ago

GabryS3 commented 1 year ago

Hi Thierry, I have 2 issues:

  1. I want to understand how the tidy format is obtained (to understand if I get a correct conversion into tidy of my genlight object)
  2. I get an error when using detect_duplicate_genomes() function --> Error in value_vars(value.var, names(data)) : value.var values [n] are not found in 'data'.

1). Question 1 - tidy format: I just want to understand if the function "genomic_converter" is working properly now that I installed the most recent version of Radiator (1.2.8).

I used "genomic_converter" to convert my "genlight" dataset into a "tidy" dataset. Code:

    My_dataset_TIDY = genomic_converter(
      My_dataset, # class = genlight object
      strata = NULL,
      output = "tidy",
      filename = "My_dataset_TIDY",
      parallel.core = parallel::detectCores() - 1,
      verbose = TRUE)

However, I cannot understand if the conversion was correct or whether there are issues.

First of all, did I need to include a "STRATA" file? I did not include any. My genlight object was obtained through the package dartR.

Second, my concern is that the genotypes are not properly coded. For example:

TIDY format: individual 1, chrom1_locus 1002_42_A_T_42 --> REF = T, ALT = A, GT_BIN = 2
GENLIGHT format: individual 1, chrom1_locus 1002_42_A_T_42 --> genotype = 0 (= homozygous for REF allele)

TIDY format: individual 2, chrom1_locus 1002_42_A_T_42 --> REF = T, ALT = A, GT_BIN = 1
GENLIGHT format: individual 2, chrom1_locus 1002_42_A_T_42 --> genotype = 1 (heterozygous)

TIDY format: individual 3, chrom1_locus 1002_42_A_T_42 --> REF = T, ALT = A, GT_BIN = 0
GENLIGHT format: individual 3, chrom1_locus 1002_42_A_T_42 --> genotype = 2 (= homozygous for ALT allele = SNP)

What is the GT_BIN & How is it coded? Is this genotype coding transformation reported above correct? From my understanding, it seems that GT_BIN codes the genotype in an opposite way compared to the genlight object, correct?

2). Question 2 - error in "detect_duplicate_genomes()": After converting my genlight object into tidy format with the code above (with genomic_converter()), I then tried to use "detect_duplicate_genomes" on my tidy dataset. However, I get the following error. Code:

    My_dataset_duplicate_genomes = detect_duplicate_genomes(
      data = "My_dataset_TIDY.rad",
      interactive.filter = TRUE,
      detect.duplicate.genomes = TRUE,
      dup.threshold = 0,
      distance.method = "manhattan",
      genome = FALSE,
      threshold.common.markers = NULL,
      blacklist.duplicates = FALSE,
      parallel.core = parallel::detectCores() - 1,
      verbose = TRUE)

    ################################################################################
    ###################### radiator::detect_duplicate_genomes ######################
    ################################################################################
    Execution date@time: 20230503@1746
    Folder created: -604_detect_duplicate_genomes_20230503@1746
    Function call and arguments stored in a file
    File written: radiator_detect_duplicate_genomes_args_20230503@1746.tsv
    File written: random.seed (247023)
    Filters parameters file generated: filters_parameters_20230503@1746.tsv
    Preparing data for analysis
    Calculating manhattan distances between individuals...
    Error in value_vars(value.var, names(data)) :
      value.var values [n] are not found in 'data'.
    In addition: There were 50 or more warnings (use warnings() to see the first 50)

    Computation time, overall: 36 sec
    ###################### completed detect_duplicate_genomes ######################

What does this error "Error in value_vars(value.var, names(data)) : value.var values [n] are not found in 'data'." mean?

I would really appreciate your help, as I have been trying to use this function for a while now and keep running into issues along the way. Thanks a lot!
Best,
Gabriella

devtools session info: devtools::session_info()

─ Session info ─────────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.2.1 (2022-06-23 ucrt)
 os       Windows 10 x64 (build 19044)
 system   x86_64, mingw32
 ui       RStudio
 language (EN)
 collate  English_Australia.utf8
 ctype    English_Australia.utf8
 tz       Australia/Brisbane
 date     2023-05-03
 rstudio  2022.07.0+548 Spotted Wakerobin (desktop)
 pandoc   NA

─ Packages ─────────────────────────────────────────────────────────────────────
 !  package      version  date (UTC) lib source
    ade4         1.7-19   2022-04-19 [1] CRAN (R 4.2.1)
    adegenet     2.1.7    2022-06-06 [1] CRAN (R 4.2.1)
    amap         0.8-19   2022-10-28 [1] CRAN (R 4.2.1)
    ape          5.6-2    2022-03-02 [1] CRAN (R 4.2.1)
    assertthat   0.2.1    2019-03-21 [1] CRAN (R 4.2.1)
    backports    1.4.1    2021-12-13 [1] CRAN (R 4.2.0)
    BiocGenerics 0.42.0   2022-04-26 [1] Bioconductor
    BiocManager  1.30.18  2022-05-18 [1] CRAN (R 4.2.1)
    bit          4.0.4    2020-08-04 [1] CRAN (R 4.2.1)
    bit64        4.0.5    2020-08-30 [1] CRAN (R 4.2.1)
    broom        1.0.0    2022-07-01 [1] CRAN (R 4.2.1)
    cachem       1.0.6    2021-08-19 [1] CRAN (R 4.2.1)
    calibrate    1.7.7    2020-06-19 [1] CRAN (R 4.2.1)
    callr        3.7.1    2022-07-13 [1] CRAN (R 4.2.1)
    cellranger   1.1.0    2016-07-27 [1] CRAN (R 4.2.1)
    cli          3.4.1    2022-09-23 [1] CRAN (R 4.2.2)
    cluster      2.1.3    2022-03-28 [2] CRAN (R 4.2.1)
    codetools    0.2-18   2020-11-04 [2] CRAN (R 4.2.1)
    colorspace   2.0-3    2022-02-21 [1] CRAN (R 4.2.1)
    combinat     0.0-8    2012-10-29 [1] CRAN (R 4.2.0)
    crayon       1.5.1    2022-03-26 [1] CRAN (R 4.2.1)
 VP dartR        2.9.4    2022-06-05 [?] CRAN (R 4.2.1) (on disk 2.0.4)
                 1.0.2    2022-11-16 [1] CRAN (R 4.2.2)
    data.table   1.14.2   2021-09-27 [1] CRAN (R 4.2.1)
    DBI          1.1.3    2022-06-18 [1] CRAN (R 4.2.1)
    dbplyr       2.2.1    2022-06-27 [1] CRAN (R 4.2.1)
    devtools     2.4.3    2021-11-30 [1] CRAN (R 4.2.1)
    digest       0.6.29   2021-12-01 [1] CRAN (R 4.2.1)
    dismo        1.3-5    2021-10-11 [1] CRAN (R 4.2.1)
    doParallel   1.0.17   2022-02-07 [1] CRAN (R 4.2.1)
    dotCall64    1.0-1    2021-02-11 [1] CRAN (R 4.2.1)
    dplyr        1.0.9    2022-04-28 [1] CRAN (R 4.2.1)
    ellipsis     0.3.2    2021-04-29 [1] CRAN (R 4.2.1)
    fansi        1.0.3    2022-03-24 [1] CRAN (R 4.2.1)
    fastmap      1.1.0    2021-01-25 [1] CRAN (R 4.2.1)
    fields       14.0     2022-07-05 [1] CRAN (R 4.2.1)
    forcats      0.5.1    2021-01-27 [1] CRAN (R 4.2.1)
    foreach      1.5.2    2022-02-02 [1] CRAN (R 4.2.1)
    fs           1.5.2    2021-12-08 [1] CRAN (R 4.2.1)
    fst          0.9.8    2022-02-08 [1] CRAN (R 4.2.1)
    fstcore      0.9.12   2022-03-23 [1] CRAN (R 4.2.1)
    gap          1.2.3-6  2022-05-13 [1] CRAN (R 4.2.1)
    gap.datasets 0.0.5    2022-05-09 [1] CRAN (R 4.2.0)
    gdata                 2022-05-10 [1] CRAN (R 4.2.1)
    gdistance    1.3-6    2020-06-29 [1] CRAN (R 4.2.1)
    gdsfmt       1.32.0   2022-04-26 [1] Bioconductor
    generics     0.1.3    2022-07-05 [1] CRAN (R 4.2.1)
    genetics              2021-03-01 [1] CRAN (R 4.2.1)
    GGally       2.1.2    2021-06-21 [1] CRAN (R 4.2.1)
    ggplot2      3.4.0    2022-11-04 [1] CRAN (R 4.2.2)
    glue         1.6.2    2022-02-24 [1] CRAN (R 4.2.1)
    gridExtra    2.3      2017-09-09 [1] CRAN (R 4.2.1)
    gtable       0.3.0    2019-03-25 [1] CRAN (R 4.2.1)
    gtools       3.9.3    2022-07-11 [1] CRAN (R 4.2.1)
    haven        2.5.0    2022-04-15 [1] CRAN (R 4.2.1)
    hms          1.1.1    2021-09-26 [1] CRAN (R 4.2.1)
    htmltools    0.5.2    2021-08-25 [1] CRAN (R 4.2.1)
    httpuv       1.6.5    2022-01-05 [1] CRAN (R 4.2.1)
    httr         1.4.3    2022-05-04 [1] CRAN (R 4.2.1)
    igraph       1.3.2    2022-06-13 [1] CRAN (R 4.2.1)
    iterators    1.0.14   2022-02-05 [1] CRAN (R 4.2.1)
    jsonlite     1.8.0    2022-02-22 [1] CRAN (R 4.2.1)
    knitr        1.39     2022-04-26 [1] CRAN (R 4.2.1)
    later        1.3.0    2021-08-18 [1] CRAN (R 4.2.1)
    lattice      0.20-45  2021-09-22 [2] CRAN (R 4.2.1)
    LEA          3.8.0    2022-04-26 [1] Bioconductor
    lifecycle    1.0.3    2022-10-07 [1] CRAN (R 4.2.2)
    lubridate    1.8.0    2021-10-07 [1] CRAN (R 4.2.1)
    magrittr     2.0.3    2022-03-30 [1] CRAN (R 4.2.1)
    maps         3.4.0    2021-09-25 [1] CRAN (R 4.2.1)
    MASS         7.3-57   2022-04-22 [2] CRAN (R 4.2.1)
    Matrix       1.4-1    2022-03-23 [2] CRAN (R 4.2.1)
    memoise      2.0.1    2021-11-26 [1] CRAN (R 4.2.1)
    mgcv         1.8-40   2022-03-29 [2] CRAN (R 4.2.1)
    mime         0.12     2021-09-28 [1] CRAN (R 4.2.0)
    mmod         1.3.3    2017-04-06 [1] CRAN (R 4.2.1)
    modelr       0.1.8    2020-05-19 [1] CRAN (R 4.2.1)
    munsell      0.5.0    2018-06-12 [1] CRAN (R 4.2.1)
    mvtnorm      1.1-3    2021-10-08 [1] CRAN (R 4.2.0)
    naniar       1.0.0    2023-02-02 [1] CRAN (R 4.2.3)
    nlme         3.1-157  2022-03-25 [2] CRAN (R 4.2.1)
    OutFLANK     0.2      2022-07-18 [1] Github (whitlock/OutFLANK@e502e82)
    patchwork    1.1.1    2020-12-17 [1] CRAN (R 4.2.1)
    pegas        1.1      2021-12-16 [1] CRAN (R 4.2.1)
    permute      0.9-7    2022-01-27 [1] CRAN (R 4.2.1)
    pillar       1.7.0    2022-02-01 [1] CRAN (R 4.2.1)
    pinfsc50     1.2.0    2020-06-03 [1] CRAN (R 4.2.0)
    pkgbuild     1.3.1    2021-12-20 [1] CRAN (R 4.2.1)
    pkgconfig    2.0.3    2019-09-22 [1] CRAN (R 4.2.1)
    pkgload      1.3.0    2022-06-27 [1] CRAN (R 4.2.1)
    plotrix      3.8-2    2021-09-08 [1] CRAN (R 4.2.0)
    plyr         1.8.7    2022-03-24 [1] CRAN (R 4.2.1)
    png          0.1-7    2013-12-03 [1] CRAN (R 4.2.0)
    PopGenReport 3.0.7    2022-05-27 [1] CRAN (R 4.2.1)
    prettyunits  1.1.1    2020-01-24 [1] CRAN (R 4.2.1)
    processx     3.7.0    2022-07-07 [1] CRAN (R 4.2.1)
    promises              2021-02-11 [1] CRAN (R 4.2.1)
    ps           1.7.1    2022-06-18 [1] CRAN (R 4.2.1)
    purrr        0.3.4    2020-04-17 [1] CRAN (R 4.2.1)
    qvalue       2.28.0   2022-04-26 [1] Bioconductor
    R.methodsS3  1.8.2    2022-06-13 [1] CRAN (R 4.2.0)
    R.oo         1.25.0   2022-06-12 [1] CRAN (R 4.2.0)
    R.utils      2.12.0   2022-06-28 [1] CRAN (R 4.2.1)
    R6           2.5.1    2021-08-19 [1] CRAN (R 4.2.1)
 VP radiator     1.2.8    2022-07-16 [?] Github (thierrygosselin/radiator@6efdf14) (on disk 1.2.2)
    raster       3.5-21   2022-06-27 [1] CRAN (R 4.2.1)
    RColorBrewer 1.1-3    2022-04-03 [1] CRAN (R 4.2.0)
    Rcpp         1.0.9    2022-07-08 [1] CRAN (R 4.2.1)
    readr        2.1.2    2022-01-30 [1] CRAN (R 4.2.1)
    readxl       1.4.0    2022-03-28 [1] CRAN (R 4.2.1)
    remotes      2.4.2    2021-11-30 [1] CRAN (R 4.2.1)
    reprex       2.0.1    2021-08-05 [1] CRAN (R 4.2.1)
    reshape      0.8.9    2022-04-12 [1] CRAN (R 4.2.1)
    reshape2     1.4.4    2020-04-09 [1] CRAN (R 4.2.1)
    rgdal        1.5-32   2022-05-09 [1] CRAN (R 4.2.1)
    RgoogleMaps           2020-02-12 [1] CRAN (R 4.2.1)
    rlang        1.0.6    2022-09-24 [1] CRAN (R 4.2.2)
    rstudioapi   0.13     2020-11-12 [1] CRAN (R 4.2.1)
    rvest        1.0.2    2021-10-16 [1] CRAN (R 4.2.1)
    scales       1.2.0    2022-04-13 [1] CRAN (R 4.2.1)
    seqinr       4.2-16   2022-05-19 [1] CRAN (R 4.2.1)
    sessioninfo  1.2.2    2021-12-06 [1] CRAN (R 4.2.1)
    shiny        1.7.1    2021-10-02 [1] CRAN (R 4.2.1)
    SNPRelate    1.30.1   2022-05-15 [1] Bioconductor
    snpStats     1.46.0   2022-04-26 [1] Bioconductor
    sp           1.5-0    2022-06-05 [1] CRAN (R 4.2.1)
    spam         2.9-0    2022-07-11 [1] CRAN (R 4.2.1)
    spida2       0.2.1    2023-04-26 [1] Github (gmonette/spida2@48e562d)
    StAMPP       1.6.3    2021-08-08 [1] CRAN (R 4.2.1)
    stockR       1.0.74   2020-03-04 [1] CRAN (R 4.2.1)
    stringi      1.7.8    2022-07-11 [1] CRAN (R 4.2.1)
    stringr      1.4.1    2022-08-20 [1] CRAN (R 4.2.1)
    survival     3.3-1    2022-03-03 [2] CRAN (R 4.2.1)
    terra        1.5-34   2022-06-09 [1] CRAN (R 4.2.1)
    tibble       3.1.7    2022-05-03 [1] CRAN (R 4.2.1)
    tidyr        1.2.0    2022-02-01 [1] CRAN (R 4.2.1)
    tidyselect   1.1.2    2022-02-21 [1] CRAN (R 4.2.1)
    tidyverse    1.3.1    2021-04-15 [1] CRAN (R 4.2.1)
    tzdb         0.3.0    2022-03-28 [1] CRAN (R 4.2.1)
    usethis      2.1.6    2022-05-25 [1] CRAN (R 4.2.1)
    utf8         1.2.2    2021-07-24 [1] CRAN (R 4.2.1)
    vcfR         1.12.0   2020-09-01 [1] CRAN (R 4.2.1)
    vctrs        0.5.1    2022-11-16 [1] CRAN (R 4.2.2)
    vegan        2.6-2    2022-04-17 [1] CRAN (R 4.2.1)
    versions     0.3      2016-09-01 [1] CRAN (R 4.2.0)
    viridis      0.6.2    2021-10-13 [1] CRAN (R 4.2.1)
    viridisLite  0.4.0    2021-04-13 [1] CRAN (R 4.2.1)
    visdat       0.6.0    2023-02-02 [1] CRAN (R 4.2.3)
    vroom        1.5.7    2021-11-30 [1] CRAN (R 4.2.1)
    withr        2.5.0    2022-03-03 [1] CRAN (R 4.2.1)
    xfun         0.31     2022-05-10 [1] CRAN (R 4.2.1)
    xml2         1.3.3    2021-11-30 [1] CRAN (R 4.2.1)
    xtable       1.8-4    2019-04-21 [1] CRAN (R 4.2.1)
    zlibbioc     1.42.0   2022-04-26 [1] Bioconductor

 [1] C:/Users/scatag/AppData/Local/R/win-library/4.2
 [2] C:/Program Files/R/R-4.2.1/library

 V ── Loaded and on-disk version mismatch.
 P ── Loaded and on-disk path mismatch.


thierrygosselin commented 6 months ago

@GabryS3 Sorry for the long delay

> First of all, did I need to include a "STRATA" file? I did not include any. My genlight object was obtained through the package dartR.

I haven't used dartR in a long time and I'm no longer sure whether it stores the population map or stratification (STRATA) for individuals. The strata file is not required because certain formats store that information. If it is not found, the assumption is that you have just one population.

thierrygosselin commented 6 months ago

@GabryS3 GT_BIN format

> What is the GT_BIN & How is it coded? Is this genotype coding transformation reported above correct? From my understanding, it seems that GT_BIN codes the genotype in an opposite way compared to the genlight object, correct?

It will probably be renamed very soon. For bi-allelic datasets, it's the dosage of the alternate allele (the number of copies of the alternate allele).

I cannot be certain of your problem without a thorough look at your file.
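To make the dosage interpretation concrete, here is a minimal base R sketch using the three values reported in the question (it is not radiator code, and the vector names are made up for illustration). It assumes that the genlight value counts copies of T while GT_BIN counts copies of A, the allele the tidy output labels ALT; that orientation is inferred from the reported values and is not confirmed anywhere in this thread.

```r
# Values reported in the question for locus 1002_42_A_T_42 (three individuals).
# Assumption: the genlight value counts copies of T, while GT_BIN counts
# copies of A (the allele that the tidy output labels ALT).
genlight_count <- c(ind1 = 0, ind2 = 1, ind3 = 2)
gt_bin         <- c(ind1 = 2, ind2 = 1, ind3 = 0)

# Same orientation: both columns count the same allele.
same_orientation <- all(gt_bin == genlight_count)

# Swapped orientation: each column counts the other allele
# (diploid, bi-allelic, so the two dosages sum to 2).
swapped_orientation <- all(gt_bin == 2 - genlight_count)

print(c(same = same_orientation, swapped = swapped_orientation))
```

For these three individuals the swapped check holds, so under the stated assumption the two codings would describe the same genotypes with the REF and ALT labels exchanged. Whether that is what happened in the actual dataset can only be checked against the files.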

thierrygosselin commented 6 months ago

> Question 2 - error in "detect_duplicate_genomes()"

I'm unable to reproduce or help you with your problem without the files.

Re-open the issue if it's still relevant to you.
Best
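On the wording of the error itself: the text matches the check that data.table's dcast() runs on its value.var argument. One plausible reading, which is an assumption since the thread does not establish where radiator triggers it, is that an intermediate table had no column named n when it was reshaped from long to wide. A minimal sketch with a made-up table (the names long, wide and count are hypothetical) shows that kind of failure:

```r
library(data.table)

# Hypothetical long-format table; there is deliberately no column named "n".
long <- data.table(
  id     = c("a", "a", "b", "b"),
  marker = c("m1", "m2", "m1", "m2"),
  count  = c(1L, 2L, 3L, 4L)
)

# Asking dcast() to fill the wide table from a missing column raises an error,
# which is captured here instead of stopping the script.
res <- tryCatch(
  dcast(long, id ~ marker, value.var = "n"),
  error = function(e) e
)
print(conditionMessage(res))

# With a column that exists, the same reshape succeeds.
wide <- dcast(long, id ~ marker, value.var = "count")
print(wide)
```

This only illustrates what the message means in general; it does not identify the cause in the reported run.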