thibautjombart / adegenet

adegenet: a R package for the multivariate analysis of genetic markers

genind2df() doesn't work for 'rupica' when 'oneColPerAll = TRUE' #320

Open dfriend21 opened 2 years ago

dfriend21 commented 2 years ago

Running this code produces an error:

library(adegenet)
data(rupica)
genind2df(rupica, oneColPerAll = TRUE)
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  : 
  arguments imply differing number of rows: 333, 335, 332, 334
In addition: Warning messages:
1: In matrix(unlist(e), ncol = x@ploidy[1], byrow = TRUE) :
  data length [665] is not a sub-multiple or multiple of the number of rows [333]
2: In matrix(unlist(e), ncol = x@ploidy[1], byrow = TRUE) :
  data length [669] is not a sub-multiple or multiple of the number of rows [335]
3: In matrix(unlist(e), ncol = x@ploidy[1], byrow = TRUE) :
  data length [667] is not a sub-multiple or multiple of the number of rows [334]
4: In matrix(unlist(e), ncol = x@ploidy[1], byrow = TRUE) :
  data length [665] is not a sub-multiple or multiple of the number of rows [333]
session info

```
─ Session info ────────────────────────────────────
 setting  value
 version  R version 4.1.2 (2021-11-01)
 os       macOS Big Sur 11.2
 system   x86_64, darwin17.0
 ui       RStudio
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/Los_Angeles
 date     2022-01-22
 rstudio  2021.09.2+382 Ghost Orchid (desktop)
 pandoc   NA

─ Packages ────────────────────────────────────────
 package     * version date (UTC) lib source
 ade4        * 1.7-18  2021-09-16 [1] CRAN (R 4.1.0)
 adegenet    * 2.1.5   2021-10-09 [1] CRAN (R 4.1.0)
 ape           5.6-1   2022-01-07 [1] CRAN (R 4.1.2)
 bench         1.1.2   2021-11-30 [1] CRAN (R 4.1.0)
 cli           3.1.1   2022-01-20 [1] CRAN (R 4.1.2)
 cluster       2.1.2   2021-04-17 [1] CRAN (R 4.1.2)
 colorspace    2.0-2   2021-06-24 [1] CRAN (R 4.1.0)
 crayon        1.4.2   2021-10-29 [1] CRAN (R 4.1.0)
 digest        0.6.29  2021-12-01 [1] CRAN (R 4.1.0)
 dplyr         1.0.7   2021-06-18 [1] CRAN (R 4.1.0)
 ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.1.0)
 fansi         1.0.2   2022-01-14 [1] CRAN (R 4.1.2)
 fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.1.0)
 generics      0.1.1   2021-10-25 [1] CRAN (R 4.1.0)
 ggplot2       3.3.5   2021-06-25 [1] CRAN (R 4.1.0)
 glue          1.6.0   2021-12-17 [1] CRAN (R 4.1.0)
 gtable        0.3.0   2019-03-25 [1] CRAN (R 4.1.0)
 hierfstat     0.5-10  2021-11-17 [1] CRAN (R 4.1.0)
 htmltools     0.5.2   2021-08-25 [1] CRAN (R 4.1.0)
 httpuv        1.6.5   2022-01-05 [1] CRAN (R 4.1.2)
 igraph        1.2.11  2022-01-04 [1] CRAN (R 4.1.2)
 later         1.3.0   2021-08-18 [1] CRAN (R 4.1.0)
 lattice       0.20-45 2021-09-22 [1] CRAN (R 4.1.2)
 lifecycle     1.0.1   2021-09-24 [1] CRAN (R 4.1.0)
 magrittr      2.0.1   2020-11-17 [1] CRAN (R 4.1.0)
 MASS          7.3-54  2021-05-03 [1] CRAN (R 4.1.2)
 Matrix        1.3-4   2021-06-01 [1] CRAN (R 4.1.2)
 mgcv          1.8-38  2021-10-06 [1] CRAN (R 4.1.2)
 mime          0.12    2021-09-28 [1] CRAN (R 4.1.0)
 munsell       0.5.0   2018-06-12 [1] CRAN (R 4.1.0)
 nlme          3.1-153 2021-09-07 [1] CRAN (R 4.1.2)
 pegas         1.1     2021-12-16 [1] CRAN (R 4.1.0)
 permute       0.9-5   2019-03-12 [1] CRAN (R 4.1.0)
 pillar        1.6.4   2021-10-18 [1] CRAN (R 4.1.0)
 pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.1.0)
 plyr          1.8.6   2020-03-03 [1] CRAN (R 4.1.0)
 promises      1.2.0.1 2021-02-11 [1] CRAN (R 4.1.0)
 purrr         0.3.4   2020-04-17 [1] CRAN (R 4.1.0)
 R6            2.5.1   2021-08-19 [1] CRAN (R 4.1.0)
 Rcpp          1.0.8   2022-01-13 [1] CRAN (R 4.1.2)
 reshape2      1.4.4   2020-04-09 [1] CRAN (R 4.1.0)
 rlang         0.4.12  2021-10-18 [1] CRAN (R 4.1.0)
 scales        1.1.1   2020-05-11 [1] CRAN (R 4.1.0)
 seqinr        4.2-8   2021-06-09 [1] CRAN (R 4.1.0)
 sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.1.0)
 shiny         1.7.1   2021-10-02 [1] CRAN (R 4.1.0)
 stringi       1.7.6   2021-11-29 [1] CRAN (R 4.1.0)
 stringr       1.4.0   2019-02-10 [1] CRAN (R 4.1.0)
 tibble        3.1.6   2021-11-07 [1] CRAN (R 4.1.0)
 tidyselect    1.1.1   2021-04-30 [1] CRAN (R 4.1.0)
 utf8          1.2.2   2021-07-24 [1] CRAN (R 4.1.0)
 vctrs         0.3.8   2021-04-29 [1] CRAN (R 4.1.0)
 vegan         2.5-7   2020-11-28 [1] CRAN (R 4.1.0)
 xtable        1.8-4   2019-04-21 [1] CRAN (R 4.1.0)

 [1] /Library/Frameworks/R.framework/Versions/4.1/Resources/library
───────────────────────────────────────────────────
```

I noticed that there are two closed issues that might be relevant: #18 and #192.

ntakebay commented 2 years ago

I encountered a similar problem when one allele of a diploid genotype is unknown. For example, with 3 individuals whose genotypes at a locus are A/A, T, and A/T, the conversion fails. Note that for the second individual only 1 allele was recovered, and the other allele isn't known (e.g. failed sequencing). In the code below, I added a section which deals with this problem.
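To make the failure mode concrete, here is a minimal base-R sketch using the three toy genotypes above. It is not adegenet's internal code; it only mimics the `matrix(unlist(e), ncol = ploidy, byrow = TRUE)` pattern that appears in the warning messages, and the `/` separator is an assumption for illustration.

```r
# Toy genotypes for three diploid individuals at one locus;
# the second individual has only one recovered allele.
geno <- c("A/A", "T", "A/T")

# Split each genotype into alleles and flatten: 5 alleles instead of 6.
alleles <- unlist(strsplit(geno, "/", fixed = TRUE))

# Filling a 2-column matrix from an odd-length vector makes R warn that
# the data length is not a multiple of the number of rows, and it
# recycles values, so alleles shift onto the wrong individuals.
bad <- suppressWarnings(matrix(alleles, ncol = 2, byrow = TRUE))
print(bad)
```

When loci differ in how many alleles are missing, each locus yields a matrix with a different number of rows, which would explain the "differing number of rows" error raised when the columns are combined into a data frame.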

I'm attaching a text file that corrects this problem. The fix is based on version 2.1.5 of adegenet.

The additional argument, rm.incompleteGeno, controls what happens when a genotype has missing alleles. By default (rm.incompleteGeno=F), the genotype of the second individual in the example above becomes T/NA. If you set the option to true, incomplete genotypes are removed and replaced with NA/NA (for diploids).
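The two behaviours described above can be sketched in base R as follows. `pad_genotypes()` is a hypothetical helper written for illustration only; it is not the code in the attached file, and it assumes `/`-separated genotype strings with no genotype longer than `ploidy`.

```r
# Hypothetical helper (not the attached fix): pad or blank out
# genotypes that have fewer alleles than the ploidy.
pad_genotypes <- function(geno, ploidy = 2, sep = "/",
                          rm.incompleteGeno = FALSE) {
  alleles <- strsplit(geno, sep, fixed = TRUE)
  padded <- lapply(alleles, function(a) {
    if (length(a) < ploidy) {
      if (rm.incompleteGeno) {
        # Discard the partial genotype entirely.
        a <- rep(NA_character_, ploidy)
      } else {
        # Keep the known alleles and fill the rest with NA.
        a <- c(a, rep(NA_character_, ploidy - length(a)))
      }
    }
    a
  })
  # Every genotype now has exactly `ploidy` entries, so the matrix
  # has one row per individual and no recycling occurs.
  matrix(unlist(padded), ncol = ploidy, byrow = TRUE)
}

geno <- c("A/A", "T", "A/T")
m_keep <- pad_genotypes(geno)                            # row 2: T, NA
m_drop <- pad_genotypes(geno, rm.incompleteGeno = TRUE)  # row 2: NA, NA
```

Padding before building the matrix keeps one row per individual at every locus, so the per-locus results can be combined column-wise without a row-count mismatch.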

I also noticed that the character string "NA" is used instead of a real NA when oneColPerAll=T. So I added another fix to convert these strings to real NA in the code below.
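The difference between the string "NA" and a real NA matters for downstream checks such as `is.na()`. A minimal sketch of the conversion on a toy data frame follows; the column names are made up for illustration and this is not the code from the attached file.

```r
# Toy one-column-per-allele output in which missing data were
# written as the literal string "NA" (column names are hypothetical).
df <- data.frame(L1.1 = c("A", "NA", "A"),
                 L1.2 = c("A", "NA", "T"),
                 stringsAsFactors = FALSE)

# The string "NA" is not recognised as missing.
n_missing_before <- sum(is.na(df))

# Replace every "NA" string with a real missing value.
df[df == "NA"] <- NA

n_missing_after <- sum(is.na(df))
```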

With this modification, the sample code by @dfriend21 gives the expected result without the error.

genind2df.txt