grunwaldlab / poppr

🌶 An R package for genetic analysis of populations with mixed (clonal/sexual) reproduction
https://grunwaldlab.github.io/poppr

genind2genalex() produces all zero genotypes with some SNP data. #231

Closed: zkamvar closed this issue 3 years ago

zkamvar commented 3 years ago

From https://groups.google.com/g/poppr/c/FfDlDWArQsA/m/UWp8h6ISDQAJ

genind2genalex() writes all zeroes to its output when it should write SNP genotypes. The reprex below shows that the error lies in genind2genalex() and not in df2genind(): genind2df() returns the expected genotypes from the same object.

  suppressPackageStartupMessages(library("poppr"))
  tmp <- tempfile(fileext = ".csv")
  x <- new("genind",
    tab = structure(
      c(NA, 2L, 2L, 2L, 2L,  # loc87_pos30.A
        NA, 0L, 0L, 0L, 0L,  # loc87_pos30.G
        NA, 2L, 2L, 2L, 2L,  # loc106_pos31.G
        NA, 0L, 0L, 0L, 0L,  # loc106_pos31.T
        1L, 1L, 2L, 2L, 1L,  # loc345_pos27.G
        1L, 1L, 0L, 0L, 1L), # loc345_pos27.T
      .Dim = 5:6,
      .Dimnames = list(
        c("TT056001.trim", "TT060001.trim", "TT062001.trim", "TT063001.trim", "TT064001.trim"),
        c("loc87_pos30.A", "loc87_pos30.G", "loc106_pos31.G", "loc106_pos31.T", "loc345_pos27.G", "loc345_pos27.T"))),
    loc.fac = structure(c(1L, 1L, 2L, 2L, 3L, 3L),
      .Label = c("loc87_pos30", "loc106_pos31", "loc345_pos27"), class = "factor"),
    loc.n.all = c(loc87_pos30 = 2L, loc106_pos31 = 2L, loc345_pos27 = 2L),
    all.names = list(loc87_pos30 = c("A", "G"), loc106_pos31 = c("G", "T"),
      loc345_pos27 = c("G", "T")),
    ploidy = c(2L, 2L, 2L, 2L, 2L), type = "codom", other = list(),
    call = .local(x = x, i = i, j = j, loc = ..1, drop = drop),
    pop = NULL, strata = NULL, hierarchy = NULL)
  genind2df(x) # ok
#>               loc87_pos30 loc106_pos31 loc345_pos27
#> TT056001.trim        <NA>         <NA>           GT
#> TT060001.trim          AA           GG           GT
#> TT062001.trim          AA           GG           GG
#> TT063001.trim          AA           GG           GG
#> TT064001.trim          AA           GG           GT
  genind2genalex(x, tmp)
#> Extracting the table ...
#> Warning in FUN(X[[i]], ...): NAs introduced by coercion

#> Warning in FUN(X[[i]], ...): NAs introduced by coercion

#> Warning in FUN(X[[i]], ...): NAs introduced by coercion

#> Warning in FUN(X[[i]], ...): NAs introduced by coercion

#> Warning in FUN(X[[i]], ...): NAs introduced by coercion

#> Warning in FUN(X[[i]], ...): NAs introduced by coercion

#> Warning in FUN(X[[i]], ...): NAs introduced by coercion

#> Warning in FUN(X[[i]], ...): NAs introduced by coercion

#> Warning in FUN(X[[i]], ...): NAs introduced by coercion

#> Warning in FUN(X[[i]], ...): NAs introduced by coercion

#> Warning in FUN(X[[i]], ...): NAs introduced by coercion

#> Warning in FUN(X[[i]], ...): NAs introduced by coercion

#> Warning in FUN(X[[i]], ...): NAs introduced by coercion
#> Writing the table to /tmp/RtmpnNa39T/file106e133aa9690.csv ... Done.
#> [1] "/tmp/RtmpnNa39T/file106e133aa9690.csv"
  readLines(tmp)
#> [1] "3,5,1,5,,,,"                                                  
#> [2] ",,,Pop,,,,"                                                   
#> [3] "Ind,Pop,loc87_pos30, ,loc106_pos31, ,loc345_pos27, "          
#> [4] "\"TT056001.trim\",\"Pop\",\"0\",\"0\",\"0\",\"0\",\"0\",\"0\""
#> [5] "\"TT060001.trim\",\"Pop\",\"0\",\"0\",\"0\",\"0\",\"0\",\"0\""
#> [6] "\"TT062001.trim\",\"Pop\",\"0\",\"0\",\"0\",\"0\",\"0\",\"0\""
#> [7] "\"TT063001.trim\",\"Pop\",\"0\",\"0\",\"0\",\"0\",\"0\",\"0\""
#> [8] "\"TT064001.trim\",\"Pop\",\"0\",\"0\",\"0\",\"0\",\"0\",\"0\""

Created on 2021-01-25 by the reprex package (v0.3.0)

zkamvar commented 3 years ago

I have found the problem. poppr:::fill_zero() assumes that the incoming data is numeric. This procedure was bypassed in the fix for #108 by assuming that all SNP data was haploid.
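The warning in the reprex can be reproduced in isolation. This is a minimal base-R sketch, not poppr's internal code: it only shows what happens when nucleotide allele names are coerced to numbers, which is presumably what a numeric-only code path ends up doing. The variable names are illustrative.

```r
# Allele names taken from the reprex: nucleotides, not numeric allele sizes.
alleles <- c("A", "G")

# as.numeric() cannot parse letters, so it returns NA and warns
# "NAs introduced by coercion", the same warning shown in the reprex output.
# suppressWarnings() is used here only to keep the example quiet.
coerced <- suppressWarnings(as.numeric(alleles))
print(coerced)
#> [1] NA NA

# Allele names that are already numbers (e.g. microsatellite sizes) parse
# cleanly, which is the case a numeric-only assumption is written for.
sizes <- as.numeric(c("120", "124"))
print(sizes)
#> [1] 120 124
```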

The solution I'm going with is to replace the mat = FALSE flag that is passed along fill_zero -> fill_zero_locus -> generate_bruvo_mat with an argument that also accepts non-numeric data. The new argument is mat_type, which defaults to character(0) and covers three scenarios:

  1. character(0): produces a data frame with one locus per column
  2. "numeric": produces a numeric matrix with one allele per column
  3. "character": produces a character matrix with one allele per column
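The three return shapes listed above can be sketched in base R. The helper below, split_genotypes(), is hypothetical and is not poppr's fill_zero(); the thread does not show that function's body, so the "/" separator, the input layout (a character matrix with one locus per column), and the padding of missing alleles with "0" are all assumptions made for illustration.

```r
# Hypothetical helper, NOT poppr's fill_zero(): it only mirrors the three
# mat_type scenarios described above.
split_genotypes <- function(geno, ploidy = 2L, mat_type = character(0), sep = "/") {
  if (length(mat_type) == 0L) {
    # Scenario 1: data frame with one locus per column, genotypes kept as strings
    return(as.data.frame(geno, stringsAsFactors = FALSE))
  }
  mat_type <- match.arg(mat_type, c("numeric", "character"))
  # Split each genotype into alleles; pad missing alleles with "0"
  # (assumes no genotype carries more alleles than `ploidy`)
  per_locus <- lapply(seq_len(ncol(geno)), function(j) {
    alleles <- strsplit(geno[, j], sep, fixed = TRUE)
    do.call(rbind, lapply(alleles, function(a) {
      a <- a[!is.na(a)]
      c(a, rep("0", max(0L, ploidy - length(a))))
    }))
  })
  out <- do.call(cbind, per_locus)
  rownames(out) <- rownames(geno)
  # Scenario 2 converts to numbers; scenario 3 leaves the characters alone
  if (mat_type == "numeric") storage.mode(out) <- "double"
  out
}

# Two samples and two loci from the reprex, written with an assumed "/" separator
geno <- matrix(
  c(NA, "A/A", "G/T", "G/T"),
  nrow = 2,
  dimnames = list(c("TT056001.trim", "TT060001.trim"),
                  c("loc87_pos30", "loc345_pos27"))
)

df  <- split_genotypes(geno)                          # scenario 1
chr <- split_genotypes(geno, mat_type = "character")  # scenario 3

# Scenario 2 only makes sense when allele names are numbers
ssr <- matrix(c("120/124", "124/124"), ncol = 1,
              dimnames = list(c("a", "b"), "L1"))
num <- split_genotypes(ssr, mat_type = "numeric")
```

With SNP data like the reprex, the "character" form keeps the nucleotide letters intact, whereas forcing the same input through the "numeric" form would reintroduce the coercion warning.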