jgx65 / hierfstat

the hierfstat package
24 stars 14 forks source link

modified genind2hierfstat to include conversion pathway for loci when… #15

Closed EricArcher closed 8 years ago

EricArcher commented 8 years ago

… alleles are present that are neither nucleotides nor all numeric

I've tested this against examples of all 3 possible types and it seems to work now.

jgx65 commented 8 years ago

Thanks a lot Eric. This seems to work in most cases. One situation when it does not is when the number of allelic states differ between loci, i.e., some have less than 10 and are thus encoded with one digit and the others have more than 10, hence encoded as two digits. I'm looking into how to fix it

thierrygosselin commented 8 years ago

In my case I fixed that using

stringi::stri_pad_left(str = ALLELE, width = 3, pad = 0)

where ALLELE is my column of all the alleles

if that can help Thierry

EricArcher commented 8 years ago

Howabout moving the digits assignment higher and calculating it like this (new lines are max.length and digits assignments outside of the do.call loop with the previous digits assignment inside the loop deleted):

    if(length(grep("[[:alpha:]|[:punct:]]", alleles.name)) > 0) {
      # for loci where alleles are not all numeric nor all nucleotides
      dat <- as.matrix(genind2df(dat, sep = "", usepop = FALSE,
oneColPerAll = TRUE))
      dat <- apply(dat, 2, function(a) ifelse(a == "NA", NA, a))
      max.length <- max(sapply(alleles(dat), length))
     digits <- floor(log10(max.length))  
      # cycle through each locus
      do.call(cbind, lapply(seq(ploid, ncol(dat), by = ploid),
function(end) {
        start <- end - ploid + 1
        alleles <- sort(unique(as.vector(dat[, start:end])))
        # match alleles of each genotype with sorted order: index is new
allele
        apply(dat[, start:end, drop = FALSE], 1, function(gntp) {
          if(any(is.na(gntp))) return(NA)
          gntp <- match(gntp, alleles)
          # zero pad numeric alleles and return collapsed genotype
          gntp <- formatC(gntp, digits = digits, flag = "0", mode =
"integer")
          as.integer(paste(gntp, collapse = ""))
        })
      }))
    }

Eric Archer, Ph.D. Southwest Fisheries Science Center NMFS, NOAA 8901 La Jolla Shores Drive La Jolla, CA 92037 USA 858-546-7121 (work) 858-546-7003 (FAX)

Marine Mammal Genetics Group: swfsc.noaa.gov/mmtd-mmgenetics ETP Cetacean Assessment Program: swfsc.noaa.gov/mmtd-etp

"

The universe doesn't care what you believe. The wonderful thing about science is that it doesn't ask for your faith, it just asks for your eyes." - Randall Munroe

"Lighthouses are more helpful than churches."

On Tue, Mar 1, 2016 at 9:21 AM, Jerome Goudet notifications@github.com wrote:

Thanks a lot Eric. This seems to work in most cases. One situation when it does not is when the number of allelic states differ between loci, i.e., some have less than 10 and are thus encoded with one digit and the others have more than 10, hence encoded as two digits. I'm looking into how to fix it

— Reply to this email directly or view it on GitHub https://github.com/jgx65/hierfstat/pull/15#issuecomment-190821190.

jgx65 commented 8 years ago

Yeap, it works, great! (moved the line max.lenght... before the call to genind2df