jgx65 / hierfstat

the hierfstat package
24 stars 14 forks source link

getal seems to require ordered population labels #42

Closed falseflattortuga closed 3 years ago

falseflattortuga commented 3 years ago

Here is a simple example that illustrates the issue:

library(adegenet) library(hierfstat)

create genind object

df <- data.frame(locusA=c("11","11","12","12","22","12","21"), locusB=c(NA,"21","12","11","11","21","12"),locusC=c("22","22","21","22","11","22","12")) obj <- df2genind(df, ploidy=2, ncode=1) pop(obj) <- c(2,2,1,1,1,1,3) #assign individuals to populations obj #examine genind object /// GENIND OBJECT /////////

// 7 individuals; 3 loci; 6 alleles; size: 6.8 Kb

// Basic content @tab: 7 x 6 matrix of allele counts @loc.n.all: number of alleles per locus (range: 2-2) @loc.fac: locus factor for the 6 columns of @tab @all.names: list of allele names for each locus @ploidy: ploidy of each individual (range: 2-2) @type: codom @call: df2genind(X = df, ncode = 1, ploidy = 2)

// Optional content @pop: population of each individual (group size range: 1-4)

create hierfstat data frame

objh <- genind2hierfstat(obj) objh #examine hierfstat data frame pop locusA locusB locusC 1 1 11 NA 22 2 1 11 21 22 3 1 12 21 21 4 1 12 11 22 5 2 22 11 11 6 2 12 21 22 7 3 12 21 21

example 1

getal(objh) #produces error message Error in data.frame(pop = rep(data[, 1], 2), ind = ind, al = rbind(firstal, : arguments imply differing number of rows: 14, 10

example 2, with a different assignment of individuals to populations

objh[,1] <- c(1,2,2,1,1,1,3) getal(objh) # no error message, but individuals are incorrectly assigned to populations pop ind locusA locusB locusC 1 1 1 1 NA 2 2 2 2 1 2 2 3 2 3 1 2 2 4 1 4 1 1 2 5 1 1 2 1 1 6 1 2 1 2 2 7 3 1 1 2 2 11 1 1 1 NA 2 21 2 2 1 1 2 31 2 3 2 1 1 41 1 4 2 1 2 51 1 1 2 1 1 61 1 2 2 1 2 71 3 1 2 1 1

example 3, ordered population labels seem required to get correct results

objh[,1] <- c(1,1,1,1,2,2,3) getal(objh) pop ind locusA locusB locusC 1 1 1 1 NA 2 2 1 2 1 2 2 3 1 3 1 2 2 4 1 4 1 1 2 5 2 1 2 1 1 6 2 2 1 2 2 7 3 1 1 2 2 11 1 1 1 NA 2 21 1 2 1 1 2 31 1 3 2 1 1 41 1 4 2 1 2 51 2 1 2 1 1 61 2 2 2 1 2 71 3 1 2 1 1

looking at the code in the getal function, the issue seems to be in the following for loop (which, by the way, falls into the second circle of the R Inferno: growing objects):

for (i in sort(unique(data[, 1]))) { dum <- 1:sum(data[, 1] == i) if (i == 1) ind <- dum else ind <- c(ind, dum) }

if ordered population labels are assumed, then the loop above may be replaced by:

ind <- sequence(table(data[, 1]))

jgx65 commented 3 years ago

Thanks for raising this issue. getal should not be accessible directly, it is an internal function called by others, where the sorting has be done. I'll fix this.

Cheers

jgx65 commented 3 years ago

Issue fixed I believe:

library(adegenet)
library(hierfstat)

#create genind object
df <- data.frame(locusA=c("11","11","12","12","22","12","21"),
                 locusB=c(NA,"21","12","11","11","21","12"),locusC=c("22","22","21","22","11","22","12"))
obj <- df2genind(df, ploidy=2, ncode=1)
pop(obj) <- c(2,2,1,1,1,1,3) #assign individuals to populations
objh <- genind2hierfstat(obj)
> getal(objh)
   pop ind locusA locusB locusC
1    2   1      1     NA      2
2    2   2      1      2      2
3    1   1      1      2      2
4    1   2      1      1      2
5    1   3      2      1      1
6    1   4      1      2      2
7    3   1      1      2      2
11   2   1      1     NA      2
21   2   2      1      1      2
31   1   1      2      1      1
41   1   2      2      1      2
51   1   3      2      1      1
61   1   4      2      1      2
71   3   1      2      1      1

see https://github.com/jgx65/hierfstat/commit/f35069782833e4febd0466c5195f3f8107d8d8c8#diff-c68e163ba4719d8defe6ef01d9a88ee65f10532acf550009d1bcd81e4c823983

(note that you don't need to convert the genind object to ahierfstat one, getal(obj) works the same)

jgx65 commented 3 years ago

@falseflattortuga is the issue solved and can I close it? many thanks