jgx65 / hierfstat

the hierfstat package

`getal.b` should determine `modulo` per column and not once for the whole dataset #33

Open timflutre opened 4 years ago

timflutre commented 4 years ago

The `getal.b` function looks only at the second column of its input data.frame to determine the encoding of the alleles, then applies that encoding to every column. This fails when, for instance, the second marker encodes alleles with modulo 100 whereas the first encodes them with modulo 1000.

Reproducible example (note that `mrk1` is split with modulo 100, so 150150 becomes 1501 and 50 instead of 150 and 150):

> tmp <- data.frame(pop=factor(c(1,1,2,2)),
                    mrk1=c(150150,150142,134134,150134),
                    mrk2=c(8882,8882,8880,8882))
> tmp
  pop   mrk1 mrk2
1   1 150150 8882
2   1 150142 8882
3   2 134134 8880
4   2 150134 8882
> getal.b(tmp[,-1])
, , 1

     [,1] [,2]
[1,] 1501   88
[2,] 1501   88
[3,] 1341   88
[4,] 1501   88

, , 2

     [,1] [,2]
[1,]   50   82
[2,]   42   82
[3,]   34   80
[4,]   34   82

I can propose a pull request solving this bug by adding a for loop so that modulo is determined per column. Are you interested?
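
For illustration, here is a minimal sketch of what a per-column loop could look like. It is not the code from hierfstat or from the pull request. The function name `split_alleles_per_column` is hypothetical, and the rule used to pick `modulo` (an allele takes half the digits of the largest genotype in its column) is an assumption that may differ from the rule `getal.b` actually uses.

```r
# Hypothetical helper, not part of hierfstat: split each genotype column
# into its two alleles using a modulo chosen from that column alone.
split_alleles_per_column <- function(dat) {
  dat <- as.matrix(dat)
  # Same layout as the output shown above: individuals x loci x 2 alleles
  res <- array(NA_real_, dim = c(nrow(dat), ncol(dat), 2))
  for (j in seq_len(ncol(dat))) {
    if (all(is.na(dat[, j]))) next  # nothing to decode in this column
    # Assumption: one allele uses half the digits of the largest genotype
    ndigits <- floor(log10(max(dat[, j], na.rm = TRUE))) + 1
    modulo <- 10^ceiling(ndigits / 2)
    res[, j, 1] <- dat[, j] %/% modulo  # first allele
    res[, j, 2] <- dat[, j] %% modulo   # second allele
  }
  res
}

tmp <- data.frame(pop  = factor(c(1, 1, 2, 2)),
                  mrk1 = c(150150, 150142, 134134, 150134),
                  mrk2 = c(8882, 8882, 8880, 8882))
res <- split_alleles_per_column(tmp[, -1])
res[, 1, 1]  # 150 150 134 150
res[, 2, 2]  # 82 82 80 82
```

With this sketch, `mrk1` is decoded with modulo 1000 and `mrk2` with modulo 100, so each marker is split according to its own encoding.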

P.S.: I found this bug because the function `wc` failed with the error `Error in 2 * nal * p - mho : non-conformable arrays`.

timflutre commented 4 years ago

I made pull request #34