jgx65 / hierfstat

the hierfstat package
Revert "modifications to genind2hierfstat" #14

Closed jgx65 closed 8 years ago

jgx65 commented 8 years ago

Reverts jgx65/hierfstat#13. The modified function does not work:

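library(hierfstat)
library(adegenet)
# simulate genotypes: column 1 is the population, the other columns are loci
head(dat<-sim.genot())
# recode each two-digit genotype (e.g. 12) as "1_2", with "_" separating the two alleles
dat1<-data.frame(dat[,1],t(apply(dat[,-1],1,function(x) paste(x %/% 10, x %% 10, sep="_"))))
head(dat1)
# build a genind object from the recoded genotypes
x<-df2genind(dat1[,-1],sep="_",pop=dat1[,1])
head(genind2df(x))
# convert back to hierfstat format
head(genind2hierfstat(x))
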
jgx65 commented 8 years ago

@EricArcher Not sure what you were trying to fix in genind2hierfstat. You mentioned genotypes encoded as, e.g., 1_11. Do you mean that _ is a separator for the two alleles? If you run the example above, you'll see that the new version of genind2hierfstat does not give the expected result (i.e., the same as head(dat1)).

EricArcher commented 8 years ago

Hmmm... Sorry about that. The problem I was having was that I was getting a matrix of NAs for some datasets. I traced it to the following lines in the previous version of genind2hierfstat:

  x<-genind2df(dat,sep="",usepop=FALSE)
  #to catch alleles encoded with letters, e.g. H3N2
  if (length(grep("[A-Z]",alleles.name))==0) x<-as.matrix(data.frame(lapply(x,as.integer)))

genind2df was regenerating a data.frame with my original allele names (which were like 1_11). The _ separator itself was not the issue: the names contained no letters, so they passed the check above, but they were not numeric either, so the as.integer conversion turned everything into NA.
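
A minimal sketch of that failure in base R, assuming allele names like 1_11. The vector alleles.name below is a made-up stand-in for the one used inside genind2hierfstat, not the actual data:

```r
# Hypothetical allele names: no capital letters, but not plain integers either
alleles.name <- c("1_11", "1_12", "2_11")

# The letter check finds nothing, so the integer branch would be taken
no.letters <- length(grep("[A-Z]", alleles.name)) == 0

# as.integer() cannot parse "1_11", so every value becomes NA
# (R also emits a coercion warning, silenced here)
converted <- suppressWarnings(as.integer(alleles.name))

print(no.letters)
print(converted)
```

Both conditions hold at once here: the names pass the "no letters" test and still fail the integer conversion.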

I'm trying to duplicate my original error, and then if I can, will fix my fix to produce the correct output (I misunderstood the output format). Back soon...

jgx65 commented 8 years ago

I see. Basically, you'd like any character string to be a valid allele name, correct? I was trying to avoid using factor in the initial version because it could sometimes slow things down quite drastically. We could keep the classic integers and the nucleotides, and then add a third case for anything else?
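
A rough sketch of that three-way split in base R. The helper name classify_alleles and the regular expressions are assumptions for illustration only; nothing like this exists in hierfstat as far as this thread shows:

```r
# Hypothetical helper, not part of hierfstat: decide how a set of
# allele names could be encoded.
classify_alleles <- function(alleles.name) {
  if (all(grepl("^[0-9]+$", alleles.name))) {
    "integer"      # classic integer coding, safe for as.integer()
  } else if (all(grepl("^[ACGTacgt]$", alleles.name))) {
    "nucleotide"   # single-letter nucleotide alleles
  } else {
    "other"        # anything else, e.g. "1_11" or "H3N2"
  }
}

classify_alleles(c("1", "2", "11"))
classify_alleles(c("A", "c", "G"))
classify_alleles(c("1_11", "H3N2"))
```

Only the "other" case would need a slower generic encoding such as factor, so the common integer and nucleotide inputs would keep the fast path.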

EricArcher commented 8 years ago

I should've known that you'd already tried the factor route and decided against it for the sake of efficiency. I like your suggestion. I'll start again from the previous version of the code and add that in.