jgx65 / hierfstat

the hierfstat package
24 stars 14 forks source link

Problem while converting from genind to a hierfstat format #21

Closed frederic-michaud closed 4 years ago

frederic-michaud commented 6 years ago

The following script raises an error on the last line in my configuration:

The script:

library(adegenet)
library(hierfstat)
data(nancycats)
fstat.nancycats <- genind2hierfstat(nancycats)
basic.stats(fstat.nancycats) #works well
genet.dist(fstat.nancycats) #works well
eucl.dist(fstat.nancycats) #does not work

The error:

Error in Summary.factor(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,  : 
  'max' not meaningful for factors

The problem comes from the fact that in genind, the populations (genind_object$pop) are (or at least can be) factor , whereas the function eucl.dist expect the population (hierfstat_object[,1] )to be number. I'm not sure if the first column of a "hierfstat object" should be in general a number or a can be both (a factor or a number). In the first case, we should check it in the function genind2hierfstat. In the latter case, we should change the function eucl.dist. I guess this is the most appropriate choice. @jgx65 , tell me what solution is the best for you and I can submit a PR.

thibautjombart commented 6 years ago

My 2 cents is it makes more sense to store labelled group information as factors. I can see at least two issues for not doing so:

zkamvar commented 6 years ago

If eucl.dist calculates euclidean distance among populations, then shouldn't the following work as well?

library(adegenet)
#> Loading required package: ade4
#> 
#>    /// adegenet 2.1.1 is loaded ////////////
#> 
#>    > overview: '?adegenet'
#>    > tutorials/doc/questions: 'adegenetWeb()' 
#>    > bug reports/feature requests: adegenetIssues()
data(nancycats)
dist(tab(genind2genpop(nancycats), freq = TRUE))
#> 
#>  Converting data from a genind to a genpop object... 
#> 
#> ...done.
#>           P01       P02       P03       P04       P05       P06       P07
#> P02 1.5514481                                                            
#> P03 1.3477476 1.5547478                                                  
#> P04 1.2474840 1.3857293 0.8702066                                        
#> P05 1.1943588 1.4911924 1.1521719 1.0166375                              
#> P06 1.4193059 1.3869224 1.0019208 1.0330582 1.2519810                    
#> P07 1.2478150 1.3916849 1.4120350 1.2557246 1.4470529 1.2954705          
#> P08 1.2066224 1.3100691 1.2557645 1.0526140 1.3262311 1.2760347 1.1348352
#> P09 1.4809841 1.5149411 1.4732701 1.2693139 1.5171448 1.4619620 1.6537576
#> P10 1.1858182 1.2423319 1.4153291 1.3943889 1.3286281 1.3711912 1.4347499
#> P11 1.1519297 1.3182438 0.9762138 0.9030445 1.1213038 0.9996316 1.2991939
#> P12 1.2732672 1.4548571 1.3127612 1.1155339 1.2954694 1.3409445 1.2447582
#> P13 1.3950482 1.5077594 1.4740663 1.3781801 1.4248061 1.3714141 1.5379466
#> P14 1.3067279 1.4115382 1.1081984 1.0161166 1.2470943 1.2331804 1.4507732
#> P15 1.2348953 1.3267745 1.2157380 1.1133308 1.3618035 1.3483997 1.4440005
#> P16 1.5062716 1.4352412 1.4373490 1.2569905 1.3337499 1.1825464 1.5319979
#> P17 1.1657247 1.5450550 1.3662220 1.2525573 1.1750072 1.5170958 1.3375208
#>           P08       P09       P10       P11       P12       P13       P14
#> P02                                                                      
#> P03                                                                      
#> P04                                                                      
#> P05                                                                      
#> P06                                                                      
#> P07                                                                      
#> P08                                                                      
#> P09 1.5516618                                                            
#> P10 1.2067989 1.4487116                                                  
#> P11 1.1889893 1.2156551 1.1056855                                        
#> P12 1.3222543 1.3295918 1.4715996 1.2795903                              
#> P13 1.4603842 1.6733048 1.5334709 1.4159866 1.4670096                    
#> P14 1.1799727 1.3132643 1.3705882 1.0757203 1.2901729 1.4147138          
#> P15 1.1402479 1.2847754 1.1372724 1.1771063 1.4107731 1.4157774 1.1756729
#> P16 1.4871673 1.3749299 1.4538351 1.1973162 1.4820175 1.5114618 1.3370960
#> P17 1.4895873 1.4590764 1.3906382 1.1759670 1.1140455 1.4843346 1.3246809
#>           P15       P16
#> P02                    
#> P03                    
#> P04                    
#> P05                    
#> P06                    
#> P07                    
#> P08                    
#> P09                    
#> P10                    
#> P11                    
#> P12                    
#> P13                    
#> P14                    
#> P15                    
#> P16 1.3917555          
#> P17 1.3475301 1.6113889
frederic-michaud commented 6 years ago

Thanks for your replies, and yes I agree that roundtrip conversion should work. A workaround for my script is obviously to change the first column:

fstat.nancycats <- genind2hierfstat(nancycats) 
fstat.nancycats[, 1] <- as.integer(fstat.nancycats[, 1] )
eucl.dist(fstat.nancycats) # work now

This gives a different answer to what @zkamvar propose. If I'm not mistaken, the reason is that hierfstat compute the distance between the "genotype" whereas @zkamvar gives the distance for the alleles frequency. Consider for example the two following populations with two individuals:

pop1: AA & aa pop2: Aa & Aa

For @zkamvar the distance is 0 (right?) whereas for hierfstat, it's not. Actually, the hierfstat version looks very strange to me because (from what I see) Aa and aA individuals are considered as being different.

jgx65 commented 6 years ago

Sorry to jump in late, email issue...

euclid.dist has not been properly checked, I should remove it or/and include it as a method in genet.dist