jgx65 / hierfstat

the hierfstat package
24 stars 14 forks source link

boot.ppfst #18

Closed juance100 closed 6 years ago

juance100 commented 7 years ago

Hi, I have noticed that boot.ppfst fails and gives an error message if populations are not sorted. The issue is easy to solve by sorting the dataframe according to the pop column. Cheers Juan

millerad commented 7 years ago

I am analysing large SNP datasets and getting the error "Error in i + 1 : non-numeric argument to binary operator” when trying to calculate confidence intervals for Fst (boot.ppfst(data,nboot=100,quant=c(0.025,0.975),diploid=TRUE).

Juance100 suggests (above) that resorting the data frame according to the population column is an easy solution. How do I sort the dataframe in hierstat once I have created a Genind object from my input file (probably very simple but this is new to me).

Jérôme Goudet suggests this error might have to do with pairs of samples not typed at any loci in one of the bootstrap samples. Given there is quite a bit of missing data in SNP datasets is there anyway around this? Juance100 are you dealing with SNP datasets and having success?

It seems strange to need to resort the data frame for this particular calculation when I can calculate CIs for Fis and generate other stats without a problem.

Cheers, Adam

ilariaco commented 7 years ago

I am encountering the same problem as millerad. Has anybody managed to fix this?

jgx65 commented 7 years ago

I am unable to reproduce the problem you are mentioning:

> data(gtrunchier)
> boot.ppfst(gtrunchier[,-2])$ul
   1         2         3         4         5         6
1 NA 0.5798797 0.6241770 0.5050520 0.3696103 0.4703662
2 NA        NA 0.8954746 0.8363857 0.7377449 0.8322778
3 NA        NA        NA 0.8145870 0.7266791 0.7068185
4 NA        NA        NA        NA 0.6760324 0.5726174
5 NA        NA        NA        NA        NA 0.6187141
6 NA        NA        NA        NA        NA        NA
> x<-sample(dim(gtrunchier)[1])  
> boot.ppfst(gtrunchier[x,-2])$ul
   3         6         1         2         5         4
3 NA 0.5686564 0.5933609 0.5038022 0.3758957 0.4664920
6 NA        NA 0.8884007 0.8294153 0.7104784 0.8302167
1 NA        NA        NA 0.7839115 0.7011805 0.6965105
2 NA        NA        NA        NA 0.6396456 0.5673923
5 NA        NA        NA        NA        NA 0.6238607
4 NA        NA        NA        NA        NA        NA
jgx65 commented 7 years ago

actually, the population identifier does not print properly. fixed now

@juance100 , @millerad , @ilariaco does this fix the problem? @millerad to create a hierfstat dataframe from a genind object, use genind2hierfstat

> data(gtrunchier)
> pop<-gtrunchier[,1]
> for (i in 1:6) pop[pop==i]<-LETTERS[i]
> pop<-factor(pop)
> x<-sample(dim(gtrunchier)[1])
> boot.ppfst(data.frame(pop,gtrunchier[,-c(1:2)]))

       Upper limit above diagonal 
       Lower limit below diagonal 

          A         B         C         D         E         F
A        NA 0.5799332 0.6159206 0.5129958 0.3696103 0.4596818
B 0.2958801        NA 0.9066774 0.8430523 0.6847790 0.8278949
C 0.2563006 0.6539571        NA 0.8300546 0.7185711 0.7386531
D 0.3966815 0.6446089 0.4538589        NA 0.6665882 0.5594420
E 0.2783753 0.3262660 0.4335564 0.3543698        NA 0.6212815
F 0.3233821 0.6260546 0.3486324 0.2712971 0.3710020        NA
> boot.ppfst(data.frame(pop[x],gtrunchier[x,-c(1:2)]))

       Upper limit above diagonal 
       Lower limit below diagonal 

          A         B         C         D         E         F
A        NA 0.5752332 0.5956620 0.5083705 0.3699870 0.4694581
B 0.2967589        NA 0.8845356 0.8439365 0.7077343 0.8278918
C 0.2649480 0.6698664        NA 0.8047852 0.7214798 0.6916809
D 0.3815558 0.6494387 0.4603131        NA 0.6558580 0.5544860
E 0.2845807 0.3262660 0.4365065 0.3707435        NA 0.6329700
F 0.3079071 0.6256595 0.3437165 0.2772144 0.3890714        NA
xbouteiller commented 7 years ago

Hi, I've got the same problem, it seems that I need to convert my "pop" population vector which is a factor into numeric (is.numeric(pop)) then to sort it (if I don't sort it, I've got the error message: "Error in rep(1:nl, alploc) : argument 'times' incorrect")

The example above doesnt' work on my computer. However, the two following command lines are working:

1/
data(gtrunchier)
boot.ppfst(gtrunchier)
2/
 pop<-gtrunchier[,1]
 for (i in 1:6) pop[pop==i]<-LETTERS[i]
 pop<-factor(pop)
 boot.ppfst(data.frame(as.numeric(pop),gtrunchier[,-c(1:2)]))

That is quite similar to the previous code ...

jgx65 commented 6 years ago

Hi @xbouteiller could you post or send me an example that does not work, together with sessionInfo() ? Thanks

xbouteiller commented 6 years ago

Hi

Thank you for your answer,

> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252   
[3] LC_MONETARY=French_France.1252 LC_NUMERIC=C                  
[5] LC_TIME=French_France.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] hierfstat_0.04-22 adegenet_2.0.1    ade4_1.7-4       

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.6      spdep_0.6-5      plyr_1.8.4       LearnBayes_2.15  tools_3.3.1     
 [6] boot_1.3-18      digest_0.6.9     tibble_1.1       nlme_3.1-128     gtable_0.2.0    
[11] lattice_0.20-33  mgcv_1.8-12      Matrix_1.2-6     igraph_1.0.1     shiny_0.13.2    
[16] DBI_0.4-1        parallel_3.3.1   coda_0.18-1      cluster_2.0.5    dplyr_0.5.0     
[21] stringr_1.0.0    gtools_3.5.0     grid_3.3.1       R6_2.1.2         sp_1.2-3        
[26] gdata_2.17.0     ggplot2_2.2.1    reshape2_1.4.1   seqinr_3.3-0     deldir_0.1-12   
[31] magrittr_1.5     scales_0.4.1     htmltools_0.3.5  MASS_7.3-45      splines_3.3.1   
[36] gmodels_2.16.2   assertthat_0.1   permute_0.9-0    mime_0.5         ape_3.5         
[41] colorspace_1.2-6 xtable_1.8-2     httpuv_1.3.3     stringi_1.1.1    lazyeval_0.2.0  
[46] munsell_0.4.3    vegan_2.4-0

The following code is not working, I've got the same error message with my SNPs own data (and it is the same than described by Millerad previously):

data(gtrunchier)
>  pop<-gtrunchier[,1]
>  for (i in 1:6) pop[pop==i]<-LETTERS[i]
>  pop<-factor(pop)
>  x<-sample(dim(gtrunchier)[1])
>   boot.ppfst(data.frame(pop,gtrunchier[,-c(1:2)]))
Error in i + 1 : argument non numérique pour un opérateur binaire
> 

If the population are not sorted, I obtain another error message:

>     boot.ppfst(data.frame(sample(as.numeric(pop)),gtrunchier[,-c(1:2)]))
Error in rep(1:nl, alploc) : argument 'times' incorrect

However, if the populations are sorted and in numeric form, the function runs

>    boot.ppfst(data.frame(sort(as.numeric(pop)),gtrunchier[,-c(1:2)]))
$call
boot.ppfst(dat = data.frame(sort(as.numeric(pop)), gtrunchier[, 
    -c(1:2)]))

$ll
     [,1]   [,2]   [,3]   [,4]   [,5]   [,6]
[1,]   NA 0.3245 0.2658 0.3973 0.2948 0.3234
[2,]   NA     NA 0.6118 0.6446 0.4027 0.5598
[3,]   NA     NA     NA 0.4343 0.4327 0.3405
[4,]   NA     NA     NA     NA 0.3503 0.2811
[5,]   NA     NA     NA     NA     NA 0.3936
[6,]   NA     NA     NA     NA     NA     NA
jgx65 commented 6 years ago

Your version of hierfstat is an old version. Current version is 0.04-28. re-install it from github

library(devtools)
install_github("jgx65/hierfstat")
library("hierfstat")

, and try again, things should work

jgx65 commented 6 years ago

@xbouteiller , @ilariaco , is the issue solved with the updated hierfstat-0.04-28 and can I close this issue? many thanks

xbouteiller commented 6 years ago

Hi Sorry for the late answer ! The problem is fixed, I'm so sorry for this question, I tried to reinstall hierfstat several times from R directly but not from github Thank you for your help Xavier


Xavier Bouteiller

PhD Student at the University of BordeauxINRA research unit Team Functional Ecology and Genomics Team Genetics and Ecology of Populations

INRA - UMR BIOGECO 69, Route d'Arcachon 33612 Cestas Cedex France

xavier.bouteiller@u-bordeaux.fr bouteiller.xavier@gmail.com Phone: +33 (0)5 57 12 27 71 Webpage on BIOGECO https://www6.bordeaux-aquitaine.inra.fr/biogeco_eng/Staff/Staff-directory/A-C/Bouteiller-Xavier


2017-09-11 16:41 GMT+02:00 Jerome Goudet notifications@github.com:

@xbouteiller https://github.com/xbouteiller , @ilariaco https://github.com/ilariaco , is the issue solved with the updated hierfstat-0.04-28 and can I close this issue? many thanks

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/jgx65/hierfstat/issues/18#issuecomment-328551014, or mute the thread https://github.com/notifications/unsubscribe-auth/Ac08W1Fb_NRcc2pC55qvOe7vhOLo8x1_ks5shUaigaJpZM4KnUYj .

millerad commented 6 years ago

Many thanks for your help, and sorry for the delay in getting back to you. I have installed the new version from github this seems to have solved this particular issue (now getting nei’s ppfst with CIs)

However, I cannot generate Weir and Cockerhams Fst (pairwise.WCfst). I keep getting an error : Error in names(y) <- nm[!nas] : 'names' attribute [11] must be the same length as the vector [1] In addition: Warning message: n is.na(x) : is.na() applied to non-(list or vector) of type 'S4')

I have simply changed the command from pairwise.fst(dat) to pairwise.WCfst(dat) according to the manual Any ideas? Kind regards, Adam

Dr Adam Miller Senior Lecturer Centre for Integrative Ecology School of Life and Environmental Sciences, Faculty of Science Engineering & Built Environment

[id:image001.png@01D31CFC.C2C0FC00] Deakin University PO BOX 423, Warrnambool, VIC 3280 Office: +61 3 5563 3171 Mobile: +61 488 735 482 a.millermailto:a.miller@deakin.edu.au@deakin.edu.aumailto:a.miller@deakin.edu.au www.deakin.edu.auhttp://www.deakin.edu.au/ Deakin University CRICOS Provider Code 00113B https://scholar.google.com.au/citations?user=zLtS2zUAAAAJ&hl=en http://www.deakin.edu.au/profiles/adam-miller

Important Notice: The contents of this email are intended solely for the named addressee and are confidential; any unauthorised use, reproduction or storage of the contents is expressly prohibited. If you have received this email in error, please delete it and any attachments immediately and advise the sender by return email or telephone.

Deakin University does not warrant that this email and any attachments are error or virus free.

From: Jerome Goudet notifications@github.com Reply-To: jgx65/hierfstat reply@reply.github.com Date: Tuesday, 12 September 2017 at 12:41 am To: jgx65/hierfstat hierfstat@noreply.github.com Cc: Adam Miller a.miller@deakin.edu.au, Mention mention@noreply.github.com Subject: Re: [jgx65/hierfstat] boot.ppfst (#18)

@xbouteillerhttps://github.com/xbouteiller , @ilariacohttps://github.com/ilariaco , is the issue solved with the updated hierfstat-0.04-28 and can I close this issue? many thanks

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/jgx65/hierfstat/issues/18#issuecomment-328551014, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AcaExsGv8qd9LGPgLNCVPEqBMuexbva0ks5shUaigaJpZM4KnUYj.

Important Notice: The contents of this email are intended solely for the named addressee and are confidential; any unauthorised use, reproduction or storage of the contents is expressly prohibited. If you have received this email in error, please delete it and any attachments immediately and advise the sender by return email or telephone.

Deakin University does not warrant that this email and any attachments are error or virus free.

jgx65 commented 6 years ago

@millerad Hmm, this is puzzling. I tried the following code, which worked:

> pop<-rep(letters[1:6],table(gtrunchier[,1]))
> pairwise.WCfst(data.frame(pop,gtrunchier[,-c(1:2)]))
          a         b         c         d         e         f
a        NA 0.4695425 0.4194788 0.4486061 0.3356973 0.3992087
b 0.4695425        NA 0.7795719 0.7648904 0.5994008 0.7563057
c 0.4194788 0.7795719        NA 0.6321912 0.5688411 0.5308519
d 0.4486061 0.7648904 0.6321912        NA 0.5199231 0.4776794
e 0.3356973 0.5994008 0.5688411 0.5199231        NA 0.5065197
f 0.3992087 0.7563057 0.5308519 0.4776794 0.5065197        NA
> x<-sample(dim(gtrunchier)[1])
> pairwise.WCfst(data.frame(as.character(pop[x]),gtrunchier[x,-c(1:2)]))
          a         b         c         d         e         f
a        NA 0.4695425 0.4194788 0.4486061 0.3356973 0.3992087
b 0.4695425        NA 0.7795719 0.7648904 0.5994008 0.7563057
c 0.4194788 0.7795719        NA 0.6321912 0.5688411 0.5308519
d 0.4486061 0.7648904 0.6321912        NA 0.5199231 0.4776794
e 0.3356973 0.5994008 0.5688411 0.5199231        NA 0.5065197
f 0.3992087 0.7563057 0.5308519 0.4776794 0.5065197        NA

Could you post a toy example reproducing the error? thanks

jgx65 commented 6 years ago

sorry to pester you @millerad , is this issue solved and can I close it? Many thanks

jgx65 commented 6 years ago

Since there has been no replies for 2 months, I consider the issue closed. Re-open if necessary