thibautjombart / adegenet

adegenet: a R package for the multivariate analysis of genetic markers
166 stars 64 forks source link

unsettling discrepancy in allele identity between original datafile and converted genind object #256

Closed CarolineJudy closed 5 years ago

CarolineJudy commented 5 years ago

Hi Thibaut, I have a discrepancy between my original datafile (genepop format) and my genind object created using the Adegenet function "read.genepop" -- the alleles are not the same after the conversion.

Original genepop datafile read in by Adegenet [subset for first three individuals, and first three loci]

001001  001001  002002
001001  001001  002002
002001  002001  002002

Adegenet created Genind object

my_genind@tab[1:3,1:6]
     LTP1.002 LTP1.001 LTP2.001 LTP2.002 LTP3.002 LTP3.001
ind1         1         1         2         0         2         0
ind2         0         2        NA        NA         2         0
ind3         0         2        NA        NA        NA        NA

I'm attaching the R console output for your review. I didn't see any notes in bugs reported for the "read.genepop" function.

Many thanks for your assistance, Caroline

R Console.txt

zkamvar commented 5 years ago

Hi @CarolineJudy,

I agree that this is unsettling. To help with this we'll need a couple of things from you:

  1. Can you construct a minimal genepop file from your data (e.g. 2 populations of three and three loci)?
  2. Do you get this error when you use the github version of adegenet?

A workaround for this may be to import your genepop data into GenAlEx (excel macro) and then read it into R with the read.genalex() function from poppr.

CarolineJudy commented 5 years ago

Thanks for your help. I've attached a subset datafile ("genepop.subset.test.gen") below. I performed the conversion using a newer build of R (3.6.0) and the github version of Adegenet (2.1.2). The problem persists.

See attached mini datafile, R script for testing, and R Console output.

I first noticed the problem when trying to use the utility "genind2db". That outputted an unusable format for downstream input into other programs (structure). The number of columns (loci) were different across individuals, where there should the same number -- each column having an entry corresponding to a locus-specific bilallelic code, or an NA if no allelic code is present for a given locus. I mentioned that here because I'm not sure if the problem is with both utilities "read.genepop" and "genind2db" or just "read.genepop", which creates a faulty genind and thus, downstream problems with export using "genind2db".

Thanks again, Caroline

On Tue, Apr 30, 2019 at 5:46 AM Zhian N. Kamvar notifications@github.com wrote:

Hi @CarolineJudy https://github.com/CarolineJudy,

I agree that this is unsettling. To help with this we'll need a couple of things from you:

  1. Can you construct a minimal genepop file from your data (e.g. 2 populations of three and three loci)?
  2. Do you get this error when you use the github version of adegenet?

A workaround for this may be to import your genepop data into GenAlEx (excel macro) and then read it into R with the read.genalex() function from poppr https://grunwaldlab.github.io/poppr/reference/read.genalex.html.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/thibautjombart/adegenet/issues/256#issuecomment-487891162, or mute the thread https://github.com/notifications/unsubscribe-auth/AB6YEZ4Z5BLW2HUACE24NJTPTAIQFANCNFSM4HIZVY7A .

R commands and output ```r R version 3.6.0 (2019-04-26) -- "Planting of a Tree" Copyright (C) 2019 The R Foundation for Statistical Computing Platform: x86_64-apple-darwin15.6.0 (64-bit) R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details. Natural language support but running in an English locale R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications. Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for an HTML browser interface to help. Type 'q()' to quit R. [R.app GUI 1.70 (7657) x86_64-apple-darwin15.6.0] [Workspace restored from /Users/carolineduffie/.RData] [History restored from /Users/carolineduffie/.Rapp.history] > #-----------------------------------------------# > # Adegenet TEST # > # JUDY C / 30 April 2019 # > #-----------------------------------------------# > > # Test to see if allele identify problem persists when 1) running under the new build of R, and 2) using a smaller input dataset of 3 individuals, 3 loci and 2 pops, and using the github version of adegenet. > > # ENVIRONMENT SETUP > rm(list = ls(all = TRUE)) > library(adegenet) Loading required package: ade4 Registered S3 methods overwritten by 'ggplot2': method from [.quosures rlang c.quosures rlang print.quosures rlang Registered S3 method overwritten by 'spdep': method from plot.mst ape /// adegenet 2.1.1 is loaded //////////// > overview: '?adegenet' > tutorials/doc/questions: 'adegenetWeb()' > bug reports/feature requests: adegenetIssues() > # TEST 1: NEW BUILD OF R v. 3.6 > gdata = read.genepop("~/Documents/Trochilus/Chapter_2/PCA/no_outliers/pruning/Trochilus_Genepop_newcopy.gen", ncode=3) #perform conversion using Adegenet 2.1.1 Converting data from a Genepop .gen file to a genind object... File description: DATAFILE ...done. Warning message: In df2genind(X = X, pop = as.character(pop), ploidy = 2, ncode = ncode, : entirely non-type individual(s) deleted > gdata@tab[1:3,1:6] #check data, problem persists LTP39.002 LTP39.001 LTP42.001 LTP42.002 LTP74.002 LTP74.001 3914 1 1 2 0 2 0 3915 0 2 NA NA 2 0 3916 0 2 NA NA NA NA > > # TEST 2: USE SUBSET INPUT FILE > gdata = read.genepop("~/Desktop/Analysis/genepop.subset.test.gen", ncode=3) #perform conversion of subset input file using Adegenet 2.1.1 Converting data from a Genepop .gen file to a genind object... File description: Datafile ...done. > mydata <- read.table("~/Desktop/Analysis/genepop.subset.test.gen", sep="\t", quote="", stringsAsFactors=FALSE) #check against original datafile > mydata V1 1 Datafile 2 LTP39 3 LTP42 4 LTP74 5 Pop 6 Pop1_4117 , 002001 002001 002002 7 Pop1_4118 , 001001 002001 002002 8 Pop1_4119 , 002001 002001 002002 9 Pop 10 Pop2_4007 , 001001 002001 002002 11 Pop2_4008 , 002001 002001 000000 12 Pop2_4009 , 002001 000000 002001 > gdata /// GENIND OBJECT ///////// // 6 individuals; 3 loci; 6 alleles; size: 6.6 Kb // Basic content @tab: 6 x 6 matrix of allele counts @loc.n.all: number of alleles per locus (range: 2-2) @loc.fac: locus factor for the 6 columns of @tab @all.names: list of allele names for each locus @ploidy: ploidy of each individual (range: 2-2) @type: codom @call: read.genepop(file = "~/Desktop/Analysis/genepop.subset.test.gen", ncode = 3) // Optional content @pop: population of each individual (group size range: 3-3) > library(devtools) > install_github("thibautjombart/adegenet") Downloading GitHub repo thibautjombart/adegenet@master ─ cleaning srcCRIPTION meta-information ...g/55slp7vn2kd6q5l91nf0c6hr0000gn/T/RtmpYZrOjl/remotes2b155227187c/thibautjombart-adegenet-838cc49/DESCRIPTION’ ... ─ building ‘adegenet_2.1.2.tar.gz’ building ‘adegenet_2.1.2.tar.gz’ * installing *source* package ‘adegenet’ ... ** using staged installation ** libs clang -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk -I/usr/local/include -fPIC -Wall -g -O2 -c GLfunctions.c -o GLfunctions.o clang -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk -I/usr/local/include -fPIC -Wall -g -O2 -c adesub.c -o adesub.o adesub.c:479:15: warning: unused variable 'seed' [-Wunused-variable] int i, n, seed; ^ 1 warning generated. clang -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk -I/usr/local/include -fPIC -Wall -g -O2 -c init.c -o init.o clang -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk -I/usr/local/include -fPIC -Wall -g -O2 -c monmonier-utils.c -o monmonier-utils.o clang -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk -I/usr/local/include -fPIC -Wall -g -O2 -c sharedAll.c -o sharedAll.o clang -I"/Library/Frameworks/R.framework/Resources/include" -DNDEBUG -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk -I/usr/local/include -fPIC -Wall -g -O2 -c snpbin.c -o snpbin.o clang -dynamiclib -Wl,-headerpad_max_install_names -undefined dynamic_lookup -single_module -multiply_defined suppress -L/Library/Frameworks/R.framework/Resources/lib -L/usr/local/lib -o adegenet.so GLfunctions.o adesub.o init.o monmonier-utils.o sharedAll.o snpbin.o -F/Library/Frameworks/R.framework/.. -framework R -Wl,-framework -Wl,CoreFoundation ld: warning: text-based stub file /System/Library/Frameworks//CoreFoundation.framework/CoreFoundation.tbd and library file /System/Library/Frameworks//CoreFoundation.framework/CoreFoundation are out of sync. Falling back to library file for linking. installing to /Library/Frameworks/R.framework/Versions/3.6/Resources/library/00LOCK-adegenet/00new/adegenet/libs ** R ** data ** inst ** byte-compile and prepare package for lazy loading ** help *** installing help indices ** building package indices ** testing if installed package can be loaded from temporary location ** checking absolute paths in shared objects and dynamic libraries ** testing if installed package can be loaded from final location ** testing if installed package keeps a record of temporary installation path * DONE (adegenet) > library("adegenet") > ?adegenet starting httpd help server ... done > gdata_sub = read.genepop("~/Desktop/Analysis/genepop.subset.test.gen", ncode=3) #perform conversion of subset input file using Adegenet 2.1.2 Converting data from a Genepop .gen file to a genind object... File description: Datafile ...done. > gdata_sub /// GENIND OBJECT ///////// // 6 individuals; 3 loci; 6 alleles; size: 6.6 Kb // Basic content @tab: 6 x 6 matrix of allele counts @loc.n.all: number of alleles per locus (range: 2-2) @loc.fac: locus factor for the 6 columns of @tab @all.names: list of allele names for each locus @ploidy: ploidy of each individual (range: 2-2) @type: codom @call: read.genepop(file = "~/Desktop/Analysis/genepop.subset.test.gen", ncode = 3) // Optional content @pop: population of each individual (group size range: 3-3) > gdata_sub@tab LTP39.002 LTP39.001 LTP42.002 LTP42.001 LTP74.002 LTP74.001 Pop1_4117 1 1 1 1 2 0 Pop1_4118 0 2 1 1 2 0 Pop1_4119 1 1 1 1 2 0 Pop2_4007 0 2 1 1 2 0 Pop2_4008 1 1 1 1 NA NA Pop2_4009 1 1 NA NA 1 1 > mydata <- read.table("~/Desktop/Analysis/genepop.subset.test.gen", sep="\t", quote="", stringsAsFactors=FALSE) #check against original datafile > mydata V1 1 Datafile 2 LTP39 3 LTP42 4 LTP74 5 Pop 6 Pop1_4117 , 002001 002001 002002 7 Pop1_4118 , 001001 002001 002002 8 Pop1_4119 , 002001 002001 002002 9 Pop 10 Pop2_4007 , 001001 002001 002002 11 Pop2_4008 , 002001 002001 000000 12 Pop2_4009 , 002001 000000 002001 > ```
zkamvar commented 5 years ago

Hi @CarolineJudy, Could you open the genepop file in a text editor (on mac, it will be TextEdit) and copy+paste the data here? it will be much easier for me to parse it this way.

CarolineJudy commented 5 years ago

Datafile LTP39 LTP42 LTP74 Pop Pop1_4117 , 002001 002001 002002 Pop1_4118 , 001001 002001 002002 Pop1_4119 , 002001 002001 002002 Pop Pop2_4007 , 001001 002001 002002 Pop2_4008 , 002001 002001 000000 Pop2_4009 , 002001 000000 002001

On Tue, Apr 30, 2019 at 10:17 AM Zhian N. Kamvar notifications@github.com wrote:

Hi @CarolineJudy https://github.com/CarolineJudy, Could you open the genepop file in a text editor (on mac, it will be TextEdit) and copy+paste the data here? it will be much easier for me to parse it this way.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/thibautjombart/adegenet/issues/256#issuecomment-487970194, or mute the thread https://github.com/notifications/unsubscribe-auth/AB6YEZ6TSWI4L32MMHQH7YDPTBIJFANCNFSM4HIZVY7A .

zkamvar commented 5 years ago

Hi @CarolineJudy,

It's a bit odd. The behavior you posted seemed like something similar to #160, which was fixed in adegenet 2.0.2, but your output shows the most recent version. However, when I run this on my computer with a clean environment, I get the correct output.

A couple of things I see here, however:

  1. The first test reads in a different file. gdata = read.genepop("~/Documents/Trochilus/Chapter_2/PCA/no_outliers/pruning/Trochilus_Genepop_newcopy.gen", ncode=3)
  2. The output for your second run with adegenet shows the correct solution.
  3. You have a message from R saying that it is restoring your previous R session ([Workspace restored from /Users/carolineduffie/.RData]). This is a tricky thing about R that can give you a lot of headaches in the future with packages and data. I would recommend to never save your workspace data when you exit R-Getting-started#configuring-rstudio) and to delete that .Rdata file (hiding in your home directory)
x <- "Datafile
LTP39
LTP42
LTP74
Pop
Pop1_4117 ,  002001 002001 002002
Pop1_4118 ,  001001 002001 002002
Pop1_4119 ,  002001 002001 002002
Pop
Pop2_4007 ,  001001 002001 002002
Pop2_4008 ,  002001 002001 000000
Pop2_4009 ,  002001 000000 002001"

tmp <- tempfile(fileext = ".gen")
cat(x, file = tmp)

library('adegenet')
#> Loading required package: ade4
#> Registered S3 methods overwritten by 'ggplot2':
#>   method         from 
#>   [.quosures     rlang
#>   c.quosures     rlang
#>   print.quosures rlang
#> Registered S3 method overwritten by 'spdep':
#>   method   from
#>   plot.mst ape
#> 
#>    /// adegenet 2.1.1 is loaded ////////////
#> 
#>    > overview: '?adegenet'
#>    > tutorials/doc/questions: 'adegenetWeb()' 
#>    > bug reports/feature requests: adegenetIssues()
g <- read.genepop(tmp, ncode = 3)
#> 
#>  Converting data from a Genepop .gen file to a genind object... 
#> 
#> 
#> File description:  Datafile 
#> 
#> ...done.
tab(g)
#>           LTP39.002 LTP39.001 LTP42.002 LTP42.001 LTP74.002 LTP74.001
#> Pop1_4117         1         1         1         1         2         0
#> Pop1_4118         0         2         1         1         2         0
#> Pop1_4119         1         1         1         1         2         0
#> Pop2_4007         0         2         1         1         2         0
#> Pop2_4008         1         1         1         1        NA        NA
#> Pop2_4009         1         1        NA        NA         1         1
genind2df(g)
#>                 pop  LTP39  LTP42  LTP74
#> Pop1_4117 Pop1_4119 002001 002001 002002
#> Pop1_4118 Pop1_4119 001001 002001 002002
#> Pop1_4119 Pop1_4119 002001 002001 002002
#> Pop2_4007 Pop2_4009 001001 002001 002002
#> Pop2_4008 Pop2_4009 002001 002001   <NA>
#> Pop2_4009 Pop2_4009 002001   <NA> 002001

Created on 2019-04-30 by the reprex package (v0.2.1)

Session info ``` r devtools::session_info() #> ─ Session info ────────────────────────────────────────────────────────── #> setting value #> version R version 3.6.0 (2019-04-26) #> os Ubuntu 18.04.2 LTS #> system x86_64, linux-gnu #> ui X11 #> language en_US:en #> collate en_US.UTF-8 #> ctype en_US.UTF-8 #> tz Europe/London #> date 2019-04-30 #> #> ─ Packages ────────────────────────────────────────────────────────────── #> package * version date lib source #> ade4 * 1.7-13 2018-08-31 [1] CRAN (R 3.5.1) #> adegenet * 2.1.1 2018-02-02 [1] CRAN (R 3.6.0) #> ape 5.3 2019-03-17 [1] CRAN (R 3.5.3) #> assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.5.3) #> backports 1.1.4 2019-04-10 [1] CRAN (R 3.5.3) #> boot 1.3-22 2019-04-26 [1] CRAN (R 3.6.0) #> callr 3.2.0 2019-03-15 [1] CRAN (R 3.5.3) #> class 7.3-15 2019-01-01 [1] CRAN (R 3.5.2) #> classInt 0.3-3 2019-04-26 [1] CRAN (R 3.6.0) #> cli 1.1.0 2019-03-19 [1] CRAN (R 3.5.3) #> cluster 2.0.8 2019-04-05 [1] CRAN (R 3.5.3) #> coda 0.19-2 2018-10-08 [1] CRAN (R 3.5.1) #> colorspace 1.4-1 2019-03-18 [1] CRAN (R 3.5.3) #> crayon 1.3.4 2017-09-16 [1] CRAN (R 3.5.1) #> DBI 1.0.0 2018-05-02 [1] CRAN (R 3.5.1) #> deldir 0.1-16 2019-01-04 [1] CRAN (R 3.5.2) #> desc 1.2.0 2019-04-12 [1] Github (r-lib/desc@c860e7b) #> devtools 2.0.2 2019-04-08 [1] CRAN (R 3.5.3) #> digest 0.6.18 2018-10-10 [1] CRAN (R 3.5.1) #> dplyr 0.8.0.1 2019-02-15 [1] CRAN (R 3.5.2) #> e1071 1.7-1 2019-03-19 [1] CRAN (R 3.5.3) #> evaluate 0.13 2019-02-12 [1] CRAN (R 3.5.2) #> expm 0.999-4 2019-03-21 [1] CRAN (R 3.5.3) #> fs 1.2.7 2019-03-19 [1] CRAN (R 3.5.3) #> gdata 2.18.0 2017-06-06 [1] CRAN (R 3.5.1) #> ggplot2 3.1.1 2019-04-07 [1] CRAN (R 3.5.3) #> glue 1.3.1 2019-03-12 [1] CRAN (R 3.5.3) #> gmodels 2.18.1 2018-06-25 [1] CRAN (R 3.5.1) #> gtable 0.3.0 2019-03-25 [1] CRAN (R 3.5.3) #> gtools 3.8.1 2018-06-26 [1] CRAN (R 3.5.1) #> highr 0.8 2019-03-20 [1] CRAN (R 3.5.3) #> htmltools 0.3.6 2017-04-28 [1] CRAN (R 3.5.1) #> httpuv 1.5.1 2019-04-05 [1] CRAN (R 3.5.3) #> igraph 1.2.4.1 2019-04-22 [1] CRAN (R 3.6.0) #> KernSmooth 2.23-15 2015-06-29 [1] CRAN (R 3.5.1) #> knitr 1.22 2019-03-08 [1] CRAN (R 3.5.3) #> later 0.8.0 2019-02-11 [1] CRAN (R 3.5.2) #> lattice 0.20-38 2018-11-04 [1] CRAN (R 3.5.1) #> lazyeval 0.2.2 2019-03-15 [1] CRAN (R 3.5.3) #> LearnBayes 2.15.1 2018-03-18 [1] CRAN (R 3.5.1) #> magrittr 1.5 2014-11-22 [1] CRAN (R 3.5.1) #> MASS 7.3-51.4 2019-04-26 [1] CRAN (R 3.6.0) #> Matrix 1.2-17 2019-03-22 [1] CRAN (R 3.5.3) #> memoise 1.1.0 2017-04-21 [1] CRAN (R 3.5.1) #> mgcv 1.8-28 2019-03-21 [1] CRAN (R 3.5.3) #> mime 0.6 2018-10-05 [1] CRAN (R 3.5.1) #> munsell 0.5.0 2018-06-12 [1] CRAN (R 3.5.1) #> nlme 3.1-139 2019-04-09 [1] CRAN (R 3.5.3) #> nvimcom * 0.9-82 2019-04-30 [1] local #> permute 0.9-5 2019-03-12 [1] CRAN (R 3.5.3) #> pillar 1.3.1 2018-12-15 [1] CRAN (R 3.5.2) #> pkgbuild 1.0.3 2019-03-20 [1] CRAN (R 3.5.3) #> pkgconfig 2.0.2 2018-08-16 [1] CRAN (R 3.5.1) #> pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.5.1) #> plyr 1.8.4 2016-06-08 [1] CRAN (R 3.5.1) #> prettyunits 1.0.2 2015-07-13 [1] CRAN (R 3.5.1) #> processx 3.3.0 2019-03-10 [1] CRAN (R 3.5.3) #> promises 1.0.1 2018-04-13 [1] CRAN (R 3.5.1) #> ps 1.3.0 2018-12-21 [1] CRAN (R 3.5.2) #> purrr 0.3.2 2019-03-15 [1] CRAN (R 3.5.3) #> R6 2.4.0 2019-02-14 [1] CRAN (R 3.5.2) #> Rcpp 1.0.1 2019-03-17 [1] CRAN (R 3.5.3) #> remotes 2.0.4 2019-04-10 [1] CRAN (R 3.5.3) #> reshape2 1.4.3 2017-12-11 [1] CRAN (R 3.5.1) #> rlang 0.3.4 2019-04-07 [1] CRAN (R 3.5.3) #> rmarkdown 1.12 2019-03-14 [1] CRAN (R 3.5.3) #> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.5.1) #> scales 1.0.0 2018-08-09 [1] CRAN (R 3.5.1) #> seqinr 3.4-5 2017-08-01 [1] CRAN (R 3.5.1) #> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.5.1) #> sf 0.7-4 2019-04-25 [1] CRAN (R 3.6.0) #> shiny 1.3.2 2019-04-22 [1] CRAN (R 3.6.0) #> sp 1.3-1 2018-06-05 [1] CRAN (R 3.5.1) #> spData 0.3.0 2019-01-07 [1] CRAN (R 3.5.2) #> spdep 1.1-2 2019-04-05 [1] CRAN (R 3.5.3) #> stringi 1.4.3 2019-03-12 [1] CRAN (R 3.5.3) #> stringr 1.4.0 2019-02-10 [1] CRAN (R 3.5.2) #> testthat 2.1.1 2019-04-23 [1] CRAN (R 3.6.0) #> tibble 2.1.1 2019-03-16 [1] CRAN (R 3.5.3) #> tidyselect 0.2.5 2018-10-11 [1] CRAN (R 3.5.1) #> units 0.6-2 2018-12-05 [1] CRAN (R 3.5.1) #> usethis 1.5.0 2019-04-07 [1] CRAN (R 3.5.3) #> vegan 2.5-4 2019-02-04 [1] CRAN (R 3.5.2) #> withr 2.1.2 2018-03-15 [1] CRAN (R 3.5.1) #> xfun 0.6 2019-04-02 [1] CRAN (R 3.5.3) #> xtable 1.8-4 2019-04-21 [1] CRAN (R 3.6.0) #> yaml 2.2.0 2018-07-25 [1] CRAN (R 3.5.1) #> #> [1] /home/zkamvar/Documents/RLibrary #> [2] /usr/local/lib/R/site-library #> [3] /usr/lib/R/site-library #> [4] /usr/lib/R/library ```
CarolineJudy commented 5 years ago

In reviewing your code, I'm seeing that the error persists. Please confirm. For example, In the first locus/ind - the allelic code should be 002001, but in the genind object, it is displayed as "1" "1". Unless I am missing something, it appears that all allelic/locus codes are wrong except for the ones with missing data 000000, which correctly converts to NA in all four instances.

Oddly, it seems that when you converted the genind back to a dataframe using genind2df, the correct allelic codes are again displayed...

Below, I've copied your generated code - and I've used bold face type for the discrepancies:

> LTP39.002 LTP39.001 LTP42.002 LTP42.001 LTP74.002 LTP74.001

> Pop1_4117 1 1 1 1 2 0

> Pop1_4118 0 2 1 1 2 0

> Pop1_4119 1 1 1 1 2 0

> Pop2_4007 0 2 1 1 2 0

> Pop2_4008 1 1 1 1 NA NA

> Pop2_4009 1 1 NA NA 1 1

genind2df(g)

> pop LTP39 LTP42 LTP74

> Pop1_4117 Pop1_4119 002001 002001 002002

> Pop1_4118 Pop1_4119 001001 002001 002002

> Pop1_4119 Pop1_4119 002001 002001 002002

> Pop2_4007 Pop2_4009 001001 002001 002002

> Pop2_4008 Pop2_4009 002001 002001

> Pop2_4009 Pop2_4009 002001 002001

Regarding your other comments, here are my responses:

1) Yes, I performed a test of the original datafile using the updated R version (R v. 3.6.0) as a control before updating Adegenet to v. 2.1.2. (see R script "TEST 1"). That's why the path/file are different, but the data are the same.

2) I'm not sure what you are referring to by my "second run". Can you specify by copying and pasting my code that you are calling my "second run?" Upon re-review, I see that none of the four tests I performed resulted in a correctly converted genind object. Please see the attached R script for reference to the four tests.

3). Thanks - good tip. I always start my R scripts with "rm(list = ls())" (see line "ENVIRONMENT SETUP) so my workspace is always empty, so I don't think that is the issue here.

Looking forward to hearing your response - this is quite a mystery. Caroline

On Tue, Apr 30, 2019 at 11:22 AM Zhian N. Kamvar notifications@github.com wrote:

Hi @CarolineJudy https://github.com/CarolineJudy,

It's a bit odd. The behavior you posted seemed like something similar to

160 https://github.com/thibautjombart/adegenet/issues/160, which was

fixed in adegenet 2.0.2, but your output shows the most recent version. However, when I run this on my computer with a clean environment, I get the correct output.

A couple of things I see here, however:

  1. The first test reads in a different file. gdata = read.genepop("~/Documents/Trochilus/Chapter_2/PCA/no_outliers/pruning/Trochilus_Genepop_newcopy.gen", ncode=3)
  2. The output for your second run with adegenet shows the correct solution.
  3. You have a message from R saying that it is restoring your previous R session ([Workspace restored from /Users/carolineduffie/.RData]). This is a tricky thing about R that can give you a lot of headaches in the future with packages and data. I would recommend to never save your workspace data when you exit R https://github.com/R4EPI/sitrep/wiki/1)-Getting-started#configuring-rstudio and to delete that .Rdata file (hiding in your home directory)

x <- "Datafile LTP39 LTP42 LTP74 Pop Pop1_4117 , 002001 002001 002002 Pop1_4118 , 001001 002001 002002 Pop1_4119 , 002001 002001 002002 Pop Pop2_4007 , 001001 002001 002002 Pop2_4008 , 002001 002001 000000 Pop2_4009 , 002001 000000 002001"

tmp <- tempfile(fileext = ".gen")

cat(x, file = tmp)

library('adegenet')

> Loading required package: ade4

> Registered S3 methods overwritten by 'ggplot2':

> method from

> [.quosures rlang

> c.quosures rlang

> print.quosures rlang

> Registered S3 method overwritten by 'spdep':

> method from

> plot.mst ape

>

> /// adegenet 2.1.1 is loaded ////////////

>

> > overview: '?adegenet'

> > tutorials/doc/questions: 'adegenetWeb()'

> > bug reports/feature requests: adegenetIssues()

g <- read.genepop(tmp, ncode = 3)

>

> Converting data from a Genepop .gen file to a genind object...

>

>

> File description: Datafile

>

> ...done.

tab(g)

> LTP39.002 LTP39.001 LTP42.002 LTP42.001 LTP74.002 LTP74.001

> Pop1_4117 1 1 1 1 2 0

> Pop1_4118 0 2 1 1 2 0

> Pop1_4119 1 1 1 1 2 0

> Pop2_4007 0 2 1 1 2 0

> Pop2_4008 1 1 1 1 NA NA

> Pop2_4009 1 1 NA NA 1 1

genind2df(g)

> pop LTP39 LTP42 LTP74

> Pop1_4117 Pop1_4119 002001 002001 002002

> Pop1_4118 Pop1_4119 001001 002001 002002

> Pop1_4119 Pop1_4119 002001 002001 002002

> Pop2_4007 Pop2_4009 001001 002001 002002

> Pop2_4008 Pop2_4009 002001 002001

> Pop2_4009 Pop2_4009 002001 002001

Created on 2019-04-30 by the reprex package https://reprex.tidyverse.org (v0.2.1) Session info

devtools::session_info()

> ─ Session info ──────────────────────────────────────────────────────────

> setting value

> version R version 3.6.0 (2019-04-26)

> os Ubuntu 18.04.2 LTS

> system x86_64, linux-gnu

> ui X11

> language en_US:en

> collate en_US.UTF-8

> ctype en_US.UTF-8

> tz Europe/London

> date 2019-04-30

>

> ─ Packages ──────────────────────────────────────────────────────────────

> package * version date lib source

> ade4 * 1.7-13 2018-08-31 [1] CRAN (R 3.5.1)

> adegenet * 2.1.1 2018-02-02 [1] CRAN (R 3.6.0)

> ape 5.3 2019-03-17 [1] CRAN (R 3.5.3)

> assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.5.3)

> backports 1.1.4 2019-04-10 [1] CRAN (R 3.5.3)

> boot 1.3-22 2019-04-26 [1] CRAN (R 3.6.0)

> callr 3.2.0 2019-03-15 [1] CRAN (R 3.5.3)

> class 7.3-15 2019-01-01 [1] CRAN (R 3.5.2)

> classInt 0.3-3 2019-04-26 [1] CRAN (R 3.6.0)

> cli 1.1.0 2019-03-19 [1] CRAN (R 3.5.3)

> cluster 2.0.8 2019-04-05 [1] CRAN (R 3.5.3)

> coda 0.19-2 2018-10-08 [1] CRAN (R 3.5.1)

> colorspace 1.4-1 2019-03-18 [1] CRAN (R 3.5.3)

> crayon 1.3.4 2017-09-16 [1] CRAN (R 3.5.1)

> DBI 1.0.0 2018-05-02 [1] CRAN (R 3.5.1)

> deldir 0.1-16 2019-01-04 [1] CRAN (R 3.5.2)

> desc 1.2.0 2019-04-12 [1] Github (r-lib/desc@c860e7b)

> devtools 2.0.2 2019-04-08 [1] CRAN (R 3.5.3)

> digest 0.6.18 2018-10-10 [1] CRAN (R 3.5.1)

> dplyr 0.8.0.1 2019-02-15 [1] CRAN (R 3.5.2)

> e1071 1.7-1 2019-03-19 [1] CRAN (R 3.5.3)

> evaluate 0.13 2019-02-12 [1] CRAN (R 3.5.2)

> expm 0.999-4 2019-03-21 [1] CRAN (R 3.5.3)

> fs 1.2.7 2019-03-19 [1] CRAN (R 3.5.3)

> gdata 2.18.0 2017-06-06 [1] CRAN (R 3.5.1)

> ggplot2 3.1.1 2019-04-07 [1] CRAN (R 3.5.3)

> glue 1.3.1 2019-03-12 [1] CRAN (R 3.5.3)

> gmodels 2.18.1 2018-06-25 [1] CRAN (R 3.5.1)

> gtable 0.3.0 2019-03-25 [1] CRAN (R 3.5.3)

> gtools 3.8.1 2018-06-26 [1] CRAN (R 3.5.1)

> highr 0.8 2019-03-20 [1] CRAN (R 3.5.3)

> htmltools 0.3.6 2017-04-28 [1] CRAN (R 3.5.1)

> httpuv 1.5.1 2019-04-05 [1] CRAN (R 3.5.3)

> igraph 1.2.4.1 2019-04-22 [1] CRAN (R 3.6.0)

> KernSmooth 2.23-15 2015-06-29 [1] CRAN (R 3.5.1)

> knitr 1.22 2019-03-08 [1] CRAN (R 3.5.3)

> later 0.8.0 2019-02-11 [1] CRAN (R 3.5.2)

> lattice 0.20-38 2018-11-04 [1] CRAN (R 3.5.1)

> lazyeval 0.2.2 2019-03-15 [1] CRAN (R 3.5.3)

> LearnBayes 2.15.1 2018-03-18 [1] CRAN (R 3.5.1)

> magrittr 1.5 2014-11-22 [1] CRAN (R 3.5.1)

> MASS 7.3-51.4 2019-04-26 [1] CRAN (R 3.6.0)

> Matrix 1.2-17 2019-03-22 [1] CRAN (R 3.5.3)

> memoise 1.1.0 2017-04-21 [1] CRAN (R 3.5.1)

> mgcv 1.8-28 2019-03-21 [1] CRAN (R 3.5.3)

> mime 0.6 2018-10-05 [1] CRAN (R 3.5.1)

> munsell 0.5.0 2018-06-12 [1] CRAN (R 3.5.1)

> nlme 3.1-139 2019-04-09 [1] CRAN (R 3.5.3)

> nvimcom * 0.9-82 2019-04-30 [1] local

> permute 0.9-5 2019-03-12 [1] CRAN (R 3.5.3)

> pillar 1.3.1 2018-12-15 [1] CRAN (R 3.5.2)

> pkgbuild 1.0.3 2019-03-20 [1] CRAN (R 3.5.3)

> pkgconfig 2.0.2 2018-08-16 [1] CRAN (R 3.5.1)

> pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.5.1)

> plyr 1.8.4 2016-06-08 [1] CRAN (R 3.5.1)

> prettyunits 1.0.2 2015-07-13 [1] CRAN (R 3.5.1)

> processx 3.3.0 2019-03-10 [1] CRAN (R 3.5.3)

> promises 1.0.1 2018-04-13 [1] CRAN (R 3.5.1)

> ps 1.3.0 2018-12-21 [1] CRAN (R 3.5.2)

> purrr 0.3.2 2019-03-15 [1] CRAN (R 3.5.3)

> R6 2.4.0 2019-02-14 [1] CRAN (R 3.5.2)

> Rcpp 1.0.1 2019-03-17 [1] CRAN (R 3.5.3)

> remotes 2.0.4 2019-04-10 [1] CRAN (R 3.5.3)

> reshape2 1.4.3 2017-12-11 [1] CRAN (R 3.5.1)

> rlang 0.3.4 2019-04-07 [1] CRAN (R 3.5.3)

> rmarkdown 1.12 2019-03-14 [1] CRAN (R 3.5.3)

> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.5.1)

> scales 1.0.0 2018-08-09 [1] CRAN (R 3.5.1)

> seqinr 3.4-5 2017-08-01 [1] CRAN (R 3.5.1)

> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.5.1)

> sf 0.7-4 2019-04-25 [1] CRAN (R 3.6.0)

> shiny 1.3.2 2019-04-22 [1] CRAN (R 3.6.0)

> sp 1.3-1 2018-06-05 [1] CRAN (R 3.5.1)

> spData 0.3.0 2019-01-07 [1] CRAN (R 3.5.2)

> spdep 1.1-2 2019-04-05 [1] CRAN (R 3.5.3)

> stringi 1.4.3 2019-03-12 [1] CRAN (R 3.5.3)

> stringr 1.4.0 2019-02-10 [1] CRAN (R 3.5.2)

> testthat 2.1.1 2019-04-23 [1] CRAN (R 3.6.0)

> tibble 2.1.1 2019-03-16 [1] CRAN (R 3.5.3)

> tidyselect 0.2.5 2018-10-11 [1] CRAN (R 3.5.1)

> units 0.6-2 2018-12-05 [1] CRAN (R 3.5.1)

> usethis 1.5.0 2019-04-07 [1] CRAN (R 3.5.3)

> vegan 2.5-4 2019-02-04 [1] CRAN (R 3.5.2)

> withr 2.1.2 2018-03-15 [1] CRAN (R 3.5.1)

> xfun 0.6 2019-04-02 [1] CRAN (R 3.5.3)

> xtable 1.8-4 2019-04-21 [1] CRAN (R 3.6.0)

> yaml 2.2.0 2018-07-25 [1] CRAN (R 3.5.1)

>

> [1] /home/zkamvar/Documents/RLibrary

> [2] /usr/local/lib/R/site-library

> [3] /usr/lib/R/site-library

> [4] /usr/lib/R/library

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/thibautjombart/adegenet/issues/256#issuecomment-487995958, or mute the thread https://github.com/notifications/unsubscribe-auth/AB6YEZ5XBHW2WETHQ2H6GB3PTBPZZANCNFSM4HIZVY7A .

zkamvar commented 5 years ago

Here it is:

In the first locus/ind - the allelic code should be 002001, but in the genind object, it is displayed as "1" "1".

This is a common misconception. The numbers in the genind matrix represent allele counts, not the alleles themselves. Each column itself is an allele and the number is how many of each allele an individual has. It's structured that way to make it faster and easier to do things like Principal Components analysis on different types of genetic data.

The first two columns say that you have 1 of allele 002 and 1 of allele 001 for the first sample.

Because you only have two alleles represented for each locus, you get six columns, which makes it look like each column represents an individual DNA strand. If you changed one of the 001 alleles to 003, then you would end up with seven columns in the data set.

I hope that helps and I'm sorry about the confusion.

CarolineJudy commented 5 years ago

Ahh, that is it. What a relief! Many thanks for your help.

On Apr 30, 2019, at 12:52 PM, Zhian N. Kamvar notifications@github.com wrote:

Here it is:

In the first locus/ind - the allelic code should be 002001, but in the genind object, it is displayed as "1" "1".

This is a common misconception. The numbers in the genind matrix represent allele counts, not the alleles themselves. Each column itself is an allele and the number is how many of each allele an individual has. It's structured that way to make it faster and easier to do things like Principal Components analysis on different types of genetic data.

The first two columns say that you have 1 of allele 002 and 1 of allele 001 for the first sample.

Because you only have two alleles represented for each locus, you get six columns, which makes it look like each column represents an individual DNA strand. If you changed one of the 001 alleles to 003, then you would end up with seven columns in the data set.

I hope that helps and I'm sorry about the confusion.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

zkamvar commented 5 years ago

No worries! I'm glad things worked out.