Closed CarolineJudy closed 5 years ago
Hi @CarolineJudy,
I agree that this is unsettling. To help with this we'll need a couple of things from you:
A workaround for this may be to import your genepop data into GenAlEx (excel macro) and then read it into R with the read.genalex()
function from poppr.
Thanks for your help. I've attached a subset datafile ("genepop.subset.test.gen") below. I performed the conversion using a newer build of R (3.6.0) and the github version of Adegenet (2.1.2). The problem persists.
See attached mini datafile, R script for testing, and R Console output.
I first noticed the problem when trying to use the utility "genind2db". That outputted an unusable format for downstream input into other programs (structure). The number of columns (loci) were different across individuals, where there should the same number -- each column having an entry corresponding to a locus-specific bilallelic code, or an NA if no allelic code is present for a given locus. I mentioned that here because I'm not sure if the problem is with both utilities "read.genepop" and "genind2db" or just "read.genepop", which creates a faulty genind and thus, downstream problems with export using "genind2db".
Thanks again, Caroline
On Tue, Apr 30, 2019 at 5:46 AM Zhian N. Kamvar notifications@github.com wrote:
Hi @CarolineJudy https://github.com/CarolineJudy,
I agree that this is unsettling. To help with this we'll need a couple of things from you:
- Can you construct a minimal genepop file from your data (e.g. 2 populations of three and three loci)?
- Do you get this error when you use the github version of adegenet?
A workaround for this may be to import your genepop data into GenAlEx (excel macro) and then read it into R with the read.genalex() function from poppr https://grunwaldlab.github.io/poppr/reference/read.genalex.html.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/thibautjombart/adegenet/issues/256#issuecomment-487891162, or mute the thread https://github.com/notifications/unsubscribe-auth/AB6YEZ4Z5BLW2HUACE24NJTPTAIQFANCNFSM4HIZVY7A .
Hi @CarolineJudy, Could you open the genepop file in a text editor (on mac, it will be TextEdit) and copy+paste the data here? it will be much easier for me to parse it this way.
Datafile LTP39 LTP42 LTP74 Pop Pop1_4117 , 002001 002001 002002 Pop1_4118 , 001001 002001 002002 Pop1_4119 , 002001 002001 002002 Pop Pop2_4007 , 001001 002001 002002 Pop2_4008 , 002001 002001 000000 Pop2_4009 , 002001 000000 002001
On Tue, Apr 30, 2019 at 10:17 AM Zhian N. Kamvar notifications@github.com wrote:
Hi @CarolineJudy https://github.com/CarolineJudy, Could you open the genepop file in a text editor (on mac, it will be TextEdit) and copy+paste the data here? it will be much easier for me to parse it this way.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/thibautjombart/adegenet/issues/256#issuecomment-487970194, or mute the thread https://github.com/notifications/unsubscribe-auth/AB6YEZ6TSWI4L32MMHQH7YDPTBIJFANCNFSM4HIZVY7A .
Hi @CarolineJudy,
It's a bit odd. The behavior you posted seemed like something similar to #160, which was fixed in adegenet 2.0.2, but your output shows the most recent version. However, when I run this on my computer with a clean environment, I get the correct output.
A couple of things I see here, however:
gdata = read.genepop("~/Documents/Trochilus/Chapter_2/PCA/no_outliers/pruning/Trochilus_Genepop_newcopy.gen", ncode=3)
x <- "Datafile
LTP39
LTP42
LTP74
Pop
Pop1_4117 , 002001 002001 002002
Pop1_4118 , 001001 002001 002002
Pop1_4119 , 002001 002001 002002
Pop
Pop2_4007 , 001001 002001 002002
Pop2_4008 , 002001 002001 000000
Pop2_4009 , 002001 000000 002001"
tmp <- tempfile(fileext = ".gen")
cat(x, file = tmp)
library('adegenet')
#> Loading required package: ade4
#> Registered S3 methods overwritten by 'ggplot2':
#> method from
#> [.quosures rlang
#> c.quosures rlang
#> print.quosures rlang
#> Registered S3 method overwritten by 'spdep':
#> method from
#> plot.mst ape
#>
#> /// adegenet 2.1.1 is loaded ////////////
#>
#> > overview: '?adegenet'
#> > tutorials/doc/questions: 'adegenetWeb()'
#> > bug reports/feature requests: adegenetIssues()
g <- read.genepop(tmp, ncode = 3)
#>
#> Converting data from a Genepop .gen file to a genind object...
#>
#>
#> File description: Datafile
#>
#> ...done.
tab(g)
#> LTP39.002 LTP39.001 LTP42.002 LTP42.001 LTP74.002 LTP74.001
#> Pop1_4117 1 1 1 1 2 0
#> Pop1_4118 0 2 1 1 2 0
#> Pop1_4119 1 1 1 1 2 0
#> Pop2_4007 0 2 1 1 2 0
#> Pop2_4008 1 1 1 1 NA NA
#> Pop2_4009 1 1 NA NA 1 1
genind2df(g)
#> pop LTP39 LTP42 LTP74
#> Pop1_4117 Pop1_4119 002001 002001 002002
#> Pop1_4118 Pop1_4119 001001 002001 002002
#> Pop1_4119 Pop1_4119 002001 002001 002002
#> Pop2_4007 Pop2_4009 001001 002001 002002
#> Pop2_4008 Pop2_4009 002001 002001 <NA>
#> Pop2_4009 Pop2_4009 002001 <NA> 002001
Created on 2019-04-30 by the reprex package (v0.2.1)
In reviewing your code, I'm seeing that the error persists. Please confirm. For example, In the first locus/ind - the allelic code should be 002001, but in the genind object, it is displayed as "1" "1". Unless I am missing something, it appears that all allelic/locus codes are wrong except for the ones with missing data 000000, which correctly converts to NA in all four instances.
Oddly, it seems that when you converted the genind back to a dataframe using genind2df, the correct allelic codes are again displayed...
Below, I've copied your generated code - and I've used bold face type for the discrepancies:
genind2df(g)
Regarding your other comments, here are my responses:
1) Yes, I performed a test of the original datafile using the updated R version (R v. 3.6.0) as a control before updating Adegenet to v. 2.1.2. (see R script "TEST 1"). That's why the path/file are different, but the data are the same.
2) I'm not sure what you are referring to by my "second run". Can you specify by copying and pasting my code that you are calling my "second run?" Upon re-review, I see that none of the four tests I performed resulted in a correctly converted genind object. Please see the attached R script for reference to the four tests.
3). Thanks - good tip. I always start my R scripts with "rm(list = ls())" (see line "ENVIRONMENT SETUP) so my workspace is always empty, so I don't think that is the issue here.
Looking forward to hearing your response - this is quite a mystery. Caroline
On Tue, Apr 30, 2019 at 11:22 AM Zhian N. Kamvar notifications@github.com wrote:
Hi @CarolineJudy https://github.com/CarolineJudy,
It's a bit odd. The behavior you posted seemed like something similar to
160 https://github.com/thibautjombart/adegenet/issues/160, which was
fixed in adegenet 2.0.2, but your output shows the most recent version. However, when I run this on my computer with a clean environment, I get the correct output.
A couple of things I see here, however:
- The first test reads in a different file. gdata = read.genepop("~/Documents/Trochilus/Chapter_2/PCA/no_outliers/pruning/Trochilus_Genepop_newcopy.gen", ncode=3)
- The output for your second run with adegenet shows the correct solution.
- You have a message from R saying that it is restoring your previous R session ([Workspace restored from /Users/carolineduffie/.RData]). This is a tricky thing about R that can give you a lot of headaches in the future with packages and data. I would recommend to never save your workspace data when you exit R https://github.com/R4EPI/sitrep/wiki/1)-Getting-started#configuring-rstudio and to delete that .Rdata file (hiding in your home directory)
x <- "Datafile LTP39 LTP42 LTP74 Pop Pop1_4117 , 002001 002001 002002 Pop1_4118 , 001001 002001 002002 Pop1_4119 , 002001 002001 002002 Pop Pop2_4007 , 001001 002001 002002 Pop2_4008 , 002001 002001 000000 Pop2_4009 , 002001 000000 002001"
tmp <- tempfile(fileext = ".gen")
cat(x, file = tmp)
library('adegenet')
> Loading required package: ade4
> Registered S3 methods overwritten by 'ggplot2':
> method from
> [.quosures rlang
> c.quosures rlang
> print.quosures rlang
> Registered S3 method overwritten by 'spdep':
> method from
> plot.mst ape
>
> /// adegenet 2.1.1 is loaded ////////////
>
> > overview: '?adegenet'
> > tutorials/doc/questions: 'adegenetWeb()'
> > bug reports/feature requests: adegenetIssues()
g <- read.genepop(tmp, ncode = 3)
>
> Converting data from a Genepop .gen file to a genind object...
>
>
> File description: Datafile
>
> ...done.
tab(g)
> LTP39.002 LTP39.001 LTP42.002 LTP42.001 LTP74.002 LTP74.001
> Pop1_4117 1 1 1 1 2 0
> Pop1_4118 0 2 1 1 2 0
> Pop1_4119 1 1 1 1 2 0
> Pop2_4007 0 2 1 1 2 0
> Pop2_4008 1 1 1 1 NA NA
> Pop2_4009 1 1 NA NA 1 1
genind2df(g)
> pop LTP39 LTP42 LTP74
> Pop1_4117 Pop1_4119 002001 002001 002002
> Pop1_4118 Pop1_4119 001001 002001 002002
> Pop1_4119 Pop1_4119 002001 002001 002002
> Pop2_4007 Pop2_4009 001001 002001 002002
> Pop2_4008 Pop2_4009 002001 002001
> Pop2_4009 Pop2_4009 002001
002001 Created on 2019-04-30 by the reprex package https://reprex.tidyverse.org (v0.2.1) Session info
devtools::session_info()
> ─ Session info ──────────────────────────────────────────────────────────
> setting value
> version R version 3.6.0 (2019-04-26)
> os Ubuntu 18.04.2 LTS
> system x86_64, linux-gnu
> ui X11
> language en_US:en
> collate en_US.UTF-8
> ctype en_US.UTF-8
> tz Europe/London
> date 2019-04-30
>
> ─ Packages ──────────────────────────────────────────────────────────────
> package * version date lib source
> ade4 * 1.7-13 2018-08-31 [1] CRAN (R 3.5.1)
> adegenet * 2.1.1 2018-02-02 [1] CRAN (R 3.6.0)
> ape 5.3 2019-03-17 [1] CRAN (R 3.5.3)
> assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.5.3)
> backports 1.1.4 2019-04-10 [1] CRAN (R 3.5.3)
> boot 1.3-22 2019-04-26 [1] CRAN (R 3.6.0)
> callr 3.2.0 2019-03-15 [1] CRAN (R 3.5.3)
> class 7.3-15 2019-01-01 [1] CRAN (R 3.5.2)
> classInt 0.3-3 2019-04-26 [1] CRAN (R 3.6.0)
> cli 1.1.0 2019-03-19 [1] CRAN (R 3.5.3)
> cluster 2.0.8 2019-04-05 [1] CRAN (R 3.5.3)
> coda 0.19-2 2018-10-08 [1] CRAN (R 3.5.1)
> colorspace 1.4-1 2019-03-18 [1] CRAN (R 3.5.3)
> crayon 1.3.4 2017-09-16 [1] CRAN (R 3.5.1)
> DBI 1.0.0 2018-05-02 [1] CRAN (R 3.5.1)
> deldir 0.1-16 2019-01-04 [1] CRAN (R 3.5.2)
> desc 1.2.0 2019-04-12 [1] Github (r-lib/desc@c860e7b)
> devtools 2.0.2 2019-04-08 [1] CRAN (R 3.5.3)
> digest 0.6.18 2018-10-10 [1] CRAN (R 3.5.1)
> dplyr 0.8.0.1 2019-02-15 [1] CRAN (R 3.5.2)
> e1071 1.7-1 2019-03-19 [1] CRAN (R 3.5.3)
> evaluate 0.13 2019-02-12 [1] CRAN (R 3.5.2)
> expm 0.999-4 2019-03-21 [1] CRAN (R 3.5.3)
> fs 1.2.7 2019-03-19 [1] CRAN (R 3.5.3)
> gdata 2.18.0 2017-06-06 [1] CRAN (R 3.5.1)
> ggplot2 3.1.1 2019-04-07 [1] CRAN (R 3.5.3)
> glue 1.3.1 2019-03-12 [1] CRAN (R 3.5.3)
> gmodels 2.18.1 2018-06-25 [1] CRAN (R 3.5.1)
> gtable 0.3.0 2019-03-25 [1] CRAN (R 3.5.3)
> gtools 3.8.1 2018-06-26 [1] CRAN (R 3.5.1)
> highr 0.8 2019-03-20 [1] CRAN (R 3.5.3)
> htmltools 0.3.6 2017-04-28 [1] CRAN (R 3.5.1)
> httpuv 1.5.1 2019-04-05 [1] CRAN (R 3.5.3)
> igraph 1.2.4.1 2019-04-22 [1] CRAN (R 3.6.0)
> KernSmooth 2.23-15 2015-06-29 [1] CRAN (R 3.5.1)
> knitr 1.22 2019-03-08 [1] CRAN (R 3.5.3)
> later 0.8.0 2019-02-11 [1] CRAN (R 3.5.2)
> lattice 0.20-38 2018-11-04 [1] CRAN (R 3.5.1)
> lazyeval 0.2.2 2019-03-15 [1] CRAN (R 3.5.3)
> LearnBayes 2.15.1 2018-03-18 [1] CRAN (R 3.5.1)
> magrittr 1.5 2014-11-22 [1] CRAN (R 3.5.1)
> MASS 7.3-51.4 2019-04-26 [1] CRAN (R 3.6.0)
> Matrix 1.2-17 2019-03-22 [1] CRAN (R 3.5.3)
> memoise 1.1.0 2017-04-21 [1] CRAN (R 3.5.1)
> mgcv 1.8-28 2019-03-21 [1] CRAN (R 3.5.3)
> mime 0.6 2018-10-05 [1] CRAN (R 3.5.1)
> munsell 0.5.0 2018-06-12 [1] CRAN (R 3.5.1)
> nlme 3.1-139 2019-04-09 [1] CRAN (R 3.5.3)
> nvimcom * 0.9-82 2019-04-30 [1] local
> permute 0.9-5 2019-03-12 [1] CRAN (R 3.5.3)
> pillar 1.3.1 2018-12-15 [1] CRAN (R 3.5.2)
> pkgbuild 1.0.3 2019-03-20 [1] CRAN (R 3.5.3)
> pkgconfig 2.0.2 2018-08-16 [1] CRAN (R 3.5.1)
> pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.5.1)
> plyr 1.8.4 2016-06-08 [1] CRAN (R 3.5.1)
> prettyunits 1.0.2 2015-07-13 [1] CRAN (R 3.5.1)
> processx 3.3.0 2019-03-10 [1] CRAN (R 3.5.3)
> promises 1.0.1 2018-04-13 [1] CRAN (R 3.5.1)
> ps 1.3.0 2018-12-21 [1] CRAN (R 3.5.2)
> purrr 0.3.2 2019-03-15 [1] CRAN (R 3.5.3)
> R6 2.4.0 2019-02-14 [1] CRAN (R 3.5.2)
> Rcpp 1.0.1 2019-03-17 [1] CRAN (R 3.5.3)
> remotes 2.0.4 2019-04-10 [1] CRAN (R 3.5.3)
> reshape2 1.4.3 2017-12-11 [1] CRAN (R 3.5.1)
> rlang 0.3.4 2019-04-07 [1] CRAN (R 3.5.3)
> rmarkdown 1.12 2019-03-14 [1] CRAN (R 3.5.3)
> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.5.1)
> scales 1.0.0 2018-08-09 [1] CRAN (R 3.5.1)
> seqinr 3.4-5 2017-08-01 [1] CRAN (R 3.5.1)
> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.5.1)
> sf 0.7-4 2019-04-25 [1] CRAN (R 3.6.0)
> shiny 1.3.2 2019-04-22 [1] CRAN (R 3.6.0)
> sp 1.3-1 2018-06-05 [1] CRAN (R 3.5.1)
> spData 0.3.0 2019-01-07 [1] CRAN (R 3.5.2)
> spdep 1.1-2 2019-04-05 [1] CRAN (R 3.5.3)
> stringi 1.4.3 2019-03-12 [1] CRAN (R 3.5.3)
> stringr 1.4.0 2019-02-10 [1] CRAN (R 3.5.2)
> testthat 2.1.1 2019-04-23 [1] CRAN (R 3.6.0)
> tibble 2.1.1 2019-03-16 [1] CRAN (R 3.5.3)
> tidyselect 0.2.5 2018-10-11 [1] CRAN (R 3.5.1)
> units 0.6-2 2018-12-05 [1] CRAN (R 3.5.1)
> usethis 1.5.0 2019-04-07 [1] CRAN (R 3.5.3)
> vegan 2.5-4 2019-02-04 [1] CRAN (R 3.5.2)
> withr 2.1.2 2018-03-15 [1] CRAN (R 3.5.1)
> xfun 0.6 2019-04-02 [1] CRAN (R 3.5.3)
> xtable 1.8-4 2019-04-21 [1] CRAN (R 3.6.0)
> yaml 2.2.0 2018-07-25 [1] CRAN (R 3.5.1)
>
> [1] /home/zkamvar/Documents/RLibrary
> [2] /usr/local/lib/R/site-library
> [3] /usr/lib/R/site-library
> [4] /usr/lib/R/library
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/thibautjombart/adegenet/issues/256#issuecomment-487995958, or mute the thread https://github.com/notifications/unsubscribe-auth/AB6YEZ5XBHW2WETHQ2H6GB3PTBPZZANCNFSM4HIZVY7A .
Here it is:
In the first locus/ind - the allelic code should be 002001, but in the genind object, it is displayed as "1" "1".
This is a common misconception. The numbers in the genind matrix represent allele counts, not the alleles themselves. Each column itself is an allele and the number is how many of each allele an individual has. It's structured that way to make it faster and easier to do things like Principal Components analysis on different types of genetic data.
The first two columns say that you have 1 of allele 002 and 1 of allele 001 for the first sample.
Because you only have two alleles represented for each locus, you get six columns, which makes it look like each column represents an individual DNA strand. If you changed one of the 001 alleles to 003, then you would end up with seven columns in the data set.
I hope that helps and I'm sorry about the confusion.
Ahh, that is it. What a relief! Many thanks for your help.
On Apr 30, 2019, at 12:52 PM, Zhian N. Kamvar notifications@github.com wrote:
Here it is:
In the first locus/ind - the allelic code should be 002001, but in the genind object, it is displayed as "1" "1".
This is a common misconception. The numbers in the genind matrix represent allele counts, not the alleles themselves. Each column itself is an allele and the number is how many of each allele an individual has. It's structured that way to make it faster and easier to do things like Principal Components analysis on different types of genetic data.
The first two columns say that you have 1 of allele 002 and 1 of allele 001 for the first sample.
Because you only have two alleles represented for each locus, you get six columns, which makes it look like each column represents an individual DNA strand. If you changed one of the 001 alleles to 003, then you would end up with seven columns in the data set.
I hope that helps and I'm sorry about the confusion.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.
No worries! I'm glad things worked out.
Hi Thibaut, I have a discrepancy between my original datafile (genepop format) and my genind object created using the Adegenet function "read.genepop" -- the alleles are not the same after the conversion.
Original genepop datafile read in by Adegenet [subset for first three individuals, and first three loci]
Adegenet created Genind object
I'm attaching the R console output for your review. I didn't see any notes in bugs reported for the "read.genepop" function.
Many thanks for your assistance, Caroline
R Console.txt