thierrygosselin / radiator

RADseq Data Exploration, Manipulation and Visualization using R
https://thierrygosselin.github.io/radiator/
GNU General Public License v3.0

genomic_converter & read_vcf - Error: Can't recycle `ID_SEQ` #119

Closed rlfrench closed 3 years ago

rlfrench commented 3 years ago

Hello Thierry,

Describe the bug

I'm having difficulty reading a VCF file into R using read_vcf and converting it to BayeScan format using genomic_converter. In both cases, I get the same error message about a size incompatibility (full error output below). I generated the VCF file in Stacks 2.3e, then filtered it in vcftools, and I'm working on macOS 10.14.5. I noticed that the same error message is mentioned here: https://github.com/thierrygosselin/radiator/issues/109, but I think the error I'm getting will require a different solution because bayescan is implemented as an output option for genomic_converter.

To Reproduce

I tried to read my vcf (2200 SNPs, 49 individuals) into R using:

cic_2200_rad <- read_vcf("pops.recode.vcf") 

And I got the following error:

################################################################################
############################## radiator::read_vcf ##############################
################################################################################
Execution date@time: 20210206@0934
Folder created: read_vcf_20210206@0934
Function call and arguments stored in: radiator_read_vcf_args_20210206@0934.tsv
File written: random.seed (91616)

Reading VCF... 

Data summary: 
    number of samples: 49
    number of markers: 2200

Read time: 0 sec

GDS file written: radiator_20210206@0934.gds

Analyzing the vcf...
VCF source: Stacks v2.3e
Data is bi-allelic
Reads assembly: de novo
Filters parameters file generated: filters_parameters_20210206@0934.tsv

Filter monomorphic markers
Number of individuals / strata / chrom / locus / SNP:
    Blacklisted: 0 / 0 / 0 / 0 / 0
Filter common markers: only 1 strata, returning data

Preparing output files...
File written: whitelist.markers.tsv
File written: strata.filtered.tsv

Generating individual stats...
Error: Can't recycle `ID_SEQ` (size 107800) to match `length` (size 2200).
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning message:
Outer names are only allowed for unnamed scalar atomic inputs 

Computation time, overall: 2 sec
############################## completed read_vcf ##############################

I checked rlang::last_error() and saw the following:

<error/vctrs_error_incompatible_size>
Can't recycle `ID_SEQ` (size 107800) to match `length` (size 2200).
Backtrace:
  1. radiator::read_vcf("pops.recode.vcf")
 19. vctrs::stop_incompatible_size(...)
 20. vctrs:::stop_incompatible(...)
 21. vctrs:::stop_vctrs(...)
Run `rlang::last_trace()` to see the full context. 
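
One thing the sizes in the message suggest: 107800 is exactly 49 samples times 2200 markers, so `ID_SEQ` appears to hold one value per genotype while the target expects one value per marker. This is only a reading of the numbers in the log, not a diagnosis confirmed against the radiator source. The sketch below checks the arithmetic and reproduces the same kind of mismatch in base R; the variable names and the `markers` data frame are illustrative, not taken from radiator.

```r
# Sizes copied from the log above
n_samples <- 49L
n_markers <- 2200L
id_seq_size <- 107800L

# The oversized vector is exactly samples x markers long
id_seq_size == n_samples * n_markers

# Base R analogue of the mismatch: a table with one row per marker
# cannot take a column with one value per genotype.
markers <- data.frame(length = integer(n_markers))
result <- tryCatch(
  {
    markers$ID_SEQ <- seq_len(id_seq_size)
    "assigned"
  },
  error = function(e) "size mismatch"
)
result
```

The base R error message differs from the vctrs one in the log, but the cause is the same: a vector of length samples x markers is being placed where a vector of length markers is required.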

I then tried importing my data using the read.vcfR function from vcfR:

cic_2200 <- read.vcfR("pops.recode.vcf")

And that was successful.
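
Since vcfR reads the file, a quick sanity check is to confirm that it sees the same 2200 markers and 49 samples that radiator reports. The sketch below is a minimal example, assuming the vcfR package is installed and the file is in the working directory; `vcf_dims` is a hypothetical helper written for this illustration, not part of vcfR.

```r
# Hypothetical helper: count markers and samples from the two matrices
# that a vcfR object stores in its @fix and @gt slots.
vcf_dims <- function(fix, gt) {
  # @gt carries a leading FORMAT column, so samples = columns - 1
  c(markers = nrow(fix), samples = ncol(gt) - 1L)
}

# Only runs when vcfR and the file from the report are available
if (requireNamespace("vcfR", quietly = TRUE) && file.exists("pops.recode.vcf")) {
  vcf <- vcfR::read.vcfR("pops.recode.vcf", verbose = FALSE)
  print(vcf_dims(vcf@fix, vcf@gt))
}
```

If the counts agree with the radiator data summary, the file itself is probably well formed and the problem lies in the later "Generating individual stats" step.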

After my unsuccessful attempts to read in my file using read_vcf, I tried to use genomic_converter to convert my data from vcf format to bayescan format:

cic_49 <- genomic_converter(
  data = "pops.recode.vcf",
  strata = "STRATA_Bayescan_n49_popsRpca_2021.02.06.txt",
  output = "bayescan",
  filename = "cic_49_strataRpca_bayescan")

Again, I got an error message related to size incompatibility:

################################################################################
######################### radiator::genomic_converter ##########################
################################################################################
Execution date@time: 20210206@1011
Folder created: 03_radiator_genomic_converter_20210206@1011
Function call and arguments stored in: radiator_genomic_converter_args_20210206@1011.tsv
Filters parameters file generated: filters_parameters_20210206@1011.tsv

Importing data: vcf.file

Reading VCF... 

Data summary: 
    number of samples: 49
    number of markers: 2200

Filter monomorphic markers
Number of individuals / strata / chrom / locus / SNP:
    Blacklisted: 0 / 0 / 0 / 0 / 0

Filter common markers:
Number of individuals / strata / chrom / locus / SNP:
    Blacklisted: 0 / 0 / 0 / 0 / 0

Generating individual stats...
Error: Can't recycle `ID_SEQ` (size 107800) to match `length` (size 2200).
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning message:
Outer names are only allowed for unnamed scalar atomic inputs 

Computation time, overall: 4 sec

Computation time, overall: 4 sec
######################### completed genomic_converter ##########################

My devtools::session_info() is as follows:

─ Session info ─────────────────────────────────────────────────────────
 setting  value                       
 version  R version 3.6.2 (2019-12-12)
 os       macOS Mojave 10.14.5        
 system   x86_64, darwin15.6.0        
 ui       RStudio                     
 language (EN)                        
 collate  en_CA.UTF-8                 
 ctype    en_CA.UTF-8                 
 tz       America/Edmonton            
 date     2021-02-06                  

─ Packages ─────────────────────────────────────────────────────────────
 package          * version  date       lib
 ade4             * 1.7-15   2020-02-13 [1]
 adegenet         * 2.1.2    2020-01-20 [1]
 ape                5.3      2019-03-17 [1]
 assertthat         0.2.1    2019-03-21 [1]
 backports          1.1.5    2019-10-02 [1]
 BiocGenerics       0.32.0   2019-10-29 [1]
 Biostrings         2.54.0   2019-10-29 [1]
 bitops             1.0-6    2013-08-17 [1]
 bold               1.1.0    2020-06-17 [1]
 boot               1.3-23   2019-07-05 [1]
 callr              3.4.0    2019-12-09 [1]
 class              7.3-15   2019-01-01 [1]
 classInt           0.4-3    2020-04-07 [1]
 cli                2.0.1    2020-01-08 [1]
 cluster            2.1.0    2019-06-19 [1]
 coda               0.19-3   2019-07-05 [1]
 codetools          0.2-16   2018-12-24 [1]
 colorspace         1.4-1    2019-03-18 [1]
 conditionz         0.1.0    2019-04-24 [1]
 crayon             1.3.4    2017-09-16 [1]
 crul               1.0.0    2020-07-30 [1]
 curl               4.3      2019-12-02 [1]
 data.table         1.12.2   2019-04-07 [1]
 DBI                1.1.0    2019-12-15 [1]
 deldir             0.1-25   2020-02-03 [1]
 desc               1.2.0    2018-05-01 [1]
 deSolve            1.28     2020-03-08 [1]
 devtools           2.2.1    2019-09-24 [1]
 digest             0.6.24   2020-02-12 [1]
 dismo            * 1.1-4    2017-01-09 [1]
 diversitree        0.9-15   2020-11-24 [1]
 dplyr            * 1.0.2    2020-08-18 [1]
 e1071              1.7-3    2019-11-26 [1]
 ellipsis           0.3.0    2019-09-20 [1]
 expm               0.999-4  2019-03-21 [1]
 fansi              0.4.1    2020-01-08 [1]
 farver             2.0.3    2020-01-16 [1]
 fastmap            1.0.1    2019-10-08 [1]
 foreach            1.4.7    2019-07-27 [1]
 foreign            0.8-72   2019-08-02 [1]
 fs                 1.3.1    2019-05-06 [1]
 gdata              2.18.0   2017-06-06 [1]
 gdsfmt             1.20.0   2019-05-02 [1]
 generics           0.0.2    2018-11-29 [1]
 GenomeInfoDb       1.20.0   2019-05-02 [1]
 GenomeInfoDbData   1.2.1    2021-02-06 [1]
 GenomicRanges      1.36.1   2019-09-06 [1]
 geoaxe             0.1.0    2016-02-19 [1]
 ggplot2            3.2.1    2019-08-10 [1]
 githubinstall      0.2.2    2018-02-18 [1]
 glue               1.4.2    2020-08-27 [1]
 gmodels            2.18.1   2018-06-25 [1]
 gridExtra          2.3      2017-09-09 [1]
 gtable             0.3.0    2019-03-2
 httpuv             1.5.2    2019-09-11 [1]
 httr               1.4.2    2020-07-20 [1]
 igraph             1.2.4.2  2019-11-27 [1]
 inline             0.3.15   2018-05-18 [1]
 IRanges            2.20.2   2020-01-13 [1]
 iterators          1.0.12   2019-07-26 [1]
 jsonlite           1.6.1    2020-02-02 [1]
 KernSmooth         2.23-16  2019-10-15 [1]
 labeling           0.3      2014-08-23 [1]
 later              1.0.0    2019-10-04 [1]
 lattice            0.20-38  2018-11-04 [1]
 lazyeval           0.2.2    2019-03-15 [1]
 LearnBayes         2.15.1   2018-03-18 [1]
 lifecycle          0.2.0    2020-03-06 [1]
 loo                2.2.0    2019-12-19 [1]
 magrittr           1.5      2014-11-22 [1]
 maptools         * 0.9-5    2019-02-18 [1]
 MASS               7.3-51.4 2019-03-31 [1]
 Matrix             1.2-18   2019-11-27 [1]
 matrixStats        0.55.0   2019-09-07 [1]
 memoise            1.1.0    2017-04-21 [1]
 memuse             4.0-0    2017-11-10 [1]
 mgcv               1.8-31   2019-11-09 [1]
 mime               0.9      2020-02-04 [1]
 munsell            0.5.0    2018-06-12 [1]
 mvtnorm            1.0-11   2019-06-19 [1]
 nlme               3.1-142  2019-11-07 [1]
 oai                0.3.0    2019-09-07 [1]
 PBSmapping       * 2.73.0   2021-01-13 [1]
 permute            0.9-5    2019-03-12 [1]
 pillar             1.4.3    2019-12-20 [1]
 pinfsc50           1.1.0    2016-12-02 [1]
 pkgbuild           1.0.6    2019-10-09 [1]
 pkgconfig          2.0.3    2019-09-22 [1]
 pkgload            1.0.2    2018-10-29 [1]
 plyr               1.8.5    2019-12-10 [1]
 prettyunits        1.0.2    2015-07-13 [1]
 processx           3.4.1    2019-07-18 [1]
 promises           1.1.0    2019-10-04 [1]
 ps                 1.3.0    2018-12-21 [1]
 purrr            * 0.3.3    2019-10-18 [1]
 R6                 2.4.1    2019-11-12 [1]
 radiator         * 1.1.9    2021-02-06 [1]
 raster           * 3.0-12   2020-01-30 [1]
 RColorBrewer     * 1.1-2    2014-12-07 [1]
 Rcpp               1.0.3    2019-11-08 [1]
 RCurl              1.98-1.2 2020-04-18 [1]
 readr              1.4.0    2020-10-05 [1]
 remotes            2.1.0    2019-06-24 [1]
 reshape            0.8.8    2018-10-23 [1]
 reshape2         * 1.4.3    2017-12-11 [1]
 rethinking         2.00     2020-04-12 [1]
 rgbif            * 3.4.0    2020-12-03 [1]
 rgdal            * 1.4-4    2019-05-29 [1]
 rgeos            * 0.5-5    2020-09-07 [1]
 rlang              0.4.7    2020-07-09 [1]
 rprojroot          1.3-2    2018-01-03 [1]
 rstan              2.19.3   2020-02-11 [1]
 rstudioapi         0.10     2019-03-19 [1]
 S4Vectors          0.24.4   2020-04-09 [1]
 scales             1.1.0    2019-11-18 [1]
 SeqArray           1.24.2   2019-07-12 [1]
 seqinr             4.2-4    2020-10-10 [1]
 sessioninfo        1.1.1    2018-11-05 [1]
 sf               * 0.9-3    2020-05-04 [1]
 shape              1.4.4    2018-02-07 [1]
 shiny              1.4.0    2019-10-10 [1]
 sp               * 1.4-0    2020-02-21 [1]
 spData             0.3.3    2020-02-11 [1]
 spdep              1.1-3    2019-09-18 [1]
 StanHeaders        2.19.0   2019-09-07 [1]
 stringi            1.4.6    2020-02-17 [1]
 stringr            1.4.0    2019-02-10 [1]
 subplex            1.6      2020-02-23 [1]
 taxize           * 0.9.99   2020-10-30 [1]
 testthat           2.1.1    2019-04-23 [1]
 tibble             3.0.4    2020-10-12 [1]
 tidyr            * 1.1.2    2020-08-27 [1]
 tidyselect         1.1.0    2020-05-11 [1]
 units              0.6-5    2019-10-08 [1]
 UpSetR             1.4.0    2019-05-22 [1]
 usethis            1.5.1    2019-07-04 [1]
 uuid               0.1-2    2015-07-28 [1]
 vcfR             * 1.8.0    2018-04-17 [1]
 vctrs              0.3.4    2020-08-29 [1]
 vegan              2.5-6    2019-09-01 [1]
 viridisLite        0.3.0    2018-02-01 [1]
 whisker            0.3-2    2013-04-28 [1]
 withr              2.1.2    2018-03-15 [1]
 xml2               1.3.2    2020-04-23 [1]
 xtable             1.8-4    2019-04-21 [1]
 XVector            0.26.0   2019-10-29 [1]
 zlibbioc           1.32.0   2019-10-29 [1]
 zoo                1.8-7    2020-01-10 [1]
 source                                   
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 Bioconductor                             
 Bioconductor                             
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.2)                           
 CRAN (R 3.6.2)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.2)                           
 CRAN (R 3.6.2)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.2)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.2)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.2)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.2)                           
 CRAN (R 3.6.2)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.2)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 Bioconductor                             
 CRAN (R 3.6.0)                           
 Bioconductor                             
 Bioconductor                             
 Bioconductor                             
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.2)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.2)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.2)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 Bioconductor                             
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.2)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.2)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.2)                           
 CRAN (R 3.6.2)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.2)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.2)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.2)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 Github (thierrygosselin/radiator@5c6b865)
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.2)                           
 CRAN (R 3.6.2)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 Github (rmcelreath/rethinking@f393f30)   
 CRAN (R 3.6.2)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.2)                           
 CRAN (R 3.6.2)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 Bioconductor                             
 CRAN (R 3.6.0)                           
 Bioconductor                             
 CRAN (R 3.6.2)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.2)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.2)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.2)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.2)                           
 CRAN (R 3.6.2)                           
 CRAN (R 3.6.2)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.2)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.0)                           
 CRAN (R 3.6.2)                           
 CRAN (R 3.6.0)                           
 Bioconductor                             
 Bioconductor                             
 CRAN (R 3.6.0)                           

[1] /Library/Frameworks/R.framework/Versions/3.6/Resources/library

I have attached the first 40 lines of my vcf file here: vcf_subset.txt

Carol-Symbiomics commented 3 years ago

Same here! I'm also getting the same error while trying to read a VCF file using the tidy_genomic_data function. [image]

dylanHco commented 3 years ago

I will add to this: I am also getting the same error as the people above. My VCF file comes from ipyrad's output. Is there a problem with VCF files from ipyrad?

-Dylan

thierrygosselin commented 3 years ago

Please try with the latest release, if this is still relevant.
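
For anyone following up on this, a minimal sketch of updating radiator from GitHub and confirming that the installed version changed. It assumes the remotes package is installed; `old_version` is an illustrative variable holding the version from the session info in the original report.

```r
# Version that produced the error, from the session info in the report
old_version <- "1.1.9"

# Reinstall from GitHub; guarded so nothing is installed in non-interactive runs
if (interactive() && requireNamespace("remotes", quietly = TRUE)) {
  remotes::install_github("thierrygosselin/radiator")
}

# After restarting R, compare the installed version with the old one
if (requireNamespace("radiator", quietly = TRUE)) {
  new_version <- as.character(utils::packageVersion("radiator"))
  message("radiator: ", old_version, " -> ", new_version)
}
```

Restarting the R session before re-running read_vcf or genomic_converter helps make sure the newly installed version is the one that gets loaded.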