thackl / gggenomes

A grammar of graphics for comparative genomics
https://thackl.github.io/gggenomes/

gbff files cannot be imported #118

Open Rahimlou opened 2 years ago

Rahimlou commented 2 years ago

ecoli_genes <- read_feats("GCA_009738455.1_ASM973845v1_genomic.gbff")
ecoli_genes <- read_gbk("GCA_009738455.1_ASM973845v1_genomic.gbff")

Neither of the calls above works. They return the following error:

Harmonizing attribute names
Error in left_join():
! Can't join on x$feat_id x y$feat_id because of incompatible types.
i x$feat_id is of type logical.
i y$feat_id is of type character.
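
The error says the two feat_id key columns have different types: logical on one side, character on the other. Below is a minimal base-R sketch for inspecting and aligning key types before a join. The data frames and the helper key_types() are hypothetical and not part of gggenomes, and nothing here confirms what actually produced the logical column in this report.

```r
# Hypothetical stand-ins for the two tables being joined.
# In R, a column that holds only NA defaults to type logical.
x <- data.frame(feat_id = NA, start = 1L)
y <- data.frame(feat_id = "gene-1", name = "abc", stringsAsFactors = FALSE)

# Illustrative helper: report the class of a key column in each table.
key_types <- function(x, y, key) {
  c(x = class(x[[key]])[1], y = class(y[[key]])[1])
}

types <- key_types(x, y, "feat_id")
print(types)

# Coercing the logical column to character makes the key types agree.
x$feat_id <- as.character(x$feat_id)
fixed <- key_types(x, y, "feat_id")
print(fixed)
```

Checking the key types like this on the intermediate tables would show which side of the join ended up with the unexpected type.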

thackl commented 2 years ago

I cannot recreate your error on my system (see below). Can you install the latest version of gggenomes? And what OS are you on? Could you send me the output of your sessionInfo()?

> read_gbk("~/Downloads/ncbi-genomes-2022-03-30/GCF_009738455.1_ASM973845v1_genomic.gbff")
writing directives
writing features
Harmonizing attribute names                                                   
• ID -> feat_id
• Dbxref -> dbxref
• Parent -> parent_ids
• Name -> name
• EC_number -> ec_number
• Alias -> alias
• ncRNA_class -> nc_rna_class
• seq:cat) -> seq_cat
• seq:gag) -> seq_gag
• seq:ggt) -> seq_ggt
• seq:tgc) -> seq_tgc
• seq:gat) -> seq_gat
• seq:cgg) -> seq_cgg
• seq:gaa) -> seq_gaa
• seq:cag) -> seq_cag
• seq:ctg) -> seq_ctg
• seq:ttg) -> seq_ttg
• seq:tag) -> seq_tag
• seq:gga) -> seq_gga
• seq:tga) -> seq_tga
• seq:gta) -> seq_gta
• seq:tct) -> seq_tct
• seq:tcg) -> seq_tcg
• seq:taa) -> seq_taa
• seq:gca) -> seq_gca
• seq:gcc) -> seq_gcc
• seq:cga) -> seq_cga
• seq:gtt) -> seq_gtt
Features read
# A tibble: 10 × 3
   source type              n
   <chr>  <chr>         <int>
 1 NA     CDS            5453
 2 NA     gene           5589
 3 NA     misc_feature     30
 4 NA     ncRNA             7
 5 NA     region            2
 6 NA     regulatory        6
 7 NA     repeat_region     1
 8 NA     rRNA             22
 9 NA     tmRNA             1
10 NA     tRNA            106
# A tibble: 11,217 × 65
   seq_id      start    end strand type  feat_id introns parent_ids source score
   <chr>       <int>  <int> <chr>  <chr> <chr>   <list>  <list>     <chr>  <chr>
 1 NZ_CP046527     1 5.51e6 +      regi… region… <NULL>  <chr [1]>  NA     NA   
 2 NZ_CP046527     1 5.51e6 -      gene  gene-G… <int>   <chr [1]>  NA     NA   
 3 NZ_CP046527     1 5.51e6 -      CDS   cds-GN… <dbl>   <chr [1]>  NA     NA   
 4 NZ_CP046527   382 5.28e2 -      gene  gene-G… <int>   <chr [1]>  NA     NA   
 5 NZ_CP046527   382 5.28e2 -      CDS   cds-GN… <NULL>  <chr [1]>  NA     NA   
 6 NZ_CP046527   528 1.10e3 -      gene  gene-G… <int>   <chr [1]>  NA     NA   
 7 NZ_CP046527   528 1.10e3 -      CDS   cds-GN… <NULL>  <chr [1]>  NA     NA   
 8 NZ_CP046527  1368 1.90e3 -      gene  gene-G… <int>   <chr [1]>  NA     NA   
 9 NZ_CP046527  1368 1.90e3 -      CDS   cds-GN… <NULL>  <chr [1]>  NA     NA   
10 NZ_CP046527  1906 2.12e3 -      gene  gene-G… <int>   <chr [1]>  NA     NA   
# … with 11,207 more rows, and 55 more variables: phase <chr>, name <chr>,
#   dbxref <chr>, collection_date <chr>, mol_type <chr>, serotype <chr>,
#   strain <chr>, organism <chr>, country <chr>, isolation_source <chr>,
#   collected_by <chr>, pseudo <chr>, locus_tag <chr>, inference <chr>,
#   transl_table <chr>, product <chr>, note <chr>, old_locus_tag <chr>,
#   protein_id <chr>, anticodon <chr>, ribosomal_slippage <chr>,
#   ec_number <chr>, alias <chr>, rpt_family <chr>, rpt_type <chr>, …
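
The details requested above (package version, operating system, session info) can be gathered in one go. This is a minimal sketch using only base R and utils; the `report` list and its field names are illustrative, not a format the maintainers require.

```r
# Collect the basics usually requested in a bug report.
report <- list(
  r_version = R.version.string,
  os        = Sys.info()[["sysname"]],
  # packageVersion() errors if the package is missing, so guard the call.
  gggenomes = if (requireNamespace("gggenomes", quietly = TRUE)) {
    as.character(utils::packageVersion("gggenomes"))
  } else {
    NA_character_
  }
)
str(report)

# Full details, including attached and loaded package versions.
print(utils::sessionInfo())
```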

Rahimlou commented 2 years ago

I installed v0.9.5.9000. I'm using Windows 10 x64.

The output of sessionInfo():

R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] TreeTools_1.7.1 tidytree_0.3.9 castor_1.7.2 Rcpp_1.0.8 phytools_1.0-1
[6] maps_3.4.0 S4Vectors_0.32.3 BiocGenerics_0.40.0 picante_1.8.2 nlme_3.1-153
[11] vegan_2.5-7 lattice_0.20-45 permute_0.9-7 ape_5.6-2 forcats_0.5.1
[16] tidyverse_1.3.1 gggenomes_0.9.5.9000 snakecase_0.11.0 jsonlite_1.8.0 tibble_3.1.6
[21] thacklr_0.0.0.9000 tidyr_1.2.0 stringr_1.4.0 readr_2.1.2 purrr_0.3.4
[26] gggenes_0.4.1 ggplot2_3.3.5 dplyr_1.0.8

loaded via a namespace (and not attached):
[1] colorspace_2.0-3 ellipsis_0.3.2 rprojroot_2.0.2 fs_1.5.2
[5] rstudioapi_0.13 farver_2.1.0 remotes_2.4.2 ggfittext_0.9.1
[9] bit64_4.0.5 RSpectra_0.16-0 fansi_1.0.2 lubridate_1.8.0
[13] xml2_1.3.3 R.methodsS3_1.8.1 codetools_0.2-18 splines_4.1.2
[17] mnormt_2.0.2 cachem_1.0.6 pkgload_1.2.4 broom_0.7.12
[21] cluster_2.1.2 dbplyr_2.1.1 R.oo_1.24.0 compiler_4.1.2
[25] httr_1.4.2 backports_1.4.1 lazyeval_0.2.2 assertthat_0.2.1
[29] Matrix_1.3-4 fastmap_1.1.0 cli_3.2.0 prettyunits_1.1.1
[33] tools_4.1.2 igraph_1.2.11 coda_0.19-4 gtable_0.3.0
[37] glue_1.6.2 clusterGeneration_1.3.7 fastmatch_1.1-3 cellranger_1.1.0
[41] vctrs_0.3.8 rbibutils_2.2.7 ps_1.6.0 brio_1.1.3
[45] testthat_3.1.2 rvest_1.0.2 lifecycle_1.0.1 phangorn_2.8.1
[49] devtools_2.4.3 MASS_7.3-54 scales_1.1.1 vroom_1.5.7
[53] hms_1.1.1 parallel_4.1.2 expm_0.999-6 RColorBrewer_1.1-2
[57] curl_4.3.2 memoise_2.0.1 yulab.utils_0.0.4 naturalsort_0.1.3
[61] stringi_1.7.6 desc_1.4.1 plotrix_3.8-2 pkgbuild_1.3.1
[65] Rdpack_2.3 rlang_1.0.2 pkgconfig_2.0.3 labeling_0.4.2
[69] bit_4.0.4 processx_3.5.2 tidyselect_1.1.2 magrittr_2.0.2
[73] R6_2.5.1 IRanges_2.28.0 generics_0.1.2 combinat_0.0-8
[77] DBI_1.1.2 pillar_1.7.0 haven_2.4.3 withr_2.5.0
[81] mgcv_1.8-38 scatterplot3d_0.3-41 modelr_0.1.8 crayon_1.5.0
[85] utf8_1.2.2 tmvnsim_1.0-2 tzdb_0.2.0 usethis_2.1.5
[89] grid_4.1.2 readxl_1.3.1 callr_3.7.0 reprex_2.0.1
[93] digest_0.6.29 R.cache_0.15.0 numDeriv_2016.8-1.1 R.utils_2.11.0
[97] munsell_0.5.0 sessioninfo_1.2.2 quadprog_1.5-8

Rahimlou commented 2 years ago

I also get 10 warning messages when loading "gggenomes":

Warning messages:
1: replacing previous import ‘purrr::invoke’ by ‘rlang::invoke’ when loading ‘gggenomes’
2: replacing previous import ‘purrr::flatten_raw’ by ‘rlang::flatten_raw’ when loading ‘gggenomes’
3: replacing previous import ‘purrr::as_function’ by ‘rlang::as_function’ when loading ‘gggenomes’
4: replacing previous import ‘purrr::flatten_dbl’ by ‘rlang::flatten_dbl’ when loading ‘gggenomes’
5: replacing previous import ‘purrr::flatten_lgl’ by ‘rlang::flatten_lgl’ when loading ‘gggenomes’
6: replacing previous import ‘purrr::flatten_int’ by ‘rlang::flatten_int’ when loading ‘gggenomes’
7: replacing previous import ‘purrr::%@%’ by ‘rlang::%@%’ when loading ‘gggenomes’
8: replacing previous import ‘purrr::flatten_chr’ by ‘rlang::flatten_chr’ when loading ‘gggenomes’
9: replacing previous import ‘purrr::splice’ by ‘rlang::splice’ when loading ‘gggenomes’
10: replacing previous import ‘purrr::flatten’ by ‘rlang::flatten’ when loading ‘gggenomes’

Rahimlou commented 2 years ago

The problem appears to be specific to running R on Windows. When I ran R on Linux using the Docker container, read_gbk() worked well.

thackl commented 2 years ago

Thanks for the info. I don't have enough time right now to test the package on Windows...