Open Rahimlou opened 2 years ago
I cannot recreate your error on my system (see below). Can you install the latest version of gggenomes? And what OS are you on? Could you send me the output of your sessionInfo()?
> read_gbk("~/Downloads/ncbi-genomes-2022-03-30/GCF_009738455.1_ASM973845v1_genomic.gbff")
writing directives
writing features
Harmonizing attribute names
• ID -> feat_id
• Dbxref -> dbxref
• Parent -> parent_ids
• Name -> name
• EC_number -> ec_number
• Alias -> alias
• ncRNA_class -> nc_rna_class
• seq:cat) -> seq_cat
• seq:gag) -> seq_gag
• seq:ggt) -> seq_ggt
• seq:tgc) -> seq_tgc
• seq:gat) -> seq_gat
• seq:cgg) -> seq_cgg
• seq:gaa) -> seq_gaa
• seq:cag) -> seq_cag
• seq:ctg) -> seq_ctg
• seq:ttg) -> seq_ttg
• seq:tag) -> seq_tag
• seq:gga) -> seq_gga
• seq:tga) -> seq_tga
• seq:gta) -> seq_gta
• seq:tct) -> seq_tct
• seq:tcg) -> seq_tcg
• seq:taa) -> seq_taa
• seq:gca) -> seq_gca
• seq:gcc) -> seq_gcc
• seq:cga) -> seq_cga
• seq:gtt) -> seq_gtt
Features read
# A tibble: 10 × 3
source type n
<chr> <chr> <int>
1 NA CDS 5453
2 NA gene 5589
3 NA misc_feature 30
4 NA ncRNA 7
5 NA region 2
6 NA regulatory 6
7 NA repeat_region 1
8 NA rRNA 22
9 NA tmRNA 1
10 NA tRNA 106
# A tibble: 11,217 × 65
seq_id start end strand type feat_id introns parent_ids source score
<chr> <int> <int> <chr> <chr> <chr> <list> <list> <chr> <chr>
1 NZ_CP046527 1 5.51e6 + regi… region… <NULL> <chr [1]> NA NA
2 NZ_CP046527 1 5.51e6 - gene gene-G… <int> <chr [1]> NA NA
3 NZ_CP046527 1 5.51e6 - CDS cds-GN… <dbl> <chr [1]> NA NA
4 NZ_CP046527 382 5.28e2 - gene gene-G… <int> <chr [1]> NA NA
5 NZ_CP046527 382 5.28e2 - CDS cds-GN… <NULL> <chr [1]> NA NA
6 NZ_CP046527 528 1.10e3 - gene gene-G… <int> <chr [1]> NA NA
7 NZ_CP046527 528 1.10e3 - CDS cds-GN… <NULL> <chr [1]> NA NA
8 NZ_CP046527 1368 1.90e3 - gene gene-G… <int> <chr [1]> NA NA
9 NZ_CP046527 1368 1.90e3 - CDS cds-GN… <NULL> <chr [1]> NA NA
10 NZ_CP046527 1906 2.12e3 - gene gene-G… <int> <chr [1]> NA NA
# … with 11,207 more rows, and 55 more variables: phase <chr>, name <chr>,
# dbxref <chr>, collection_date <chr>, mol_type <chr>, serotype <chr>,
# strain <chr>, organism <chr>, country <chr>, isolation_source <chr>,
# collected_by <chr>, pseudo <chr>, locus_tag <chr>, inference <chr>,
# transl_table <chr>, product <chr>, note <chr>, old_locus_tag <chr>,
# protein_id <chr>, anticodon <chr>, ribosomal_slippage <chr>,
# ec_number <chr>, alias <chr>, rpt_family <chr>, rpt_type <chr>, …
I installed v0.9.5.9000. I'm using Windows 10 x64.
Here is the sessionInfo():
R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
Matrix products: default
locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C LC_TIME=English_United Kingdom.1252
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] TreeTools_1.7.1 tidytree_0.3.9 castor_1.7.2 Rcpp_1.0.8 phytools_1.0-1
[6] maps_3.4.0 S4Vectors_0.32.3 BiocGenerics_0.40.0 picante_1.8.2 nlme_3.1-153
[11] vegan_2.5-7 lattice_0.20-45 permute_0.9-7 ape_5.6-2 forcats_0.5.1
[16] tidyverse_1.3.1 gggenomes_0.9.5.9000 snakecase_0.11.0 jsonlite_1.8.0 tibble_3.1.6
[21] thacklr_0.0.0.9000 tidyr_1.2.0 stringr_1.4.0 readr_2.1.2 purrr_0.3.4
[26] gggenes_0.4.1 ggplot2_3.3.5 dplyr_1.0.8
loaded via a namespace (and not attached):
[1] colorspace_2.0-3 ellipsis_0.3.2 rprojroot_2.0.2 fs_1.5.2
[5] rstudioapi_0.13 farver_2.1.0 remotes_2.4.2 ggfittext_0.9.1
[9] bit64_4.0.5 RSpectra_0.16-0 fansi_1.0.2 lubridate_1.8.0
[13] xml2_1.3.3 R.methodsS3_1.8.1 codetools_0.2-18 splines_4.1.2
[17] mnormt_2.0.2 cachem_1.0.6 pkgload_1.2.4 broom_0.7.12
[21] cluster_2.1.2 dbplyr_2.1.1 R.oo_1.24.0 compiler_4.1.2
[25] httr_1.4.2 backports_1.4.1 lazyeval_0.2.2 assertthat_0.2.1
[29] Matrix_1.3-4 fastmap_1.1.0 cli_3.2.0 prettyunits_1.1.1
[33] tools_4.1.2 igraph_1.2.11 coda_0.19-4 gtable_0.3.0
[37] glue_1.6.2 clusterGeneration_1.3.7 fastmatch_1.1-3 cellranger_1.1.0
[41] vctrs_0.3.8 rbibutils_2.2.7 ps_1.6.0 brio_1.1.3
[45] testthat_3.1.2 rvest_1.0.2 lifecycle_1.0.1 phangorn_2.8.1
[49] devtools_2.4.3 MASS_7.3-54 scales_1.1.1 vroom_1.5.7
[53] hms_1.1.1 parallel_4.1.2 expm_0.999-6 RColorBrewer_1.1-2
[57] curl_4.3.2 memoise_2.0.1 yulab.utils_0.0.4 naturalsort_0.1.3
[61] stringi_1.7.6 desc_1.4.1 plotrix_3.8-2 pkgbuild_1.3.1
[65] Rdpack_2.3 rlang_1.0.2 pkgconfig_2.0.3 labeling_0.4.2
[69] bit_4.0.4 processx_3.5.2 tidyselect_1.1.2 magrittr_2.0.2
[73] R6_2.5.1 IRanges_2.28.0 generics_0.1.2 combinat_0.0-8
[77] DBI_1.1.2 pillar_1.7.0 haven_2.4.3 withr_2.5.0
[81] mgcv_1.8-38 scatterplot3d_0.3-41 modelr_0.1.8 crayon_1.5.0
[85] utf8_1.2.2 tmvnsim_1.0-2 tzdb_0.2.0 usethis_2.1.5
[89] grid_4.1.2 readxl_1.3.1 callr_3.7.0 reprex_2.0.1
[93] digest_0.6.29 R.cache_0.15.0 numDeriv_2016.8-1.1 R.utils_2.11.0
[97] munsell_0.5.0 sessioninfo_1.2.2 quadprog_1.5-8
I also get 10 warning messages when loading "gggenomes":
Warning messages:
1: replacing previous import ‘purrr::invoke’ by ‘rlang::invoke’ when loading ‘gggenomes’
2: replacing previous import ‘purrr::flatten_raw’ by ‘rlang::flatten_raw’ when loading ‘gggenomes’
3: replacing previous import ‘purrr::as_function’ by ‘rlang::as_function’ when loading ‘gggenomes’
4: replacing previous import ‘purrr::flatten_dbl’ by ‘rlang::flatten_dbl’ when loading ‘gggenomes’
5: replacing previous import ‘purrr::flatten_lgl’ by ‘rlang::flatten_lgl’ when loading ‘gggenomes’
6: replacing previous import ‘purrr::flatten_int’ by ‘rlang::flatten_int’ when loading ‘gggenomes’
7: replacing previous import ‘purrr::%@%’ by ‘rlang::%@%’ when loading ‘gggenomes’
8: replacing previous import ‘purrr::flatten_chr’ by ‘rlang::flatten_chr’ when loading ‘gggenomes’
9: replacing previous import ‘purrr::splice’ by ‘rlang::splice’ when loading ‘gggenomes’
10: replacing previous import ‘purrr::flatten’ by ‘rlang::flatten’ when loading ‘gggenomes’
The problem appears to be specific to running R on Windows. When I ran R on Linux in a Docker container, read_gbk() worked well.
Thanks for the info. I haven't had enough time to try the package out on Windows...
ecoli_genes <- read_feats("GCA_009738455.1_ASM973845v1_genomic.gbff")
ecoli_genes <- read_gbk("GCA_009738455.1_ASM973845v1_genomic.gbff")
Neither of the above calls works. Both return the following error:
Harmonizing attribute names
Error in left_join():
! Can't join on x$feat_id x y$feat_id because of incompatible types.
i x$feat_id is of type logical.
i y$feat_id is of type character.
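For anyone hitting the same message, here is a minimal sketch of how this kind of type mismatch can arise in general and how coercion avoids it. The toy tables x and y are hypothetical stand-ins, not the tables gggenomes builds internally, and the thread does not confirm that an all-NA feat_id column is the actual cause on Windows. The sketch only relies on the fact that R types a column containing nothing but NA as logical.

```r
# Hypothetical toy tables; not the real gggenomes internals.
# A column holding only NA is typed as logical by default.
x <- data.frame(feat_id = c(NA, NA), start = c(1L, 382L),
                stringsAsFactors = FALSE)
y <- data.frame(feat_id = c("gene-1", "cds-1"), type = c("gene", "CDS"),
                stringsAsFactors = FALSE)

class(x$feat_id)  # "logical"
class(y$feat_id)  # "character"

# Coercing the all-NA key column to character makes the key types compatible.
x$feat_id <- as.character(x$feat_id)

# The join is only attempted if dplyr is installed.
if (requireNamespace("dplyr", quietly = TRUE)) {
  joined <- dplyr::left_join(x, y, by = "feat_id")
  print(joined)
}
```

Without the as.character() step, dplyr refuses to join a logical key to a character key, which matches the shape of the error above. This is an illustration rather than a fix for read_gbk() itself.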