thackl / gggenomes

A grammar of graphics for comparative genomics
https://thackl.github.io/gggenomes/

read_feats() fails on reading gff files - need to @import dplyr #96

Closed: bfrommer closed this issue 2 years ago

bfrommer commented 2 years ago

Hello,

I'm having trouble loading my own GFF3 files, and the emales example GFF file fails too. When I run

g0 <- read_feats(ex("emales/emales.gff"))

I get

Reading 'gff3' with `read_gff3()`:
* file_id: emales [/.../gggenomes/extdata/emales/emales.gff]
Harmonizing attribute names
* ID -> feat_id
* Name -> name
* Note -> note
Error: Problem with `summarise()` column `parent_ids`.
ℹ `parent_ids = list(first(parent_ids))`.
x unable to find an inherited method for function ‘first’ for signature ‘"list"’
ℹ The error occurred in group 1: type = "CDS", feat_id = "BVI_008A_0001".
Run `rlang::last_error()` to see where the error occurred.

The same happens when I call read_gff3() directly. I hope you can help me. Thanks!

thackl commented 2 years ago

This looks somewhat familiar but I don't remember exactly what the issue was. Can you send your sessionInfo() output?

Also, if you type first at the R prompt, which namespace does it come from?
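One way to answer that question directly is with base R's find() and conflicts(), which report which attached packages define a name. This is a minimal sketch using only base R (the variable names where_first and first_clashes are hypothetical); in a session like the one above you would expect the S4Vectors entry to be listed ahead of dplyr.

```r
# Every attached environment that defines `first`, in the order R searches
# them; the first entry is the one a bare `first` resolves to.
where_first <- find("first")
print(where_first)

# Names defined in more than one attached environment, keeping only the
# entries that involve `first`.
all_clashes <- conflicts(detail = TRUE)
first_clashes <- all_clashes[vapply(all_clashes, function(x) "first" %in% x, logical(1))]
print(first_clashes)
```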

bfrommer commented 2 years ago

Here is my sessionInfo() output:

R version 4.1.1 (2021-08-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

Matrix products: default
BLAS/LAPACK: /prj/gf-biotools/programs/miniconda3/envs/r_topGo/lib/libopenblasp-r0.3.17.so

locale:
 [1] LC_CTYPE=de_DE.UTF-8       LC_NUMERIC=C               LC_TIME=de_DE.UTF-8        LC_COLLATE=de_DE.UTF-8    
 [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=de_DE.UTF-8    LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] rtracklayer_1.52.1   GenomicRanges_1.44.0 GenomeInfoDb_1.28.4  IRanges_2.26.0       S4Vectors_0.30.2    
 [6] BiocGenerics_0.38.0  gggenomes_0.9.4.9000 snakecase_0.11.0     jsonlite_1.7.2       tibble_3.1.5        
[11] thacklr_0.0.0.9000   tidyr_1.1.4          stringr_1.4.0        readr_2.0.2          purrr_0.3.4         
[16] gggenes_0.4.1        ggplot2_3.3.5        dplyr_1.0.7         

loaded via a namespace (and not attached):
 [1] matrixStats_0.61.0          bitops_1.0-7                fs_1.5.0                    usethis_2.1.3              
 [5] devtools_2.4.2              bit64_4.0.5                 RColorBrewer_1.1-2          rprojroot_2.0.2            
 [9] tools_4.1.1                 utf8_1.2.2                  R6_2.5.1                    DBI_1.1.1                  
[13] colorspace_2.0-2            withr_2.4.2                 tidyselect_1.1.1            prettyunits_1.1.1          
[17] processx_3.5.2              bit_4.0.4                   curl_4.3.2                  compiler_4.1.1             
[21] git2r_0.28.0                Biobase_2.52.0              cli_3.1.0                   DelayedArray_0.18.0        
[25] desc_1.4.0                  labeling_0.4.2              scales_1.1.1                callr_3.7.0                
[29] digest_0.6.28               Rsamtools_2.8.0             XVector_0.32.0              pkgconfig_2.0.3            
[33] sessioninfo_1.2.1           MatrixGenerics_1.4.3        fastmap_1.1.0               rlang_0.4.12               
[37] rstudioapi_0.13             BiocIO_1.2.0                farver_2.1.0                generics_0.1.1             
[41] BiocParallel_1.26.2         vroom_1.5.5                 RCurl_1.98-1.5              magrittr_2.0.1             
[45] GenomeInfoDbData_1.2.6      Matrix_1.3-4                munsell_0.5.0               fansi_0.5.0                
[49] ggfittext_0.9.1             lifecycle_1.0.1             stringi_1.7.5               yaml_2.2.1                 
[53] SummarizedExperiment_1.22.0 zlibbioc_1.38.0             pkgbuild_1.2.0              grid_4.1.1                 
[57] crayon_1.4.2                lattice_0.20-45             Biostrings_2.60.2           hms_1.1.1                  
[61] ps_1.6.0                    pillar_1.6.4                rjson_0.2.20                pkgload_1.2.3              
[65] XML_3.99-0.8                glue_1.4.2                  remotes_2.4.1               BiocManager_1.30.16        
[69] vctrs_0.3.8                 tzdb_0.2.0                  testthat_3.1.0              gtable_0.3.0               
[73] assertthat_0.2.1            cachem_1.0.6                restfulr_0.0.13             GenomicAlignments_1.28.0   
[77] memoise_2.0.0               ellipsis_0.3.2  

When I type first, the first line of the output is:

standardGeneric for "first" defined from package "S4Vectors"

thackl commented 2 years ago

Ah, OK. As a workaround, try first <- dplyr::first just before reading the GFF. I think that should do the trick.

If that works, I will change read_gff to call dplyr::first explicitly in the next release.
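Calling dplyr::first explicitly means the call no longer depends on which first happens to be found first on the search path. The sketch below shows the pattern on a hypothetical toy table whose columns are loosely modelled on the ones in the error message (type, feat_id, parent_ids); it is not gggenomes' actual read_gff3() code.

```r
library(dplyr)

# Hypothetical toy feature table with a list column, loosely modelled on the
# `parent_ids` column named in the error message.
feats <- tibble(
  type = c("CDS", "CDS", "gene"),
  feat_id = c("BVI_008A_0001", "BVI_008A_0001", "BVI_008A_0001"),
  parent_ids = list("gene_1", "gene_1", character(0))
)

# The `dplyr::` prefix pins the call to dplyr's first(), so an S4 generic
# named `first` attached later (e.g. from S4Vectors) is never picked up.
collapsed <- feats %>%
  group_by(type, feat_id) %>%
  summarise(parent_ids = list(dplyr::first(parent_ids)), .groups = "drop")

print(collapsed)
```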

bfrommer commented 2 years ago

Yes, that error is gone! But now I get a different one:

> first <- dplyr::first
> g0 <- read_feats(ex("emales/emales.gff"))
Reading 'gff3' with `read_gff3()`:
* file_id: emales [/.../gggenomes/extdata/emales/emales.gff]
Harmonizing attribute names
* ID -> feat_id
* Name -> name
* Note -> note
Error in rename(mrna_introns, mrna_introns.. = introns) : 
  Object 'introns' not found

thackl commented 2 years ago

OK, this seems to be caused by the same namespace collision. Can you restart R and load S4Vectors first?

library(S4Vectors)
library(gggenomes)
read_feats(ex("emales/emales.gff"))
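Why load order matters: R resolves a bare name such as first by walking the search path, and each attach puts the new environment in front of those already attached. So, assuming dplyr ends up attached after S4Vectors, dplyr's first is the one found. The sketch below reproduces this with two throwaway environments whose names (fake_S4Vectors, fake_dplyr) are hypothetical stand-ins, so it runs in base R without either package installed.

```r
# Attach a stand-in for S4Vectors first, then a stand-in for dplyr;
# both define a function called `first`.
attach(list(first = function(x) "first from fake_S4Vectors"), name = "fake_S4Vectors")
attach(list(first = function(x) "first from fake_dplyr"), name = "fake_dplyr")

# The most recently attached environment sits earliest on the search path.
lookup_order <- find("first")
winner <- first(1:3)
print(lookup_order)
print(winner)

# Clean up the search path again.
detach("fake_dplyr")
detach("fake_S4Vectors")
```

Reversing the two attach() calls makes the fake_S4Vectors version win, which mirrors the original session, where S4Vectors had been attached after dplyr.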
bfrommer commented 2 years ago

Thanks! It works just fine.

thackl commented 2 years ago

Great!

I'm reopening and renaming this issue until I have fixed it in the package.