snystrom / memes

An R interface to the MEME Suite
https://snystrom.github.io/memes/

Error in dplyr::mutate() after runMeme #100

Closed mwittep closed 2 years ago

mwittep commented 2 years ago

I used runMeme() to run a MEME analysis from a FASTA file. MEME (v5.4.1) itself ran, since the output files were produced. However, there seems to be a problem while parsing those output files, because I get the following error:

Error in `dplyr::mutate()`:
! Problem while computing `sites_hits = purrr::map2(...)`.
Caused by error in `h()`:
! error in evaluating the argument 'x' in selecting a method for function 'mcols': some values in the "end" column cannot be turned into numeric values

I have the latest dplyr version (v1.0.9). Here is the sessionInfo():

R version 4.1.2 (2021-11-01)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.3.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
character(0)

other attached packages:
[1] memes_1.2.5

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.8.3                lattice_0.20-45             tidyr_1.2.0                 Rsamtools_2.10.0            ps_1.7.0                   
 [6] Biostrings_2.62.0           ggseqlogo_0.1               assertthat_0.2.1            rprojroot_2.0.3             digest_0.6.29              
[11] utf8_1.2.2                  R6_2.5.1                    GenomeInfoDb_1.30.1         plyr_1.8.7                  universalmotif_1.12.4      
[16] stats4_4.1.2                ggplot2_3.3.6               pillar_1.7.0                utils_4.1.2                 zlibbioc_1.40.0            
[21] rlang_1.0.2                 rstudioapi_0.13             S4Vectors_0.32.4            R.utils_2.11.0              R.oo_1.24.0                
[26] Matrix_1.4-1                desc_1.4.1                  labeling_0.4.2              plyranges_1.14.0            BiocParallel_1.28.3        
[31] readr_2.1.2                 cmdfun_1.0.2                RCurl_1.98-1.6              munsell_0.5.0               DelayedArray_0.20.0        
[36] rtracklayer_1.54.0          compiler_4.1.2              pkgconfig_2.0.3             stats_4.1.2                 BiocGenerics_0.40.0        
[41] SummarizedExperiment_1.24.0 tidyselect_1.1.2            tibble_3.1.7                gridExtra_2.3               GenomeInfoDbData_1.2.7     
[46] ggvenn_0.1.9                IRanges_2.28.0              matrixStats_0.62.0          grDevices_4.1.2             XML_3.99-0.9               
[51] fansi_1.0.3                 crayon_1.5.1                dplyr_1.0.9                 tzdb_0.3.0                  withr_2.5.0                
[56] GenomicAlignments_1.30.0    MASS_7.3-57                 bitops_1.0-7                brio_1.1.3                  R.methodsS3_1.8.1          
[61] grid_4.1.2                  gtable_0.3.0                lifecycle_1.0.1             DBI_1.1.2                   magrittr_2.0.3             
[66] datasets_4.1.2              scales_1.2.0                cli_3.3.0                   farver_2.1.0                XVector_0.34.0             
[71] fs_1.5.2                    testthat_3.1.4              ellipsis_0.3.2              graphics_4.1.2              generics_0.1.2             
[76] vctrs_0.4.1                 base_4.1.2                  rjson_0.2.21                restfulr_0.0.13             tools_4.1.2                
[81] Biobase_2.54.0              glue_1.6.2                  purrr_0.3.4                 MatrixGenerics_1.6.0        hms_1.1.1                  
[86] parallel_4.1.2              processx_3.5.3              pkgload_1.2.4               yaml_2.3.5                  colorspace_2.0-3           
[91] BiocManager_1.30.17         UpSetR_1.4.0                GenomicRanges_1.46.1        usethis_2.1.5               BiocIO_1.4.0               
[96] methods_4.1.2              

Nevertheless, importMeme() works on the exported .txt file without any problem. Which file does runMeme parse: the .txt or the XML file?

snystrom commented 2 years ago

How did you prepare the sequences to run MEME? If you used a custom fasta file, I think what is happening is that the sequence header doesn't match an assumption made by runMeme by default.

To confirm my suspicions, could you try running importMeme() again but set parse_genomic_coord = TRUE? I expect this will fail. If this is the case, the solution is to set parse_genomic_coord = FALSE when you call runMeme.

This happens because runMeme expects by default that the fasta file was generated within R using the get_sequences function, which writes a specially formatted fasta header encoding the genomic coordinates of each sequence. MEME itself reports the position of each match as a relative coordinate from the start of each sequence entry. To be helpful to the user, I convert these relative coordinates to absolute genomic coordinates using the specially formatted fasta headers. This is explained in ?runMeme under the parse_genomic_coord parameter, which I'll admit is a bit buried.
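
As a rough illustration of the conversion described above, here is a minimal base-R sketch. It is not the actual memes code: the helper names `parse_header()` and `to_genomic()` are hypothetical, and it assumes headers of the form `chr:start-end` and 1-based relative positions. It only shows why a custom header leaves nothing to convert.

```r
# Hypothetical helper (not part of memes): split a "chr:start-end" header.
# The header format is an assumption made for this illustration.
parse_header <- function(header) {
  m <- regmatches(header, regexec("^([^:]+):([0-9]+)-([0-9]+)$", header))[[1]]
  if (length(m) == 0) {
    # Header does not encode coordinates, so nothing can be recovered
    return(list(seqnames = NA_character_, start = NA_integer_, end = NA_integer_))
  }
  list(seqnames = m[2], start = as.integer(m[3]), end = as.integer(m[4]))
}

# Shift a motif hit from a 1-based position within the sequence
# to absolute genomic coordinates.
to_genomic <- function(header, rel_start, width) {
  region <- parse_header(header)
  abs_start <- region$start + rel_start - 1L
  list(seqnames = region$seqnames, start = abs_start, end = abs_start + width - 1L)
}

# Header that encodes coordinates: the hit maps to chr3L:1010-1017
hit <- to_genomic("chr3L:1000-1100", rel_start = 11L, width = 8L)

# Custom header: start and end come back as NA
bad <- to_genomic("my_custom_sequence_1", rel_start = 11L, width = 8L)
```

In this sketch the unparseable header yields `NA` coordinates. In the real package the failure surfaces later, as the "end" column error quoted at the top of this issue.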

I will see if I can get memes to spit out a more helpful error message in this situation to hint at turning off genomic coordinate parsing so this cryptic error doesn't happen in the future.
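
One way such a check could look is sketched below. This is only an illustration, not the fix that went into memes: `check_genomic_headers()` is a hypothetical name, and the `chr:start-end` header pattern is an assumption.

```r
# Hypothetical pre-check (not the package's implementation): fail early with
# a hint when fasta headers do not look like "chr:start-end".
check_genomic_headers <- function(headers) {
  ok <- grepl("^[^:]+:[0-9]+-[0-9]+$", headers)
  if (!all(ok)) {
    stop(
      "Could not parse genomic coordinates from ", sum(!ok),
      " FASTA header(s), e.g. '", headers[!ok][1], "'. ",
      "If the sequences were not generated with get_sequences(), ",
      "try setting parse_genomic_coord = FALSE.",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Well-formed headers pass silently
check_genomic_headers(c("chr2L:100-200", "chr3R:5-50"))

# A custom header triggers the hint; capture the message instead of stopping
msg <- tryCatch(
  check_genomic_headers(c("chr2L:100-200", "seq_1")),
  error = function(e) conditionMessage(e)
)
```

Checking the headers before any coordinate arithmetic means the user sees the name of the relevant flag rather than a downstream parsing error.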

Thanks for reporting, and let me know if changing this flag doesn't solve the issue. I'll try to push a fix in the next few days.

mwittep commented 2 years ago

It works! They were custom fasta files, so it was exactly the problem you described. Now that I have read the parse_genomic_coord section in detail, I see it was explained there. Sorry for overlooking that!

snystrom commented 2 years ago

Glad that was the fix. I've pushed an update to the current release & devel that prints a more informative error when this happens in the future. Thanks again for reporting!