snystrom / memes

An R interface to the MEME Suite
https://snystrom.github.io/memes/

.html was not found. #108

Open eladzis opened 1 year ago

eladzis commented 1 year ago

Hello, I'm trying to run runStreme but the function returns an error when it tries to find the html file.

Error in error_file_not_exist(.) : /mnt/c/Users/folder1/streme.html was not found.

I checked and the streme.xml, streme.txt and sequence.tsv files were created.

I also checked the installation, and it seems fine:

checking main install
✔ /home/eladzis/meme/bin
checking util installs
✔ /home/eladzis/meme/bin/dreme
✔ /home/eladzis/meme/bin/ame
✔ /home/eladzis/meme/bin/fimo
✔ /home/eladzis/meme/bin/tomtom
✔ /home/eladzis/meme/bin/meme
✔ /home/eladzis/meme/bin/streme

Can you explain why the function doesn't create the html file? Thanks, Elad

snystrom commented 1 year ago

Hi, this is usually caused by a missing system dependency, but to help me debug, could you rerun runStreme with silent = FALSE and post the full output from your terminal?

So,

runStreme(your settings here, silent = FALSE)

Thanks.

eladzis commented 1 year ago

This is the error: The STREME XML file specified does not exist Usage: streme_xml_to_html

Warning: streme_xml_to_html exited abnormally and may have failed to create HTML output.

Freeing storage...

Warning: p-values will be inaccurate if primary and control

Warning: streme_xml_to_html exited abnormally and may have failed to create HTML output.

Error in error_file_not_exist(.) : /mnt/c/Users/folder1/streme.html was not found.

But the xml file does exist.

snystrom commented 1 year ago

I'm having a hard time with the formatting of this message. Could you please post the code you ran, as well as the error message, inside a code block (put the text between two lines of three backticks; see: https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/creating-and-highlighting-code-blocks)?

I think this is an error from STREME and not the R package, but I think there is information missing from the message so far.
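One way to test whether the failure comes from the MEME Suite rather than from memes is to run the converter named in the error by hand, outside R. The sketch below is only an illustration. It assumes the converter is an executable called streme_xml_to_html somewhere under the MEME install prefix (the exact subdirectory varies between installs, so it is searched for), and the helper names find_converter and convert_streme_xml are made up for this example. The two arguments follow the usage message that STREME prints in this thread.

```shell
#!/bin/sh
# Hypothetical helpers (not part of memes or the MEME Suite).

# find_converter <prefix>
# Print the path of the first streme_xml_to_html found under <prefix>.
find_converter() {
    find "$1" -name streme_xml_to_html 2>/dev/null | head -n 1
}

# convert_streme_xml <prefix> <outdir>
# Re-run the XML -> HTML step on <outdir>/streme.xml so that the converter's
# own error message (if any) is printed directly to the terminal.
convert_streme_xml() {
    conv=$(find_converter "$1")
    if [ -z "$conv" ]; then
        echo "streme_xml_to_html not found under $1" >&2
        return 1
    fi
    "$conv" "$2/streme.xml" "$2/streme.html"
}

# Example call (paths are placeholders; substitute your own):
#   convert_streme_xml "$HOME/meme" /path/to/streme/outdir
```

If the converter fails here as well, the problem lies in the MEME Suite install or its dependencies rather than in the R package.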

snystrom commented 1 year ago

Can you also post the output of sessionInfo() inside a code block?

eladzis commented 1 year ago

# FINDBESTMODEL 0.00 seconds    (cumulative 365.96 seconds)
# Erasing 113 positive and 42 negative matches to the motif from seed: AGAGGAAGAGTA
The STREME XML file specified does not exist
Usage:
    streme_xml_to_html <STREME XML file> <STREME HTML file>

# Warning: streme_xml_to_html exited abnormally and may have failed to create HTML output.
# Freeing storage...

Warning: p-values will be inaccurate if primary and control

Warning: streme_xml_to_html exited abnormally and may have failed to create HTML output.
Error in error_file_not_exist(.) : 
/mnt/c/Users/folder1/streme.html was not found.

Before this error message there are messages of this type:

#   Refining evaluated seed 32: AAATGTCACTTCTTC log_pvalue -10.17 width 15 w0 13
#     ITER 1 pvalue 1.9e-006 pos 19 neg 0 score_threshold 17.773
#     ITER 2 pvalue 9.6e-007 pos 20 neg 0 score_threshold 16.176
#   CAND-31-32 w 15 w0 13 'AAATGTCACTTCTTC' 'AAATGTCACTTCTDM' sd_pos 7 sd_neg 0 sd_pv -4.85 tr_pos 20 tr_neg 0 tr_bern 0.500077 tr_pv 9.6e-007 tr_rat 21.0 test_pos 2 test_neg 0 test_bern 0.500077 test_pv 2.5e-001 test_rat 3.0
#     ITER 1 pvalue 1.5e-005 pos 52 neg 17 score_threshold 11.293
#     ITER 2 pvalue 4.4e-006 pos 44 neg 11 score_threshold 11.998
# Testing palindromic version of model: pal log_pvalue -12.340 non-pal log_pvalue -20.998 log_pvalue_ratio 0.588 ED 11.757...
# Using non-palindromic version of final model...
#  Refining final model: 'AGAGGAAGAGTA' 'AGAGGAAGAGTA'
# REFINESEEDS   0.00 seconds    (cumulative 365.96 seconds)
# BEST-31 'AGAGGAAGAGTA' 'AGAGGAAGAGTA' w 12 start_w 9 pos_count 118 neg_count 42 train_pvalue 7.6e-010 train_ratio 2.8 test_pos 9 test_neg 5 test_pvalue 2.1e-001 test_ratio 1.7 score_thr 11.34

The sessionInfo() result:

R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

time zone: Asia/Jerusalem
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] lubridate_1.9.2   forcats_1.0.0     stringr_1.5.0     dplyr_1.1.2      
 [5] purrr_1.0.1       readr_2.1.4       tidyr_1.3.0       tibble_3.2.1     
 [9] ggplot2_3.4.2     tidyverse_2.0.0   data.table_1.14.8 memes_1.9.0      

loaded via a namespace (and not attached):
 [1] gtable_0.3.3            xfun_0.39               processx_3.8.1         
 [4] tzdb_0.4.0              ps_1.7.5                vctrs_0.6.2            
 [7] tools_4.3.0             bitops_1.0-7            generics_0.1.3         
[10] stats4_4.3.0            fansi_1.0.4             pkgconfig_2.0.3        
[13] R.oo_1.25.0             desc_1.4.2              S4Vectors_0.39.1       
[16] lifecycle_1.0.3         GenomeInfoDbData_1.2.10 compiler_4.3.0         
[19] Biostrings_2.69.1       brio_1.1.3              munsell_0.5.0          
[22] GenomeInfoDb_1.37.1     htmltools_0.5.5         RCurl_1.98-1.12        
[25] yaml_2.3.7              pillar_1.9.0            crayon_1.5.2           
[28] R.utils_2.12.2          tidyselect_1.2.0        digest_0.6.31          
[31] stringi_1.7.12          rprojroot_2.0.3         fastmap_1.1.1          
[34] grid_4.3.0              colorspace_2.1-0        cli_3.6.1              
[37] magrittr_2.0.3          utf8_1.2.3              withr_2.5.0            
[40] scales_1.2.1            timechange_0.2.0        rmarkdown_2.21         
[43] XVector_0.41.1          matrixStats_0.63.0      cmdfun_1.0.2           
[46] R.methodsS3_1.8.2       hms_1.1.3               evaluate_0.21          
[49] knitr_1.42              GenomicRanges_1.53.1    IRanges_2.35.1         
[52] testthat_3.1.8          rlang_1.1.1             glue_1.6.2             
[55] BiocGenerics_0.47.0     pkgload_1.3.2           rstudioapi_0.14        
[58] R6_2.5.1                zlibbioc_1.47.0         ggseqlogo_0.1   

Thanks

snystrom commented 1 year ago

Ah, perfect, thanks so much. This makes a lot more sense now.

It looks like that warning could be a bug in STREME itself: you could try posting about it on the MEME Suite Q&A Forum.

However, like you mentioned, the xml file exists on your machine, and memes only needs the xml. I check that all expected outputs are returned (including the xml) because sometimes when one is missing it indicates an issue with the meme software, but this could be relaxed. I'll think about this. It will be a little bit before I can push a fix here for you (day job doesn't pay me to work on this anymore, but I will get to it).

What you can do in the meantime is run your command, then manually point to the xml file and read the data with importStremeXML(). I know it's not ideal, but this will get you something you can use today.

cparsania commented 2 months ago

I get a similar error from the function runDreme when running the example data. See the log below:

dreme_results <- runDreme(sequences, control = "shuffle", e = 50,silent = FALSE)
Reading positive sequences /tmp/Rtmp7QX5rG/file239a6a6bf8dec9.fa ...
Shuffling positive sequences...
Looking for motif 1...
Counting positive sequences with each word...
Counting negative sequences with each word...
Applying Fisher's Exact Test to 5065 words...
Generalizing top 100 of 5065 REs to 1 ambiguous characters...
Computing exact p-values for 0 REs...
Best RE was AGAGC GCTCT p-value= 1.5e-003 E-value= 7.8e+000 Unerased_E-value= 7.8e+000
Erasing best word (AGAGC GCTCT)...
Looking for motif 2...
Counting positive sequences with each word...
Counting negative sequences with each word...
Applying Fisher's Exact Test to 4849 words...
Generalizing top 100 of 4849 REs to 1 ambiguous characters...
Computing exact p-values for 0 REs...
Best RE was AAATGG CCATTT p-value= 1.6e-002 E-value= 7.9e+001 Unerased_E-value= 7.9e+001
Stopping due to hitting the maximum evalue.
1 motifs with E-value < 50 found in 0.2 seconds.
Creating HTML file.
Creating text file.

Error in error_file_not_exist(.) : 
  /tmp/Rtmp7QX5rG/file239a6a6bf8dec9_vs_shuffle/dreme.txt was not found.
/tmp/Rtmp7QX5rG/file239a6a6bf8dec9_vs_shuffle/dreme.html was not found.
> sessionInfo()
R version 4.2.2 Patched (2022-11-10 r83330)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8      
 [8] LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
 [1] memes_1.6.0                 org.Hs.eg.db_3.16.0         AnnotationDbi_1.60.0        purrr_1.0.1                 SummarizedExperiment_1.28.0 Biobase_2.58.0             
 [7] GenomicRanges_1.50.2        GenomeInfoDb_1.34.9         IRanges_2.32.0              S4Vectors_0.36.1            BiocGenerics_0.44.0         MatrixGenerics_1.10.0      
[13] matrixStats_0.63.0          ggplot2_3.4.0               magrittr_2.0.3             

loaded via a namespace (and not attached):
  [1] ggvenn_0.1.9                             utf8_1.2.3                               R.utils_2.12.2                           tidyselect_1.2.0                        
  [5] RSQLite_2.2.20                           grid_4.2.2                               BiocParallel_1.32.5                      scatterpie_0.1.8                        
  [9] munsell_0.5.0                            codetools_0.2-19                         future_1.33.1                            withr_2.5.0                             
 [13] colorspace_2.0-3                         GOSemSim_2.24.0                          TxDb.Hsapiens.UCSC.hg38.knownGene_3.16.0 filelock_1.0.2                          
 [17] knitr_1.43                               rstudioapi_0.15.0                        DOSE_3.24.2                              listenv_0.9.1                           
 [21] GenomeInfoDbData_1.2.9                   polyclip_1.10-4                          bit64_4.0.5                              farver_2.1.1                            
 [25] rprojroot_2.0.3                          treeio_1.22.0                            parallelly_1.37.1                        vctrs_0.6.5                             
 [29] generics_0.1.3                           xfun_0.40                                BiocFileCache_2.6.0                      ggseqlogo_0.2                           
 [33] R6_2.5.1                                 doParallel_1.0.17                        clue_0.3-63                              graphlayouts_0.8.4                      
 [37] locfit_1.5-9.7                           bitops_1.0-7                             cachem_1.0.8                             fgsea_1.24.0                            
 [41] gridGraphics_0.5-1                       DelayedArray_0.24.0                      assertthat_0.2.1                         vroom_1.6.0                             
 [45] BiocIO_1.8.0                             scales_1.2.1                             ggraph_2.1.0                             enrichplot_1.18.3                       
 [49] gtable_0.3.1                             globals_0.16.3                           processx_3.8.2                           tidygraph_1.2.2                         
 [53] tictoc_1.2.1                             rlang_1.1.3                              GlobalOptions_0.1.2                      splines_4.2.2                           
 [57] lazyeval_0.2.2                           rtracklayer_1.58.0                       plyranges_1.18.0                         BiocManager_1.30.22                     
 [61] yaml_2.3.7                               reshape2_1.4.4                           GenomicFeatures_1.50.4                   qvalue_2.30.0                           
 [65] EnrichedHeatmap_1.28.1                   usethis_2.2.2                            tools_4.2.2                              ggplotify_0.1.0                         
 [69] gplots_3.1.3                             ellipsis_0.3.2                           RColorBrewer_1.1-3                       Rcpp_1.0.9                              
 [73] plyr_1.8.8                               progress_1.2.2                           zlibbioc_1.44.0                          RCurl_1.98-1.9                          
 [77] ps_1.7.5                                 prettyunits_1.1.1                        GetoptLong_1.0.5                         viridis_0.6.2                           
 [81] cowplot_1.1.1                            ggrepel_0.9.2                            cluster_2.1.4                            fs_1.5.2                                
 [85] furrr_0.3.1                              magick_2.8.3                             data.table_1.14.6                        circlize_0.4.16                         
 [89] parcutils_0.1.0                          pkgload_1.3.2                            hms_1.1.2                                patchwork_1.1.2                         
 [93] HDO.db_0.99.1                            XML_3.99-0.13                            gridExtra_2.3                            shape_1.4.6                             
 [97] testthat_3.1.6                           compiler_4.2.2                           biomaRt_2.54.1                           tibble_3.2.1                            
[101] KernSmooth_2.23-20                       shadowtext_0.1.2                         writexl_1.4.2                            crayon_1.5.2                            
[105] R.oo_1.25.0                              ggfun_0.0.9                              tzdb_0.3.0                               tidyr_1.2.1                             
[109] aplot_0.1.9                              DBI_1.1.3                                BSgenome.Dmelanogaster.UCSC.dm6_1.4.1    tweenr_2.0.2                            
[113] ChIPseeker_1.34.1                        dbplyr_2.3.0                             ComplexHeatmap_2.14.0                    MASS_7.3-58.3                           
[117] rappdirs_0.3.3                           boot_1.3-28                              Matrix_1.5-1                             readr_2.1.3                             
[121] brio_1.1.3                               cli_3.6.0                                R.methodsS3_1.8.2                        parallel_4.2.2                          
[125] igraph_1.3.5                             pkgconfig_2.0.3                          TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2  GenomicAlignments_1.34.0                
[129] xml2_1.3.5                               foreach_1.5.2                            ggtree_3.6.2                             ggcorrplot_0.1.4                        
[133] XVector_0.38.0                           yulab.utils_0.0.6                        stringr_1.5.0                            digest_0.6.33                           
[137] Biostrings_2.66.0                        fastmatch_1.1-3                          tidytree_0.4.2                           restfulr_0.0.15                         
[141] curl_5.0.2                               gtools_3.9.4                             Rsamtools_2.14.0                         rjson_0.2.21                            
[145] jsonlite_1.8.7                           nlme_3.1-162                             lifecycle_1.0.4                          desc_1.4.2                              
[149] ggeasy_0.1.3                             viridisLite_0.4.1                        BSgenome_1.66.2                          fansi_1.0.4                             
[153] pillar_1.8.1                             lattice_0.20-45                          plotrix_3.8-2                            KEGGREST_1.38.0                         
[157] fastmap_1.1.1                            httr_1.4.7                               GO.db_3.16.0                             glue_1.6.2                              
[161] png_0.1-8                                iterators_1.0.14                         cmdfun_1.0.2                             bit_4.0.5                               
[165] ggforce_0.4.1                            stringi_1.7.12                           blob_1.2.3                               caTools_1.18.2                          
[169] memoise_2.0.1                            renv_0.16.0                              dplyr_1.0.10                             ape_5.6-2                               
snystrom commented 2 months ago

Hi @cparsania, could you tell me the version of DREME you are running as well as how you installed the MEME Suite?

cparsania commented 2 months ago

Hi,

The MEME Suite was installed with conda, using this command: conda install bioconda::meme

The version is 5.5.2

snystrom commented 2 months ago

Hi @cparsania, the conda version of the MEME Suite tools is not supported by the MEME Suite authors, and in fact it has several known bugs. One of the issues is that it is packaged incorrectly and does not include the template files required to generate outputs such as the html. I believe this is the root cause of your issue (see also: https://github.com/snystrom/memes/issues/98).

I suggest installing from source as detailed here: https://meme-suite.org/meme/doc/install.html#quick_src

cparsania commented 2 months ago

Thanks. I compiled from the source now. Version 5.5.5.

The error is still the same.

image

snystrom commented 2 months ago

Interesting. If you rerun and set outdir = "dreme_test/" (or some other directory on your filesystem), can you run ls dreme_test/ at the command line and tell me which files are there? I wonder whether these files are no longer generated by the tool or something else is happening. I'll try to give this a shot soon as well, but as I may have mentioned before, I unfortunately have less time to work on this software than I'd like.

Thanks for the quick reply, hope we can figure it out!
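The check requested above can also be scripted. This is a minimal sketch, assuming DREME's output files are named dreme.html, dreme.txt and dreme.xml (the first two appear in the error message earlier in the thread; dreme.xml is an assumption) and that dreme_test is the directory passed as outdir. The function name check_outputs is hypothetical.

```shell
#!/bin/sh
# check_outputs <outdir> <tool>
# Report which of <tool>.html, <tool>.txt and <tool>.xml exist (and are
# non-empty) in <outdir>. Hypothetical helper, not part of memes.
check_outputs() {
    dir=$1
    tool=$2
    for ext in html txt xml; do
        file="$dir/$tool.$ext"
        if [ -s "$file" ]; then
            echo "found   $file"
        else
            echo "missing $file"
        fi
    done
}

# Inspect the directory suggested above, if it exists.
if [ -d dreme_test ]; then
    check_outputs dreme_test dreme
fi
```

The same function can be pointed at a STREME run with check_outputs some_outdir streme.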

HDash commented 2 days ago

Hi @cparsania, I figured out the root of the problem. The MEME Suite requires certain Perl dependencies to function. In this case, one or more of the modules required to produce the HTML output are not installed on your system.

Follow the guide here to fix it -> https://meme-suite.org/meme/doc/install.html#prereq_perl

@snystrom It might be good to include this point in the package's installation documentation. Even though it is unrelated to your code, it may help save future users from pointless debugging.

snystrom commented 2 days ago

@HDash thanks for the note. I can add this to the FAQ; however, I'm not convinced this solves it in all cases. For example, I think the conda install looks for these modules within the conda env and does not find them. I don't think a system-wide install will solve that.

If @cparsania chimes in that installing the system deps fixes it, that's another story!

HDash commented 2 days ago

Additional details: I am unsure how conda handles Perl dependencies, but I can confirm that cpan install XML::Parser solves this particular HTML generation issue. Without this module, running MEME suite commands directly results in the following error:

Warning: streme_xml_to_html exited abnormally and may have failed to create HTML output.

Interestingly, the error message when running the same function through the R package memes is less specific (error in original issue post). Despite the different error messages, the root cause is the same.

I have confirmed this to be the solution when running locally and inside a Docker container. It solves the issue for both using the bin directly and through the package.
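Whether the module is already present can be checked before installing anything. The sketch below relies on perl's standard -M switch, which loads a module and fails if it cannot be found. The helper names perl_module_ok and report_perl_modules are made up for this example, and XML::Parser is the only module checked because it is the only one named in this thread.

```shell
#!/bin/sh
# perl_module_ok <Module::Name>
# Succeeds if perl can load the module, fails otherwise (including when
# perl itself is not installed).
perl_module_ok() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# report_perl_modules <Module::Name>...
# Print one "ok"/"MISSING" line per module.
report_perl_modules() {
    for mod in "$@"; do
        if perl_module_ok "$mod"; then
            echo "ok      $mod"
        else
            echo "MISSING $mod"
        fi
    done
}

# XML::Parser is the module named in this thread; other modules from the
# MEME Suite prerequisites page can be appended to this call.
report_perl_modules XML::Parser
```

If the module is reported as MISSING, the cpan install XML::Parser command mentioned above is the fix confirmed in this comment.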