ropensci / phylotaR

An automated pipeline for retrieving orthologous DNA sequences from GenBank in R
https://docs.ropensci.org/phylotaR

makeblastdb error #30

Closed: DomBennett closed 6 years ago

DomBennett commented 6 years ago

Reviewer could not run pipeline due to an error with makeblastdb.

> # RUN PIPELINE
> txid <- 9504
> setUp(wd="/Users/naupaka/Desktop/phylota_review/aotus", txid=txid, ncbi_dr=ncbi_dr, v=TRUE)
-----------------------------------------------
phylotaR: Implementation of PhyLoTa in R [v0.1]
-----------------------------------------------
Checking for valid NCBI BLAST+ Tools ...
Found: [/usr/local/bin/makeblastdb]
Found: [/usr/local/bin/blastn]
Setting up pipeline with the following parameters:
. blstn      [/usr/local/bin/blastn]
. btchsz     [300]
. date       [2018-03-26]
. mdlthrs    [3000]
. mkblstdb   [/usr/local/bin/makeblastdb]
. mncvrg     [51]
. mnsql      [250]
. mxevl      [1e-10]
. mxnds      [1e+05]
. mxrtry     [100]
. mxsql      [2000]
. mxsqs      [50000]
. ncps       [1]
. txid       [9504]
. v          [TRUE]
. wd         [/Users/naupaka/Desktop/phylota_review/aotus]
-----------------------------------------------
> run(wd=wd)
... Taxise
... Download
... Cluster
Error in runStgs(wd = wd, frm = 1, to = nstages, stgs_msg = stgs_msg) :
  Unexpected Error in error(ps = ps, paste0("makeblastdb failed to run. Check BLAST log files.")) :
  Error: makeblastdb failed to run. Check BLAST log files.

Occurred [2018-03-26 06:37:35]
Contact package maintainer for help.

DomBennett commented 6 years ago

Hi @naupaka!

Thanks for spending so much time reviewing the phylotaR code. I'm trying to work out why you were getting the makeblastdb error. Would it be possible for you to send me the BLAST log files? They should be in [wd]/blast/[unique-name]-db.log.
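If it helps, here is a minimal sketch for collecting those logs from R. It assumes only the [wd]/blast/[unique-name]-db.log layout described above; `show_blast_logs` is a hypothetical helper name, not a phylotaR function.

```r
# Hypothetical helper (not part of phylotaR): print every BLAST db log
# found under <wd>/blast, assuming the "-db.log" suffix described above.
show_blast_logs <- function(wd) {
  blast_dir <- file.path(wd, "blast")
  # list.files() returns character(0) if the folder is missing or empty
  logs <- list.files(blast_dir, pattern = "-db\\.log$", full.names = TRUE)
  for (log in logs) {
    cat("==>", log, "<==\n")
    cat(readLines(log, warn = FALSE), sep = "\n")
    cat("\n")
  }
  # Return the log paths invisibly so they can be reused, e.g. for attaching
  invisible(logs)
}
```

Calling `show_blast_logs(wd)` with the same `wd` passed to `setUp()` should print each log in turn. An empty result would suggest makeblastdb never got as far as writing one.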

Thanks! Dom

naupaka commented 6 years ago

@DomBennett my suspicion is that it has something to do with the wd path not being parsed properly somewhere along the line.

Here's one log:

BLAST options error: File ~/Desktop/phylota_review/capnodiales/blast/taxon-1047167-typ-subtree-db.fa does not exist
w/capnodiales/blast/taxon-1047167-typ-subtree-db.fa
New DB title:  ~/Desktop/phylota_review/capnodiales/blast/taxon-1047167-typ-subtree-db.fa
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B

and another:

BLAST options error: File ~/Desktop/phylota_review/fungi/blast/taxon-41254-typ-subtree-db.fa does not exist
ta_review/fungi/blast/taxon-41254-typ-subtree-db.fa
New DB title:  ~/Desktop/phylota_review/fungi/blast/taxon-41254-typ-subtree-db.fa
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B

There doesn't seem to be a BLAST log for the last example ("aotus").

naupaka commented 6 years ago

Huh. Now it seems to work fine. Not sure what changed.

> library(phylotaR)
> txid <- 9504
> setUp(wd="/Users/naupaka/Desktop/phylota_review/aotus", txid=txid, ncbi_dr=ncbi_dr, v=TRUE)
-----------------------------------------------
phylotaR: Implementation of PhyLoTa in R [v0.1]
-----------------------------------------------
Checking for valid NCBI BLAST+ Tools ...
Error in file.path(d, "makeblastdb") : object 'ncbi_dr' not found
> ncbi_dr <- "/usr/local/bin"
> setUp(wd="/Users/naupaka/Desktop/phylota_review/aotus", txid=txid, ncbi_dr=ncbi_dr, v=TRUE)
-----------------------------------------------
phylotaR: Implementation of PhyLoTa in R [v0.1]
-----------------------------------------------
Checking for valid NCBI BLAST+ Tools ...
Found: [/usr/local/bin/makeblastdb]
Found: [/usr/local/bin/blastn]
Setting up pipeline with the following parameters:
. blstn      [/usr/local/bin/blastn]
. btchsz     [300]
. date       [2018-04-04]
. mdlthrs    [3000]
. mkblstdb   [/usr/local/bin/makeblastdb]
. mncvrg     [51]
. mnsql      [250]
. mxevl      [1e-10]
. mxnds      [1e+05]
. mxrtry     [100]
. mxsql      [2000]
. mxsqs      [50000]
. ncps       [1]
. txid       [9504]
. v          [TRUE]
. wd         [/Users/naupaka/Desktop/phylota_review/aotus]
Error in setUpCch(ps = ps) : Cache already exists, ovrwrt=FALSE.
> restart(wd="/Users/naupaka/Desktop/phylota_review/aotus")
------------------------------------------------------
Restarting pipeline on [unix] at [2018-04-04 09:54:00]
------------------------------------------------------
Running stages: taxise, download, cluster, cluster2
--------------------------------------------
Starting stage TAXISE: [2018-04-04 09:54:00]
--------------------------------------------
Searching taxonomic IDs ...
Downloading taxonomic records ...
. [1-21]
Generating taxonomic dictionary ...
---------------------------------------------
Completed stage TAXISE: [2018-04-04 09:54:03]
---------------------------------------------
----------------------------------------------
Starting stage DOWNLOAD: [2018-04-04 09:54:03]
----------------------------------------------
Identifying suitable clades ...
Identified [1] suitable clades.
Downloading hierarchically ...
Working on parent [id 9504]: [1/1] ...
. + whole subtree ...
. . Getting [2805 sqs] ...
. . . [1-300]
. . . [301-600]
. . . [601-900]
. . . [901-1200]
. . . [1201-1500]
. . . [1501-1800]
. . . [1801-2100]
. . . [2101-2400]
. . . [2401-2700]
. . . [2701-2805]
Successfully downloaded [2980 sqs] in total.
-----------------------------------------------
Completed stage DOWNLOAD: [2018-04-04 09:55:48]
-----------------------------------------------
---------------------------------------------
Starting stage CLUSTER: [2018-04-04 09:55:48]
---------------------------------------------
Working on [id 9504]
. Generating subtree clusters for [id 9504(genus)]
. Generating direct clusters for [id 9504(genus)]
. . [0 sqs]
. . . Too few sequences, cannot make clusters
. BLASTing [2980 sqs] ....
. . Running makeblastdb
. . Running blastn
. . Removed [17762/90442] BLAST hits due to insufficient coverage
. Identified [837] clusters
. Processing [id 9504] child [id 1263727]
. . Generating subtree clusters for [id 1263727(species)]
. . . . [0 sqs] -- too few sequences, cannot make clusters
. Processing [id 9504] child [id 1230482]
. . Generating subtree clusters for [id 1230482(species)]
. . . . [0 sqs] -- too few sequences, cannot make clusters
. Processing [id 9504] child [id 1090913]
. . Generating subtree clusters for [id 1090913(species)]
. . . . [0 sqs] -- too few sequences, cannot make clusters
. Processing [id 9504] child [id 1002694]
. . Generating subtree clusters for [id 1002694(species)]
. . . . [0 sqs] -- too few sequences, cannot make clusters
. Processing [id 9504] child [id 940829]
. . Generating subtree clusters for [id 940829(species)]
. . . . [0 sqs] -- too few sequences, cannot make clusters
. Processing [id 9504] child [id 413234]
. . Generating subtree clusters for [id 413234(species)]
. . . . [0 sqs] -- too few sequences, cannot make clusters
. Processing [id 9504] child [id 361674]
. . Generating subtree clusters for [id 361674(species)]
. . . . [0 sqs] -- too few sequences, cannot make clusters
. Processing [id 9504] child [id 292213]
. . Generating subtree clusters for [id 292213(species)]
. . . . [0 sqs] -- too few sequences, cannot make clusters
. Processing [id 9504] child [id 261316]
. . Generating subtree clusters for [id 261316(species)]
. . . . [0 sqs] -- too few sequences, cannot make clusters
. Processing [id 9504] child [id 231953]
. . Generating subtree clusters for [id 231953(species)]
. . . . [0 sqs] -- too few sequences, cannot make clusters
. Processing [id 9504] child [id 222417]
. . Generating subtree clusters for [id 222417(species)]
. . . . [0 sqs] -- too few sequences, cannot make clusters
. Processing [id 9504] child [id 57176]
. . Generating subtree clusters for [id 57176(species)]
. . . . [0 sqs] -- too few sequences, cannot make clusters
. Processing [id 9504] child [id 57175]
. . Generating subtree clusters for [id 57175(species)]
. . . . [0 sqs] -- too few sequences, cannot make clusters
. Processing [id 9504] child [id 43147]
. . Generating subtree clusters for [id 43147(species)]
. . . . [0 sqs] -- too few sequences, cannot make clusters
. Processing [id 9504] child [id 37293]
. . Generating subtree clusters for [id 37293(species)]
. . . . [0 sqs] -- too few sequences, cannot make clusters
. Processing [id 9504] child [id 30591]
. . Generating subtree clusters for [id 30591(species)]
. . Generating direct clusters for [id 30591(species)]
. . . [507 sqs]
. . BLASTing [507 sqs] ....
. . Removed [141/4130] BLAST hits due to insufficient coverage
. . Identified [253] clusters
. . BLASTing [244 sqs] ....
. . Removed [176/2300] BLAST hits due to insufficient coverage
. . Identified [85] clusters
. . Processing [id 30591] child [id 867331]
. . . Generating subtree clusters for [id 867331(subspecies)]
. . . . . [0 sqs] -- too few sequences, cannot make clusters
. . Processing [id 30591] child [id 280755]
. . . Generating subtree clusters for [id 280755(subspecies)]
. . . . . [0 sqs] -- too few sequences, cannot make clusters
. . Processing [id 30591] child [id 120088]
. . . Generating subtree clusters for [id 120088(subspecies)]
. . . . . [0 sqs] -- too few sequences, cannot make clusters
. Processing [id 9504] child [id 9505]
. . Generating subtree clusters for [id 9505(species)]
. . . . [0 sqs] -- too few sequences, cannot make clusters
[1/1]
----------------------------------------------
Completed stage CLUSTER: [2018-04-04 09:56:03]
----------------------------------------------
-----------------------------------------------
Starting stage CLUSTER^2: [2018-04-04 09:56:03]
-----------------------------------------------
Loading clusters ...
Done. Only one cluster set -- skipping cluster^2
Dropping all clusters of < 3 sqs ...
Renumbering clusters ...
Saving ...
------------------------------------------------
Completed stage CLUSTER^2: [2018-04-04 09:56:03]
------------------------------------------------
-------------------------------------------
Completed pipeline at [2018-04-04 09:56:03]
-------------------------------------------
> devtools::session_info()
Session info ---------------------------------------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.4.4 (2018-03-15)
 system   x86_64, darwin17.3.0        
 ui       RStudio (1.1.383)           
 language (EN)                        
 collate  en_US.UTF-8                 
 tz       America/Los_Angeles         
 date     2018-04-04                  

Packages -------------------------------------------------------------------------------------------------------------------
 package     * version   date       source                              
 base        * 3.4.4     2018-03-15 local                               
 compiler      3.4.4     2018-03-15 local                               
 curl          3.1       2017-12-12 CRAN (R 3.4.3)                      
 datasets    * 3.4.4     2018-03-15 local                               
 devtools      1.13.5    2018-02-18 CRAN (R 3.4.3)                      
 digest        0.6.15    2018-01-28 CRAN (R 3.4.3)                      
 graphics    * 3.4.4     2018-03-15 local                               
 grDevices   * 3.4.4     2018-03-15 local                               
 httr          1.3.1     2017-08-20 CRAN (R 3.4.1)                      
 igraph        1.2.1     2018-03-10 CRAN (R 3.4.3)                      
 jsonlite      1.5       2017-06-01 CRAN (R 3.4.0)                      
 magrittr      1.5       2014-11-22 CRAN (R 3.4.0)                      
 memoise       1.1.0     2017-04-21 CRAN (R 3.4.0)                      
 methods     * 3.4.4     2018-03-15 local                               
 phylotaR    * 0.1       2018-03-26 Github (DomBennett/phylotaR@2541b03)
 pkgconfig     2.0.1     2017-03-21 CRAN (R 3.4.0)                      
 R.methodsS3   1.7.1     2016-02-16 CRAN (R 3.4.1)                      
 R.oo          1.21.0    2016-11-01 CRAN (R 3.4.1)                      
 R.utils       2.6.0     2017-11-05 CRAN (R 3.4.2)                      
 R6            2.2.2     2017-06-17 CRAN (R 3.4.0)                      
 rentrez       1.2.1     2018-03-05 CRAN (R 3.4.3)                      
 stats       * 3.4.4     2018-03-15 local                               
 sys           1.5       2017-10-10 cran (@1.5)                         
 tools         3.4.4     2018-03-15 local                               
 treeman       1.1.1     2017-06-27 cran (@1.1.1)                       
 utils       * 3.4.4     2018-03-15 local                               
 withr         2.1.2     2018-03-15 CRAN (R 3.4.3)                      
 XML           3.98-1.10 2018-02-19 CRAN (R 3.4.3)                      
 yaml          2.1.18    2018-03-08 CRAN (R 3.4.3)                      
> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-apple-darwin17.3.0 (64-bit)
Running under: macOS High Sierra 10.13.3

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libLAPACK.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] phylotaR_0.1

loaded via a namespace (and not attached):
 [1] XML_3.98-1.10     treeman_1.1.1     withr_2.1.2       digest_0.6.15     R.methodsS3_1.7.1 R6_2.2.2         
 [7] sys_1.5           jsonlite_1.5      magrittr_1.5      httr_1.3.1        curl_3.1          rstudioapi_0.7   
[13] rentrez_1.2.1     R.oo_1.21.0       R.utils_2.6.0     devtools_1.13.5   tools_3.4.4       igraph_1.2.1     
[19] yaml_2.1.18       compiler_3.4.4    pkgconfig_2.0.1   memoise_1.1.0  

naupaka commented 6 years ago

The only other thing that's different is that this time I used the stable version of RStudio, whereas usually I use the daily build.

DomBennett commented 6 years ago

My only thought is that it might be to do with ~/, which isn't expanded to the home directory when a command is called outside R through the sys package. But it seemed as if you specifically tested for that in your review. At the very least, I should prevent users from using ~/ in paths.
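A small sketch of the ~/ issue, under the assumption that the external command receives its arguments without going through a shell, so nothing expands the tilde for it. The path below is simply the style seen in the logs above; `raw_path` and `abs_path` are illustrative names.

```r
# R's own file functions expand a leading "~", but an external program that is
# handed the argument directly sees the literal "~" character.
raw_path <- "~/Desktop/phylota_review/aotus"  # path style from the logs above
abs_path <- path.expand(raw_path)             # base R: replaces "~" with the home directory

cat("as typed:", raw_path, "\n")
cat("expanded:", abs_path, "\n")

# Passing abs_path (rather than raw_path) to an external tool avoids relying
# on shell tilde expansion.
```

This would be consistent with the "does not exist" messages in the BLAST logs, where the file name still starts with ~/.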

DomBennett commented 6 years ago

Now preventing users from running with ~ in file paths.
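For illustration only, a check along these lines could look like the sketch below. This is not the code that was added to phylotaR; `check_no_tilde` is a hypothetical name, and the error wording is made up.

```r
# Hypothetical check (not phylotaR's actual implementation): refuse any path
# containing "~" and suggest the expanded form instead.
check_no_tilde <- function(path) {
  if (grepl("~", path, fixed = TRUE)) {
    stop("Path contains '~'; please supply the full path, e.g. ",
         path.expand(path), call. = FALSE)
  }
  # Return the path unchanged so the check can be used inline
  invisible(path)
}
```

Used as `wd <- check_no_tilde(wd)` before setting up, this fails early with a readable message instead of surfacing later as a makeblastdb error.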