Closed jsoghigian closed 4 years ago
Hi,
the most time-intensive part of Orthograph is running the HMM and BLAST searches. These are re-run only if the output files do not exist or are empty (no content at all, indicating a cancelled run). So you could simply restart the analysis and it should pick up where it left off, without needing much additional time. Please try this.
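The resume behavior described above boils down to a per-file existence-and-size check. Here is a minimal sketch of that logic in Python; the `needs_rerun` helper and the file names are purely illustrative assumptions, not Orthograph's actual code (which is written in Perl):

```python
import os

def needs_rerun(output_file: str) -> bool:
    """Re-run a search only if its output file is missing or empty.

    An empty file indicates a run that was cancelled mid-write,
    so its search must be repeated; a non-empty file is reused.
    """
    return (not os.path.exists(output_file)
            or os.path.getsize(output_file) == 0)

# Hypothetical per-gene decision loop (file layout is illustrative):
for hmm_out in ["gene0001.hmmsearch", "gene0002.hmmsearch"]:
    if needs_rerun(hmm_out):
        pass  # here the actual pipeline would invoke hmmsearch
```

Note that this check cannot distinguish a truncated-but-non-empty output file from a complete one, which may be relevant if a job is killed mid-write.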
To be honest, `clear-database = 0` has not been tested for a long time, since nobody has ever needed it (to my recollection). I think there is also a warning that it might lead to database corruption. I should take a look at it anyway, though. Thanks for reporting!
I've tried re-running the analysis with the same output directories etc., but Orthograph backs up and removes the incomplete output of previous runs. This is without changing clear-database from the default.
This is the beginning of a log from a run that had failed to finish (at ~84%) and I restarted:
```
OK: 'fastatranslate' version 2.2
OK: 'exonerate' version 2.2.0
OK: 'mafft-linsi' version 7.427
OK: 'hmmbuild' version 3.1
OK: 'hmmsearch' version 3.1
OK: 'makeblastdb' version 2.2.28
OK: 'blastp' version 2.2.28
Orthograph: Orthology prediction using a Graph-based, Reciprocal Approach with Profile Hidden Markov models
Copyright 2015 Malte Petersen mptrsen@uni-bonn.de
Version 0.6.3
Using output dir 'output/SAf2.spades'.
Using tempdir 'output/SAf2.spades/tmp'.
Using log dir 'output/SAf2.spades/log'.
Using log file 'output/SAf2.spades/log/orthograph-analyzer-2019-07-03_15:12.log'.
Alignment dir 'sets/mosquito_set1/aln' exists.
HMM dir 'sets/mosquito_set1/hmms' exists.
BLAST database dir 'sets/mosquito_set1/blast' exists.
EST file input/SAf2.spades.fasta exists.
HMMsearch output dir 'output/SAf2.spades/hmmsearch' exists.
Reverse search output dir 'output/SAf2.spades/blast' exists.
AA output dir 'output/SAf2.spades/aa' exists.
NT output dir 'output/SAf2.spades/nt' exists.
Backing up old output files...
Old output files backed up in 'output/SAf2.spades/backup-2019-07-03_15:12.tar.bz2' (85337 files).
Generating ortholog set mosquito_set1... This may take a long time, please be patient.
```
...SNIP TO SAVE SPACE... Database loaded here...
```
BLAST DB for set mosquito_set1 exists in 'sets/mosquito_set1/blast/mosquito_set1'.
Using temporary directory 'output/SAf2.spades/tmp'.
Using HMM dir 'sets/mosquito_set1/hmms' with 13150 HMM files.
HMMsearch e-Value cutoff: 1e-05. Score cutoff: 10.
Translating input/SAf2.spades.fasta in all six reading frames...
output/SAf2.spades/SAf2.spades_prot.fasta exists, using this one.
Clearing database of previous results from 'SAf2.spades'...
```
It then proceeds to re-run the searches a few lines later... Not a big deal, since I can just run the few samples this happens to in a different queue with a much longer maximum wall time, but I was curious whether there might be a way to salvage what was already done.
Thanks again for Orthograph!
Hi Malte, great tool!
Is it possible to resume Orthograph runs? I have had a few Orthograph runs time out on a cluster that were nearly done with the homology searches. It would be good to be able to resume these. I tried setting the clear-database parameter to 0, hoping this would cause Orthograph to pick up where it left off, but it threw this error at me instead:

In standard out: ......
```
Storing translated sequences to database 'output/6Ae10RG/6Ae10RG.sqlite'...
Transaction took 113.0 seconds.
```
In standard error:

```
Usage: get_number_of_ests_for_specid(SPECID) at /home/jssoghig/apps/Orthograph-0.6.3/orthograph-analyzer line 1447.
```