mptrsen / Orthograph

Orthology prediction using a graph-based, reciprocal approach with profile hidden Markov models
GNU General Public License v3.0

Resuming orthograph runs? #31

Status: Closed (jsoghigian closed this issue 4 years ago)

jsoghigian commented 5 years ago

Hi Malte, great tool!

Is it possible to resume Orthograph runs? I have had a few Orthograph runs time out on a cluster when they had nearly completed the homology searches, and it would be good to be able to resume them. I tried setting the clear-database parameter to 0, hoping this would cause Orthograph to pick up where it left off, but it threw an error instead.

In standard out:

...... Storing translated sequences to database 'output/6Ae10RG/6Ae10RG.sqlite'...
Transaction took 113.0 seconds.

In standard error:

Usage: get_number_of_ests_for_specid(SPECID) at /home/jssoghig/apps/Orthograph-0.6.3/orthograph-analyzer line 1447.

mptrsen commented 5 years ago

Hi,

The most time-intensive part of Orthograph is running the HMM and BLAST searches. A search is only re-run if its output file does not exist or is empty (no content at all, indicating a canceled run). So you could simply restart the analysis and it should pick up where it left off, without needing much additional time. Please try this.
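The rule described above (skip a search when its output file exists and is non-empty) can be sketched in a few lines. This is only an illustration in Python, not Orthograph's own code: the function names, the `.hmm` glob, and the `.out` result suffix are assumptions made for the example.

```python
from pathlib import Path
from typing import List


def needs_rerun(output_file: Path) -> bool:
    """Return True if a search result is missing or empty (0 bytes)."""
    return not output_file.is_file() or output_file.stat().st_size == 0


def pending_searches(hmm_dir: Path, out_dir: Path, suffix: str = ".out") -> List[str]:
    """Names of HMMs whose result file in out_dir is missing or empty.

    The naming scheme (<hmm name> + suffix) is hypothetical.
    """
    return sorted(
        hmm.stem
        for hmm in hmm_dir.glob("*.hmm")
        if needs_rerun(out_dir / (hmm.stem + suffix))
    )
```

Calling `pending_searches(Path("sets/mosquito_set1/hmms"), Path("output/SAf2.spades/hmmsearch"))` would then list the searches that still have to run, provided the result files really follow the assumed naming.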

To be honest, clear-database = 0 has not been tested for a long time since nobody has ever needed it (to my recollection). I think there is also a warning that it might lead to database corruption. I should take a look at it anyway, though. Thanks for reporting!

jsoghigian commented 5 years ago

I've tried re-running the analysis with the same output directories etc., but Orthograph backs up and removes the incomplete output of previous runs. This is without changing clear-database from its default.

This is the beginning of the log from a run that failed to finish (at ~84%) and that I then restarted:

OK: 'fastatranslate' version 2.2
OK: 'exonerate' version 2.2.0
OK: 'mafft-linsi' version 7.427
OK: 'hmmbuild' version 3.1
OK: 'hmmsearch' version 3.1
OK: 'makeblastdb' version 2.2.28
OK: 'blastp' version 2.2.28
Orthograph: Orthology prediction using a Graph-based, Reciprocal Approach with Profile Hidden Markov models
Copyright 2015 Malte Petersen mptrsen@uni-bonn.de
Version 0.6.3

Using output dir 'output/SAf2.spades'.
Using tempdir 'output/SAf2.spades/tmp'.
Using log dir 'output/SAf2.spades/log'.
Using log file 'output/SAf2.spades/log/orthograph-analyzer-2019-07-03_15:12.log'.
Alignment dir 'sets/mosquito_set1/aln' exists.
HMM dir 'sets/mosquito_set1/hmms' exists.
BLAST database dir 'sets/mosquito_set1/blast' exists.
EST file input/SAf2.spades.fasta exists.
HMMsearch output dir 'output/SAf2.spades/hmmsearch' exists.
Reverse search output dir 'output/SAf2.spades/blast' exists.
AA output dir 'output/SAf2.spades/aa' exists.
NT output dir 'output/SAf2.spades/nt' exists.
Backing up old output files...
Old output files backed up in 'output/SAf2.spades/backup-2019-07-03_15:12.tar.bz2' (85337 files).
Generating ortholog set mosquito_set1... This may take a long time, please be patient.

...SNIP TO SAVE SPACE... Database loaded here...

BLAST DB for set mosquito_set1 exists in 'sets/mosquito_set1/blast/mosquito_set1'.
Using temporary directory 'output/SAf2.spades/tmp'.
Using HMM dir 'sets/mosquito_set1/hmms' with 13150 HMM files.
HMMsearch e-Value cutoff: 1e-05.
Score cutoff: 10.
Translating input/SAf2.spades.fasta in all six reading frames... output/SAf2.spades/SAf2.spades_prot.fasta exists, using this one.
Clearing database of previous results from 'SAf2.spades'...

It then proceeds to re-run the searches a few lines later. Not a big deal, since I can just run the few samples this happens to in a different queue with a much longer max wall time, but I was curious whether there might be a way to salvage what was already done.
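As a first step towards judging what could be salvaged, one could check how many result files actually ended up in the backup archive mentioned in the log. The following is a minimal sketch using Python's standard `tarfile` module; the helper name `count_backup_members` and the assumption that the archive groups files by directory (for example `hmmsearch`, `blast`) are mine, and nothing in this thread confirms that restoring such files would make Orthograph skip the searches.

```python
import tarfile
from collections import Counter
from pathlib import PurePosixPath


def count_backup_members(archive_path: str, depth: int = 1) -> Counter:
    """Count regular files in a tar archive, grouped by leading path components.

    depth is the number of leading directories used as the group key;
    files that sit above that depth are counted under ".".
    """
    counts = Counter()
    # "r:*" lets tarfile detect the compression (bz2, gz, ...) itself.
    with tarfile.open(archive_path, "r:*") as archive:
        for member in archive:
            if member.isfile():
                parts = PurePosixPath(member.name).parts
                key = "/".join(parts[:depth]) if len(parts) > depth else "."
                counts[key] += 1
    return counts
```

For instance, `count_backup_members("output/SAf2.spades/backup-2019-07-03_15:12.tar.bz2")` would report how many files each directory in the archive holds; if the archive stores longer paths, a larger `depth` may be needed to separate the search output directories.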

Thanks again for Orthograph!