Open Paul-Donat opened 4 years ago
How long does the exonerate command take when you run it manually? Is it getting stuck on the same gene for many samples? The issue leading to the --timeout
workaround originally came up when folks were assembling using reference proteins that have very repetitive sequences, or even when the reference was not coding sequence at all. That would be my first thing to check.
I am having the same issue. I don't think it is a problem with my reference sequences because it only comes up in a subset of samples, not a subset of genes.
@mossmatters can you help with two things:
How can I run exonerate manually (what command should be given?)
How can I tell what gene it is getting stuck on?
I've been logging the stderr
from reads_first.py
. The second-to-last gene output looks like this:
ETA: 60s Left: 170 AVG: 0.37s local:1/19/100%/0.4s
ETA: 60s Left: 170 AVG: 0.37s local:1/19/100%/0.4s
ETA: 60s Left: 170 AVG: 0.37s local:1/19/100%/0.4s
ETA: 61s Left: 170 AVG: 0.37s local:1/19/100%/0.4s
ETA: 61s Left: 170 AVG: 0.37s local:1/19/100%/0.4s
ETA: 61s Left: 170 AVG: 0.37s local:1/19/100%/0.4s
ETA: 61s Left: 170 AVG: 0.37s local:1/19/100%/0.4s
ETA: 61s Left: 170 AVG: 0.37s local:1/19/100%/0.4s
ETA: 61s Left: 170 AVG: 0.37s local:1/19/100%/0.4s There were 2 exonerate hits for cluster3423/cluster3423_baits.fasta.
maol-cluster3423Writing amino acid sequence, length: 183
So we can see that the gene called cluster3423
worked correctly.
But the last gene starts like this:
ETA: 61s Left: 169 AVG: 0.35s local:1/20/100%/0.4s
ETA: 60s Left: 169 AVG: 0.35s local:1/20/100%/0.4s
ETA: 60s Left: 169 AVG: 0.35s local:1/20/100%/0.4s
ETA: 60s Left: 169 AVG: 0.35s local:1/20/100%/0.4s
... and continues on and on for thousands of lines:
ETA: 294148s Left: 169 AVG: 1740.95s local:1/20/100%/1741.0s
ETA: 294157s Left: 169 AVG: 1741.00s local:1/20/100%/1741.0s
ETA: 294165s Left: 169 AVG: 1741.05s local:1/20/100%/1741.1s
ETA
just keeps going up without ever finishing. Since it only prints the gene name when it finishes, it's not clear to me how to see what gene it is currently stuck on.
Hi Joel et al - We are currently working on a HybPiper 2.0 that addresses a lot of these issues. This same issue was coming up on testing and seems to be related to low-complexity DNA in the target file. If there is a repetitive region in the target, there are a lot of contigs with potential mappings and exonerate takes a long time.
In HybPiper 1.3 you can currently run --timeout 4000
in reads_first.py
to kill any spades or exonerate jobs that are taking a long time (i.e. 4000% the average time). Killing the exonerate process will allow HybPiper to finish and create all the output reports you need.
In the future we are also working on building filters for low-complexity DNA/protein sequences in the target files.
As for running exonerate_hits.py
manually, you just need to descend into the proper gene file and call the python script using the geneName_contigs.fasta
and geneName_baits.fasta
files.
You can find which gene is missing by looking at exonerate_genelist.txt
and comparing that to the genes that actually have sequences generated - the one with the hangup is likely to be missing. It is also likely to have a contigs.fasta
file that is very very big with lots of low-complexity DNA.
For multiple samples at the "last writing amino acid sequence" step, exonerate stops. It seems to be entering a waiting loop that it never exits from. the --timeout will stop exonerate but it doesn't resolve the issue.
I have manually been able to run the exonerate command that is stalling out.
running on a 48 core centos 7 server.
any insight into what might be happening would helpful.