stxmjh / interproscan

Automatically exported from code.google.com/p/interproscan

[interhelp #26350] problems running interproscan in clustermode on SGE cluster #50

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. interproscan.sh -dp -f TSV -goterms -iprlookup -pa -i test_proteins.fasta 
-mode cluster -clusterrunid testIPR

What is the expected output? What do you see instead?
I don't get any InterPro results back. All I get in the .out and .err files of 
the submitted "i5t1master" jobs are messages about connection problems (see 
the attached files; I replaced the server name with "submission_hostname").

What version of the product are you using? On what operating system?
interproscan5-7-48
linux RedHat Enterprise Version

Please provide any additional information below.
As far as I can see, the interproscan.sh script submits 2 jobs to our cluster 
(i5t1master), but apart from the messages in the attached files they don't do 
anything. After a while they stop/die and the main interproscan.sh script hangs 
forever.

Many thanks in advance,
lieven

Original issue reported on code.google.com by lieven.s...@gmail.com on 28 Aug 2014 at 4:18

GoogleCodeExporter commented 9 years ago

Original comment by Maxim.Sc...@gmail.com on 1 Sep 2014 at 10:02

GoogleCodeExporter commented 9 years ago
Dear Lieven,
Thanks for getting in touch with us.

As far as I can tell, the problem you are seeing is not a general 
InterProScan (I5) problem. It must be related to the way your SGE cluster is 
set up or the way you run I5.

Based on our experience with other users I can give you some guidance and then 
we will see if it works: 

Does the main InterProScan process also run behind the firewall? What 
messages do you get from the main InterProScan process? InterProScan 
uses TCP to communicate between processes (in cluster mode), and if all 
InterProScan processes are running behind the firewall I think there 
should be no problem. But we have not tested InterProScan in such a 
setup, so it would be useful to know more about the setup you have.

Another thing that sometimes causes such problems is a submission hostname 
that isn't fully qualified (i.e. one given without its full domain name).
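
One quick way to check the hostname point is to ask the resolver what it reports on the submission host and on a compute node. The following is only a minimal sketch: the function name `describe_hostname` is made up for illustration, and a dot in the returned name is used as a rough heuristic for "fully qualified", not as a guarantee.

```python
import socket


def describe_hostname(name=None):
    """Report the short name, the resolver's FQDN, and a rough 'looks qualified' flag.

    `name` defaults to this machine's own hostname. The dot check is only a
    heuristic (an assumption of this sketch), not a formal FQDN validation.
    """
    short = name or socket.gethostname()
    fqdn = socket.getfqdn(short)
    return {"short": short, "fqdn": fqdn, "fully_qualified": "." in fqdn}


# Usage: run print(describe_hostname()) on the submit host and on a node,
# then compare the two results.
```

If the two machines report different names for the submission host, that would be worth mentioning in your reply.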

Do you also have TCP ports that should not be used for messaging?
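
To rule out a blocked or restricted port, it can help to test plain TCP reachability from a compute node back to the submission host. This is a sketch under assumptions: the helper name `can_connect` is hypothetical, and the host and port have to be filled in by hand (for example from the connection messages in the .out/.err files, if they show them). It only checks that a TCP connection can be opened, nothing InterProScan-specific.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


# Usage (placeholders, to be replaced with real values):
#   can_connect("submission_hostname", 12345)
```

Running this inside a small test job on the cluster, rather than on the submit host itself, shows what a worker would actually see.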

Kind Regards,
Maxim

Original comment by Maxim.Sc...@gmail.com on 1 Sep 2014 at 10:46

GoogleCodeExporter commented 9 years ago
Dear Maxim,

thanks a lot for your swift reply.
Yes, I already feared something like that, seeing posts from other people who 
indeed have I5 running on their SGE cluster.

To answer your questions:
* No, there is no firewall involved. All servers (computation nodes and submit 
host) are in the same subnet.
* Here is the output I see appearing from the main interproscan:
interproscan.sh -dp -f TSV -goterms -iprlookup -pa -i test_proteins.fasta -mode 
cluster -clusterrunid testIPR
Picked up JAVA_TOOL_OPTIONS: -XX:ParallelGCThreads=1
02/09/2014 14:24:15:137 Welcome to InterProScan-5.7-48.0
The Project/Cluster Run ID for this run is: testIPR
02/09/2014 14:24:38:394 Running InterProScan v5 in CLUSTER mode...
Loading file test_proteins.fasta
02/09/2014 14:24:56:143 Running the following analyses:
[jobTIGRFAM-13.0,jobProDom-2006.1,jobSMART-6.2,jobHAMAP-201311.27,jobSignalP-EUK-4.0,jobPrositePatterns-20.97,jobPRINTS-42.0,jobSuperFamily-1.75,jobPanther-9.0,jobGene3d-3.5.0,jobSignalP-GRAM_POSITIVE-4.0,jobPIRSF-2.84,jobSignalP-GRAM_NEGATIVE-4.0,jobPfamA-27.0,jobPrositeProfiles-20.97,jobPhobius-1.01,jobTMHMM-2.0c,jobCoils-2.2]
Pre-calculated match lookup service DISABLED.  Please wait for match 
calculations to complete...

After this nothing appears anymore and the main process just hangs...
I do get the error and output as described earlier and I also see in the tmp 
folder a series of folders (one for each analysis) with in each a fasta file 
but nothing else, except for the "jobLoadFromFasta" folder which is empty.

* The cluster nodes and the submit node both resolve the FQDNs and the short 
hostnames.
* No, there are no limitations on the TCP ports

Our system is set up as follows:
* normal basic SGE setup of our cluster with a single submission host and 
several computation nodes (no submission abilities)
* cluster jobs are 'limited' in memory and #cores to use (jobs are submitted 
with a specified memory request, h_vmem, and number of cores to use, -pe serial 
#)
* software is loaded using an environment module system
* basic oracle java installation
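
For reference, a submission that respects these limits would carry arguments along the following lines. This is only a sketch: the helper name `build_qsub_args`, the script name, the memory value and the slot count are made up for illustration. Only the `h_vmem` resource and the `serial` parallel environment come from the setup described above.

```python
def build_qsub_args(script, h_vmem, slots, pe="serial"):
    """Build an SGE qsub argument list with a memory limit and a parallel environment.

    -l h_vmem=<size> requests the per-job memory limit,
    -pe <name> <n>   requests <n> slots in the named parallel environment.
    """
    return ["qsub", "-l", f"h_vmem={h_vmem}", "-pe", pe, str(slots), script]


# Usage (hypothetical values):
#   build_qsub_args("worker.sh", "4G", 2)
```

Any command that I5 uses to submit its worker jobs would need to include equivalent options to be accepted here.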

Does the InterProScan software require any additional software to be able to 
run in cluster mode, e.g. some sort of MPI software?

I've been playing around with the properties file, and if I trick it with a 
'fake' qsub command (so that I5 thinks it is submitting a job) the analysis runs 
OK, but of course it is then running locally on the submission host. Although 
that host can submit jobs to the cluster, I don't see any "worker jobs" 
appearing on our cluster (even with "max.tier.depth=2").

many thanks in advance for trying to get this resolved.
best,
lieven

Original comment by lieven.s...@gmail.com on 2 Sep 2014 at 12:57

GoogleCodeExporter commented 9 years ago
One small addition to my previous post about tricking I5 into running locally 
in cluster mode. Here is the full output of such a trial run:

Picked up JAVA_TOOL_OPTIONS: -XX:ParallelGCThreads=1
02/09/2014 14:48:04:333 Welcome to InterProScan-5.7-48.0
The Project/Cluster Run ID for this run is: testIPR
02/09/2014 14:48:28:943 Running InterProScan v5 in CLUSTER mode...
Loading file /software/shared/apps/x86_64/iprscan/5.7-48/test_proteins.fasta
02/09/2014 14:48:41:149 Running the following analyses:
[jobTIGRFAM-13.0,jobProDom-2006.1,jobSMART-6.2,jobHAMAP-201311.27,jobSignalP-EUK-4.0,jobPrositePatterns-20.97,jobPRINTS-42.0,jobSuperFamily-1.75,jobPanther-9.0,jobGene3d-3.5.0,jobSignalP-GRAM_POSITIVE-4.0,jobPIRSF-2.84,jobSignalP-GRAM_NEGATIVE-4.0,jobPfamA-27.0,jobPrositeProfiles-20.97,jobPhobius-1.01,jobTMHMM-2.0c,jobCoils-2.2]
Pre-calculated match lookup service DISABLED.  Please wait for match 
calculations to complete...
02/09/2014 14:49:38:376 26% completed
02/09/2014 14:51:11:511 51% completed
02/09/2014 14:53:47:286 76% completed
02/09/2014 14:54:47:214 90% completed
2014-09-02 14:57:38,092 
[uk.ac.ebi.interpro.scan.management.model.implementations.WriteOutputStep:248] 
WARN - At run completion, unable to delete temporary directory 
temp/<hostname>_20140902_144840696_ehzt/jobPhobius-1.01
2014-09-02 14:57:38,099 
[uk.ac.ebi.interpro.scan.management.model.implementations.WriteOutputStep:248] 
WARN - At run completion, unable to delete temporary directory 
temp/<hostname>_20140902_144840696_ehzt/jobTMHMM-2.0c
2014-09-02 14:57:38,104 
[uk.ac.ebi.interpro.scan.management.model.implementations.WriteOutputStep:253] 
WARN - At run completion, unable to delete temporary directory 
temp/<hostname>_20140902_144840696_ehzt
02/09/2014 14:57:40:017 100% of analyses done:  InterProScan analyses completed

So it only complains about not being able to delete some folders during 
clean-up, which is a minor issue I guess. The rest seems fine and the I5 
output is present and complete.

lieven

Original comment by lieven.s...@gmail.com on 2 Sep 2014 at 1:09

GoogleCodeExporter commented 9 years ago
Hi, 

I am having some issues with InterProScan-5.7-48.0. 
When I run it directly on the server it works, but when I run it through 
sbatch it gives me the following error.

16/10/2014 19:08:36:250 Welcome to InterProScan-5.7-48.0
16/10/2014 19:08:54:179 Running InterProScan v5 in STANDALONE mode...
Loading file MLO_plasmid_glimmer.fasta.TranslatedProtein.fasta
16/10/2014 19:09:04:614 Running the following analyses:
[jobPIRSF-2.84]
Available matches will be retrieved from the pre-calculated match lookup 
service.

Matches for any sequences that are not represented in the lookup service will 
be calculated locally.
16/10/2014 19:13:02:774 26% completed
2014-10-16 19:13:07,976 
[uk.ac.ebi.interpro.scan.jms.worker.LocalJobQueueListener:193] ERROR - 
Execution thrown when attempting to executeInTransaction the StepExecution.  
All database activity rolled back.
java.lang.IllegalStateException: IOException thrown when attempting to run 
binary
    at uk.ac.ebi.interpro.scan.management.model.implementations.RunBinaryStep.execute(RunBinaryStep.java:130)
    at uk.ac.ebi.interpro.scan.jms.activemq.StepExecutionTransactionImpl.executeInTransaction(StepExecutionTransactionImpl.java:86)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:319)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
    at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:110)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
    at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:202)
    at com.sun.proxy.$Proxy96.executeInTransaction(Unknown Source)
    at uk.ac.ebi.interpro.scan.jms.worker.LocalJobQueueListener.onMessage(LocalJobQueueListener.java:181)
    at org.springframework.jms.listener.AbstractMessageListenerContainer.doInvokeListener(AbstractMessageListenerContainer.java:562)
    at org.springframework.jms.listener.AbstractMessageListenerContainer.invokeListener(AbstractMessageListenerContainer.java:500)
    at org.springframework.jms.listener.AbstractMessageListenerContainer.doExecuteListener(AbstractMessageListenerContainer.java:468)
    at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.doReceiveAndExecute(AbstractPollingMessageListenerContainer.java:326)
    at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveAndExecute(AbstractPollingMessageListenerContainer.java:264)
    at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.invokeListener(DefaultMessageListenerContainer.java:1071)
    at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.executeOngoingLoop(DefaultMessageListenerContainer.java:1063)
    at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:960)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Cannot run program "/bin/blast/2.2.6/blastall": 
error=2, No such file or directory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047)
    at uk.ac.ebi.interpro.scan.io.cli.CommandLineConversationImpl.runCommand(CommandLineConversationImpl.java:144)
    at uk.ac.ebi.interpro.scan.management.model.implementations.RunBinaryStep.execute(RunBinaryStep.java:128)
    ... 22 more
Caused by: java.io.IOException: error=2, No such file or directory
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:186)
    at java.lang.ProcessImpl.start(ProcessImpl.java:130)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028)
    ... 24 more
2014-10-16 19:13:08,137 
[uk.ac.ebi.interpro.scan.jms.master.StandaloneBlackBoxMaster:56] WARN - 
StepInstance 9 is being re-run following a failure.
2014-10-16 19:13:13,202 
[uk.ac.ebi.interpro.scan.jms.worker.LocalJobQueueListener:193] ERROR - 
Execution thrown when attempting to executeInTransaction the StepExecution.  
All database activity rolled back.
java.lang.IllegalStateException: IOException thrown when attempting to run 
binary
    at uk.ac.ebi.interpro.scan.management.model.implementations.RunBinaryStep.execute(RunBinaryStep.java:130)
    at uk.ac.ebi.interpro.scan.jms.activemq.StepExecutionTransactionImpl.executeInTransaction(StepExecutionTransactionImpl.java:86)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:319)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
    at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:110)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
    at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:202)
    at com.sun.proxy.$Proxy96.executeInTransaction(Unknown Source)
    at uk.ac.ebi.interpro.scan.jms.worker.LocalJobQueueListener.onMessage(LocalJobQueueListener.java:181)
    at org.springframework.jms.listener.AbstractMessageListenerContainer.doInvokeListener(AbstractMessageListenerContainer.java:562)
    at org.springframework.jms.listener.AbstractMessageListenerContainer.invokeListener(AbstractMessageListenerContainer.java:500)
    at org.springframework.jms.listener.AbstractMessageListenerContainer.doExecuteListener(AbstractMessageListenerContainer.java:468)
    at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.doReceiveAndExecute(AbstractPollingMessageListenerContainer.java:326)
    at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveAndExecute(AbstractPollingMessageListenerContainer.java:264)
    at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.invokeListener(DefaultMessageListenerContainer.java:1071)
    at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.executeOngoingLoop(DefaultMessageListenerContainer.java:1063)
    at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:960)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Cannot run program "/bin/blast/2.2.6/blastall": 
error=2, No such file or directory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047)
    at uk.ac.ebi.interpro.scan.io.cli.CommandLineConversationImpl.runCommand(CommandLineConversationImpl.java:144)
    at uk.ac.ebi.interpro.scan.management.model.implementations.RunBinaryStep.execute(RunBinaryStep.java:128)
    ... 22 more
Caused by: java.io.IOException: error=2, No such file or directory
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:186)
    at java.lang.ProcessImpl.start(ProcessImpl.java:130)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028)
    ... 24 more
2014-10-16 19:13:13,287 
[uk.ac.ebi.interpro.scan.jms.master.StandaloneBlackBoxMaster:56] WARN - 
StepInstance 9 is being re-run following a failure.
2014-10-16 19:13:18,339 
[uk.ac.ebi.interpro.scan.jms.worker.LocalJobQueueListener:193] ERROR - 
Execution thrown when attempting to executeInTransaction the StepExecution.  
All database activity rolled back.
java.lang.IllegalStateException: IOException thrown when attempting to run 
binary
    at uk.ac.ebi.interpro.scan.management.model.implementations.RunBinaryStep.execute(RunBinaryStep.java:130)
    at uk.ac.ebi.interpro.scan.jms.activemq.StepExecutionTransactionImpl.executeInTransaction(StepExecutionTransactionImpl.java:86)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:319)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
    at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:110)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
    at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:202)
    at com.sun.proxy.$Proxy96.executeInTransaction(Unknown Source)
    at uk.ac.ebi.interpro.scan.jms.worker.LocalJobQueueListener.onMessage(LocalJobQueueListener.java:181)
    at org.springframework.jms.listener.AbstractMessageListenerContainer.doInvokeListener(AbstractMessageListenerContainer.java:562)
    at org.springframework.jms.listener.AbstractMessageListenerContainer.invokeListener(AbstractMessageListenerContainer.java:500)
    at org.springframework.jms.listener.AbstractMessageListenerContainer.doExecuteListener(AbstractMessageListenerContainer.java:468)
    at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.doReceiveAndExecute(AbstractPollingMessageListenerContainer.java:326)
    at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveAndExecute(AbstractPollingMessageListenerContainer.java:264)
    at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.invokeListener(DefaultMessageListenerContainer.java:1071)
    at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.executeOngoingLoop(DefaultMessageListenerContainer.java:1063)
    at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:960)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Cannot run program "/bin/blast/2.2.6/blastall": 
error=2, No such file or directory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047)
    at uk.ac.ebi.interpro.scan.io.cli.CommandLineConversationImpl.runCommand(CommandLineConversationImpl.java:144)
    at uk.ac.ebi.interpro.scan.management.model.implementations.RunBinaryStep.execute(RunBinaryStep.java:128)
    ... 22 more
Caused by: java.io.IOException: error=2, No such file or directory
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:186)
    at java.lang.ProcessImpl.start(ProcessImpl.java:130)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028)
    ... 24 more
2014-10-16 19:13:18,430 
[uk.ac.ebi.interpro.scan.jms.activemq.NonZeroExitOnUnrecoverableError:24] FATAL 
- Analysis step 9 : Run BLAST binary for PIRSF for proteins 101 to 117 has 
failed irretrievably.  Available StackTraces follow.
2014-10-16 19:13:18,430 
[uk.ac.ebi.interpro.scan.jms.activemq.NonZeroExitOnUnrecoverableError:41] FATAL 
- The JVM will now exit with a non-zero exit status.
2014-10-16 19:13:18,430 
[uk.ac.ebi.interpro.scan.jms.master.StandaloneBlackBoxMaster:105] ERROR - 
Exception thrown by StandaloneBlackBoxMaster: 
java.lang.IllegalStateException: InterProScan exiting with non-zero status, see 
logs for further information.
    at uk.ac.ebi.interpro.scan.jms.activemq.NonZeroExitOnUnrecoverableError.failed(NonZeroExitOnUnrecoverableError.java:42)
    at uk.ac.ebi.interpro.scan.jms.master.StandaloneBlackBoxMaster.run(StandaloneBlackBoxMaster.java:47)
    at uk.ac.ebi.interpro.scan.jms.main.Run.main(Run.java:270)
Finished execution

Original comment by shreePth...@gmail.com on 17 Oct 2014 at 4:22
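
The root cause in the trace above is the last "Caused by" line: the JVM could not find "/bin/blast/2.2.6/blastall" from inside the batch job. A small check like the following, run both directly on the server and inside an sbatch job, would show whether the binary path and working directory differ between the two environments. It is a minimal sketch: the helper name `check_binary` is made up, and the path has to be supplied by hand.

```python
import os


def check_binary(path):
    """Report whether `path` is absolute, exists as a file, and is executable,
    together with the current working directory of the calling process."""
    return {
        "path": path,
        "absolute": os.path.isabs(path),
        "exists": os.path.isfile(path),
        "executable": os.access(path, os.X_OK),
        "cwd": os.getcwd(),
    }


# Usage: print(check_binary("/bin/blast/2.2.6/blastall")) in both environments
# and compare the results.
```

A differing `cwd` between the two runs would suggest that a path which is relative in the direct run is being resolved differently under sbatch, though that is only one possible explanation.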