phac-nml / irida

Canada’s Integrated Rapid Infectious Disease Analysis Platform for Genomic Epidemiology
https://irida.ca
Apache License 2.0
40 stars 31 forks source link

Uncaught exception in exposed API method - Galaxy workflow #328

Open thobalose opened 5 years ago

thobalose commented 5 years ago

snippy4

Describe the bug

I have created a pipeline plugin and deployed it to an IRIDA instance. When I execute the pipeline I keep hitting the following error:

26 Mar 2019 12:58:14,927 ERROR ca.corefacility.bioinformatics.irida.service.analysis.execution.AnalysisExecutionServiceAspect:63 - Error occured for submission: AnalysisSubmission [id=58, name=SnippyPipeline_20190326_SRR1167, submitter=thoba, workflowId=789a8d42-98d9-4c4f-9e88-444bb0e5bbfa, analysisState=SUBMITTING, analysisCleanedState=NOT_CLEANED] changing to state ERROR
ca.corefacility.bioinformatics.irida.exceptions.WorkflowException: GalaxyResponseException{status=500, responseBody={"err_msg": "Uncaught exception in exposed API method:", "err_code": 0}, errorMessage=Uncaught exception in exposed API method:, errorCode=0, traceback=null}
    at ca.corefacility.bioinformatics.irida.pipeline.upload.galaxy.GalaxyWorkflowService.runWorkflow(GalaxyWorkflowService.java:123)
    at ca.corefacility.bioinformatics.irida.service.analysis.execution.galaxy.AnalysisExecutionServiceGalaxyAsync.executeAnalysis(AnalysisExecutionServiceGalaxyAsync.java:151)
    at ca.corefacility.bioinformatics.irida.service.analysis.execution.galaxy.AnalysisExecutionServiceGalaxyAsync$$FastClassBySpringCGLIB$$fc2d896.invoke(<generated>)
    at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
    at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:718)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
    at org.springframework.aop.aspectj.MethodInvocationProceedingJoinPoint.proceed(MethodInvocationProceedingJoinPoint.java:85)
    at ca.corefacility.bioinformatics.irida.service.analysis.annotations.RunAsUserAspect.setSecurityContextFromAnalysisSubmission(RunAsUserAspect.java:100)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethodWithGivenArgs(AbstractAspectJAdvice.java:621)
    at org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethod(AbstractAspectJAdvice.java:610)
    at org.springframework.aop.aspectj.AspectJAroundAdvice.invoke(AspectJAroundAdvice.java:68)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:168)
    at org.springframework.aop.aspectj.AspectJAfterThrowingAdvice.invoke(AspectJAfterThrowingAdvice.java:58)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:168)
    at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
    at org.springframework.aop.interceptor.AsyncExecutionInterceptor$1.call(AsyncExecutionInterceptor.java:108)
    at org.springframework.security.concurrent.DelegatingSecurityContextCallable.call(DelegatingSecurityContextCallable.java:60)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: GalaxyResponseException{status=500, responseBody={"err_msg": "Uncaught exception in exposed API method:", "err_code": 0}, errorMessage=Uncaught exception in exposed API method:, errorCode=0, traceback=null}
    at com.github.jmchilton.blend4j.galaxy.Client.buildResponseException(Client.java:18)
    at com.github.jmchilton.blend4j.BaseClient.checkResponse(BaseClient.java:138)
    at com.github.jmchilton.blend4j.BaseClient.create(BaseClient.java:47)
    at com.github.jmchilton.blend4j.BaseClient.create(BaseClient.java:36)
    at com.github.jmchilton.blend4j.BaseClient.create(BaseClient.java:32)
    at com.github.jmchilton.blend4j.galaxy.WorkflowsClientImpl.runWorkflowResponse(WorkflowsClientImpl.java:38)
    at com.github.jmchilton.blend4j.galaxy.WorkflowsClientImpl.runWorkflow(WorkflowsClientImpl.java:42)
    at ca.corefacility.bioinformatics.irida.pipeline.upload.galaxy.GalaxyWorkflowService.runWorkflow(GalaxyWorkflowService.java:121)
    ... 25 more

Steps to reproduce the problem

To build and upload https://github.com/COMBAT-TB/irida-pipeline-plugins/tree/master/snippy-pipeline-plugin

Expected behavior

I was expecting the workflow to run to completion.

Additional context

The job is submitted to Galaxy and runs steps 0 to 2.

thobalose commented 5 years ago

I though I'd link this issue https://github.com/galaxyproject/galaxy/issues/4184 as I see this error from Galaxy.

apetkau commented 5 years ago

Thanks for all the great information and a link to the pipeline code :smile:

I built the plugin to try and test it out. I ran into an issue with one of the dependencies, confil. Is this available anywhere? I could not find it in conda.

I removed this tool from the workflow and re-built the plugin and it worked in the version of IRIDA I have (the latest development branch).

Which version of Galaxy were you testing this in? I tested it out against Galaxy version 18.09 (the Docker version).

thobalose commented 5 years ago

@apetkau Thank you very much for the timely response.

WRT: confil; at the moment, you can install it using conda install -c thoba confil or by adding the channel https://conda.anaconda.org/thoba to galaxy.yml. confil depends on kraken2 which can be computationally expensive when testing locally.

I will also remove the tool and try re-build and test against Galaxy version 19.01.

thobalose commented 5 years ago

@apetkau Confirmed! Removing confil in the workflow works. We now need to figure out why.

Thank you.

apetkau commented 5 years ago

Awesome, thanks @thobalose. I got confil installed.

Testing this out a bit more, it's not actually confil that is causing this issue, it's the Filter empty tool in Galaxy. Removing this step and it will work (well, confil doesn't run to completion due to some kraken2 error I am getting, but you won't get the Galaxy API error in IRIDA).

The Galaxy issue you linked to suggests the Galaxy API error is due to using the older run API call (instead of invoke). We're using blend4j, which is a bit old, and I'm pretty sure is using the older API calls. This may be causing issues with some of the collection tools (such as Filter empty).

I'm still looking into this, but figured I'd give an update.

I also wanted to mention, for debugging interactions with Galaxy, there's a small change you can make to the code to print out all the Galaxy API calls in the logs. Specifically add true as an extra parameter to this line https://github.com/phac-nml/irida/blob/bbf081fb1f6d6570854fb024ba44f8560f9dbb6a/src/main/java/ca/corefacility/bioinformatics/irida/config/analysis/ExecutionManagerConfig.java#L334 to enable debug messages.

GalaxyInstanceFactory.get(executionManager().getLocation().toString(), executionManager().getAPIKey(), true);

This will then print out information like:

5 * Client out-bound request
5 > POST http://localhost:48888/api/workflows?key=admin
5 > Content-Type: application/json
5 > Accept: application/json
{"parameters":{"testtoolshed.g2.bx.psu.edu/repos/sanbi-uwc/confil/confil/0.1.5+galaxy0":{"cutoff":"90","input_type_conditional":{"input_type":"paired_collection"}},"toolshed.g2.bx.psu.edu/repos/iuc/snippy/snippy/4.3.6+galaxy1":{"fastq_input":{"fastq_input_selector":"paired_$
ollection"},"adv":{"minfrac":"0.9","mincov":"10","minqual":"100.0","mapqual":"60","rename_cons":"false"}}},"workflow_id":"911dde3ddb677bcd","history":"hist_id=5564089c81cf7fe8","ds_map":{"0":{"id":"56688e7db014eaac","src":"hda"},"1":{"id":"c86c1b73aa7102dd","src":"hdca"}},"$
o_add_to_history":"true"}

At some point I should really make this a config option. But, since you're already working with the code I figured I should mention it as it may help you out.

thobalose commented 5 years ago

@apetkau That is very interesting to hear wrt Filter empty tool and we need it filter output from confil. :crying_cat_face:

I also suspected blend4j as the cause of the API error.

Thank you for the heads up on debugging interactions with Galaxy. :1st_place_medal:

Will keep you posted.

apetkau commented 5 years ago

Temporary Solution

I was doing a bit more testing on this and I think I have a method for you to try out. This is a configuration option that can be set in Galaxy, force_beta_workflow_scheduled_for_collections, which when set to true should force the usage of the newer workflow scheduler in Galaxy.

Caveats

Now, this isn't a complete solution though. The reason I was continuing to use the older Galaxy API for running workflows is because it only returns when all jobs are completely scheduled. This means that after the workflow is scheduled I can query Galaxy to ask for the percent complete (e.g., 10 jobs complete out of 100 jobs), which is used to display the progress bar in IRIDA:

image

Using the newer Galaxy invoke method for scheduling workflows, I believe the API call to Galaxy returns before all jobs are scheduled, so it may cause the progress bar in IRIDA to not correspond to the progress of the workflow (and there could also be other weird issues, I haven't done too much testing on this). The proper solution for this would be to modify IRIDA to properly handle the newer Galaxy API call, but this will take a bit of time.

Additional Info

I did test other workflows using the Filter empty tool and they seemed to work in IRIDA. I believe that the reason your specific workflow is encountering this issue is because the tool confil is generating collections of fastq files as output depending on the user input options, which are passed to Filter empty. In the Galaxy log file I'm seeing:

galaxy.workflow.run DEBUG 2019-04-02 20:54:43,991 [p:28419,w:2,m:0] [uWSGIWorker2Core2] Marking step 97 outputs of invocation 19 delayed (tool [__FILTER_EMPTY_DATASETS__] inputs are not ready, this special tool requires inputs to be ready)

The Filter empty tool is getting it's input from confil and so Galaxy is saying in this line that it needs to wait for confil to finish before scheduling the rest of the jobs (I'm guessing because the specific elements on the output collection from confil are only determined at runtime). So, using the older Galaxy API (which should return only when all jobs are scheduled in Galaxy) is breaking in this specific case.

dfornika commented 4 years ago

I'm getting bitten by this issue on a pipeline that I'm working on. Can I help to fix it?

apetkau commented 4 years ago

Sure. Do you know what the corresponding message in the Galaxy logs is? Or do you have the workflow available to be tested?

dfornika commented 4 years ago

This is the pipeline I'm working on: https://github.com/dfornika/irida-plugin-plasmid-screen

I have a similar situation where I'm trying to use a collection tool (either 'Collapse Collections' or 'Extract Dataset')

I see a similar stacktrace in the galaxy logs:

galaxy.workflow.run DEBUG 2019-12-24 02:22:49,060 [p:688,w:1,m:0] [uWSGIWorker1Core0] Workflow step 683 of invocation 16 invoked (128.016 ms)
galaxy.workflow.run DEBUG 2019-12-24 02:22:49,068 [p:688,w:1,m:0] [uWSGIWorker1Core0] Marking step 684 outputs of invocation 16 delayed (dependent step [680] delayed, so this step must be de
layed)
galaxy.workflow.run DEBUG 2019-12-24 02:22:49,075 [p:688,w:1,m:0] [uWSGIWorker1Core0] Marking step 685 outputs of invocation 16 delayed (dependent step [681] delayed, so this step must be de
layed)
galaxy.workflow.run DEBUG 2019-12-24 02:22:49,080 [p:688,w:1,m:0] [uWSGIWorker1Core0] Marking step 686 outputs of invocation 16 delayed (dependent step [680] delayed, so this step must be de
layed)
galaxy.web.framework.decorators ERROR 2019-12-24 02:22:49,154 [p:688,w:1,m:0] [uWSGIWorker1Core0] Uncaught exception in exposed API method:
Traceback (most recent call last):
  File "lib/galaxy/web/framework/decorators.py", line 282, in decorator
    rval = func(self, trans, *args, **kwargs)
  File "lib/galaxy/webapps/galaxy/api/workflows.py", line 411, in create
    for v in outputs[step.id].values():
AttributeError: 'object' object has no attribute 'values'
172.17.0.1 - - [24/Dec/2019:02:22:47 +0000] "POST /api/workflows?key=admin HTTP/1.1" 500 - "-" "Java/11.0.2"