Open thobalose opened 5 years ago
I though I'd link this issue https://github.com/galaxyproject/galaxy/issues/4184 as I see this error from Galaxy.
Thanks for all the great information and a link to the pipeline code :smile:
I built the plugin to try and test it out. I ran into an issue with one of the dependencies, confil
. Is this available anywhere? I could not find it in conda.
I removed this tool from the workflow and re-built the plugin and it worked in the version of IRIDA I have (the latest development
branch).
Which version of Galaxy were you testing this in? I tested it out against Galaxy version 18.09 (the Docker version).
@apetkau Thank you very much for the timely response.
WRT: confil
; at the moment, you can install it using conda install -c thoba confil
or by adding the channel https://conda.anaconda.org/thoba
to galaxy.yml
. confil
depends on kraken2
which can be computationally expensive when testing locally.
I will also remove the tool and try re-build and test against Galaxy version 19.01.
@apetkau Confirmed! Removing confil
in the workflow works. We now need to figure out why.
Thank you.
Awesome, thanks @thobalose. I got confil
installed.
Testing this out a bit more, it's not actually confil
that is causing this issue, it's the Filter empty
tool in Galaxy. Removing this step and it will work (well, confil
doesn't run to completion due to some kraken2
error I am getting, but you won't get the Galaxy API error in IRIDA).
The Galaxy issue you linked to suggests the Galaxy API error is due to using the older run
API call (instead of invoke
). We're using blend4j, which is a bit old, and I'm pretty sure is using the older API calls. This may be causing issues with some of the collection tools (such as Filter empty).
I'm still looking into this, but figured I'd give an update.
I also wanted to mention, for debugging interactions with Galaxy, there's a small change you can make to the code to print out all the Galaxy API calls in the logs. Specifically add true
as an extra parameter to this line https://github.com/phac-nml/irida/blob/bbf081fb1f6d6570854fb024ba44f8560f9dbb6a/src/main/java/ca/corefacility/bioinformatics/irida/config/analysis/ExecutionManagerConfig.java#L334 to enable debug messages.
GalaxyInstanceFactory.get(executionManager().getLocation().toString(), executionManager().getAPIKey(), true);
This will then print out information like:
5 * Client out-bound request
5 > POST http://localhost:48888/api/workflows?key=admin
5 > Content-Type: application/json
5 > Accept: application/json
{"parameters":{"testtoolshed.g2.bx.psu.edu/repos/sanbi-uwc/confil/confil/0.1.5+galaxy0":{"cutoff":"90","input_type_conditional":{"input_type":"paired_collection"}},"toolshed.g2.bx.psu.edu/repos/iuc/snippy/snippy/4.3.6+galaxy1":{"fastq_input":{"fastq_input_selector":"paired_$
ollection"},"adv":{"minfrac":"0.9","mincov":"10","minqual":"100.0","mapqual":"60","rename_cons":"false"}}},"workflow_id":"911dde3ddb677bcd","history":"hist_id=5564089c81cf7fe8","ds_map":{"0":{"id":"56688e7db014eaac","src":"hda"},"1":{"id":"c86c1b73aa7102dd","src":"hdca"}},"$
o_add_to_history":"true"}
At some point I should really make this a config option. But, since you're already working with the code I figured I should mention it as it may help you out.
@apetkau That is very interesting to hear wrt Filter empty
tool and we need it filter output from confil
. :crying_cat_face:
I also suspected blend4j as the cause of the API error.
Thank you for the heads up on debugging interactions with Galaxy. :1st_place_medal:
Will keep you posted.
I was doing a bit more testing on this and I think I have a method for you to try out. This is a configuration option that can be set in Galaxy, force_beta_workflow_scheduled_for_collections, which when set to true
should force the usage of the newer workflow scheduler in Galaxy.
Now, this isn't a complete solution though. The reason I was continuing to use the older Galaxy API for running workflows is because it only returns when all jobs are completely scheduled. This means that after the workflow is scheduled I can query Galaxy to ask for the percent complete (e.g., 10 jobs complete out of 100 jobs), which is used to display the progress bar in IRIDA:
Using the newer Galaxy invoke
method for scheduling workflows, I believe the API call to Galaxy returns before all jobs are scheduled, so it may cause the progress bar in IRIDA to not correspond to the progress of the workflow (and there could also be other weird issues, I haven't done too much testing on this). The proper solution for this would be to modify IRIDA to properly handle the newer Galaxy API call, but this will take a bit of time.
I did test other workflows using the Filter empty
tool and they seemed to work in IRIDA. I believe that the reason your specific workflow is encountering this issue is because the tool confil
is generating collections of fastq files as output depending on the user input options, which are passed to Filter empty
. In the Galaxy log file I'm seeing:
galaxy.workflow.run DEBUG 2019-04-02 20:54:43,991 [p:28419,w:2,m:0] [uWSGIWorker2Core2] Marking step 97 outputs of invocation 19 delayed (tool [__FILTER_EMPTY_DATASETS__] inputs are not ready, this special tool requires inputs to be ready)
The Filter empty
tool is getting it's input from confil
and so Galaxy is saying in this line that it needs to wait for confil
to finish before scheduling the rest of the jobs (I'm guessing because the specific elements on the output collection from confil
are only determined at runtime). So, using the older Galaxy API (which should return only when all jobs are scheduled in Galaxy) is breaking in this specific case.
I'm getting bitten by this issue on a pipeline that I'm working on. Can I help to fix it?
Sure. Do you know what the corresponding message in the Galaxy logs is? Or do you have the workflow available to be tested?
This is the pipeline I'm working on: https://github.com/dfornika/irida-plugin-plasmid-screen
I have a similar situation where I'm trying to use a collection tool (either 'Collapse Collections' or 'Extract Dataset')
I see a similar stacktrace in the galaxy logs:
galaxy.workflow.run DEBUG 2019-12-24 02:22:49,060 [p:688,w:1,m:0] [uWSGIWorker1Core0] Workflow step 683 of invocation 16 invoked (128.016 ms)
galaxy.workflow.run DEBUG 2019-12-24 02:22:49,068 [p:688,w:1,m:0] [uWSGIWorker1Core0] Marking step 684 outputs of invocation 16 delayed (dependent step [680] delayed, so this step must be de
layed)
galaxy.workflow.run DEBUG 2019-12-24 02:22:49,075 [p:688,w:1,m:0] [uWSGIWorker1Core0] Marking step 685 outputs of invocation 16 delayed (dependent step [681] delayed, so this step must be de
layed)
galaxy.workflow.run DEBUG 2019-12-24 02:22:49,080 [p:688,w:1,m:0] [uWSGIWorker1Core0] Marking step 686 outputs of invocation 16 delayed (dependent step [680] delayed, so this step must be de
layed)
galaxy.web.framework.decorators ERROR 2019-12-24 02:22:49,154 [p:688,w:1,m:0] [uWSGIWorker1Core0] Uncaught exception in exposed API method:
Traceback (most recent call last):
File "lib/galaxy/web/framework/decorators.py", line 282, in decorator
rval = func(self, trans, *args, **kwargs)
File "lib/galaxy/webapps/galaxy/api/workflows.py", line 411, in create
for v in outputs[step.id].values():
AttributeError: 'object' object has no attribute 'values'
172.17.0.1 - - [24/Dec/2019:02:22:47 +0000] "POST /api/workflows?key=admin HTTP/1.1" 500 - "-" "Java/11.0.2"
Describe the bug
I have created a pipeline plugin and deployed it to an IRIDA instance. When I execute the pipeline I keep hitting the following error:
Steps to reproduce the problem
To build and upload https://github.com/COMBAT-TB/irida-pipeline-plugins/tree/master/snippy-pipeline-plugin
Expected behavior
I was expecting the workflow to run to completion.
Additional context
The job is submitted to Galaxy and runs steps 0 to 2.