simleo closed this 2 years ago
Currently we're picking `outputs` from each step rather than `workflow_outputs`.
Is there a description of what `workflow_outputs` is, and whether it will always be present in a .ga file?
@fbacall Only `workflow_outputs` are what the name suggests. `outputs` are what individual tools produce, while `workflow_outputs` select which of those are considered relevant in the context of the workflow.
So for any workflow-centric approach, `workflow_outputs` is what should be used, imo. Would you agree, @mvdbeek?
Right, those are the top-level outputs: they are available in reports and as inputs to other subworkflow steps, they're highlighted in the UI, etc.
> Is there a description of what `workflow_outputs` is, and whether it will always be present in a .ga file?
Not sure about the first part, but each step in a .ga file will always have `workflow_outputs`. Its value may be an empty list, though, if the creator of the WF did not mark any of that step's `outputs` as workflow outputs.
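Since every step carries a (possibly empty) `workflow_outputs` list, gathering the workflow-level outputs amounts to a flat walk over the steps. A minimal sketch, assuming the .ga file has already been loaded with `json.load` and that `steps` is a dict keyed by step id, as in native Galaxy workflows; `collect_workflow_outputs` is a hypothetical helper, not WorkflowHub or Galaxy code.

```python
from typing import Any, Dict, List


def collect_workflow_outputs(ga: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Gather the workflow_outputs entries of every step, tagged with the step id."""
    collected: List[Dict[str, Any]] = []
    # Native .ga files keep their steps in a dict keyed by step id ("0", "1", ...).
    for step_id, step in ga.get("steps", {}).items():
        # workflow_outputs should be present on every step but is often an empty
        # list; "or []" also copes with a missing key or an explicit null.
        for wf_out in step.get("workflow_outputs") or []:
            collected.append({"step_id": step_id, **wf_out})
    return collected
```

Steps whose list is empty simply contribute nothing, so no special-casing is needed for them.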
What does the `output_name` of a `workflow_output` refer to?
That's just a reference to the step's regular output, i.e. it says: the tool output with this name should become a workflow output.
I think it's OK to parse/display that as the Name of the output in WorkflowHub, since there isn't any readily available better alternative.
OK, so `label` will be the "ID" and `output_name` will be the "Name". Do `workflow_outputs` ever have a `type` field like `outputs` do?
No, that would just be redundant with the info in `outputs`.
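Putting the mapping above together (`label` as ID, `output_name` as Name, type taken from the step's `outputs`), one row per workflow output could be built by joining the two lists on the output name. A sketch under stated assumptions: each `outputs` entry is assumed to be a dict with `name` and `type` keys, and `output_rows` is a hypothetical helper. When no matching entry exists, as for the `parameter_input` step quoted in the next comment, the type is left as `None`.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Tuple[Optional[str], Optional[str], Optional[str]]


def output_rows(step: Dict[str, Any]) -> List[Row]:
    """Return (ID, Name, Type) rows for the workflow outputs of one native step."""
    # Index the step's tool outputs by name; assumes entries look like
    # {"name": ..., "type": ...}.
    types_by_name = {o.get("name"): o.get("type") for o in step.get("outputs") or []}
    rows: List[Row] = []
    for wf_out in step.get("workflow_outputs") or []:
        name = wf_out.get("output_name")
        # label -> ID, output_name -> Name; the type is looked up in outputs
        # and stays None when no matching entry exists.
        rows.append((wf_out.get("label"), name, types_by_name.get(name)))
    return rows
```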
OK, I'm confused then, because sometimes there seems to be no matching entry under `outputs` for an entry in `workflow_outputs`, e.g.:
```json
{
    "annotation": "Allele Frequency Filter. This is the minimum allele frequency required for variants to be included in the reports.",
    "content_id": null,
    "errors": null,
    "id": 0,
    "input_connections": {},
    "inputs": [
        {
            "description": "Allele Frequency Filter. This is the minimum allele frequency required for variants to be included in the reports.",
            "name": "AF Filter"
        }
    ],
    "label": "AF Filter",
    "name": "Input parameter",
    "outputs": [],
    "position": {
        "bottom": 414.7942708333333,
        "height": 46.3359375,
        "left": -421.578125,
        "right": -271.578125,
        "top": 368.4583333333333,
        "width": 150,
        "x": -421.578125,
        "y": 368.4583333333333
    },
    "tool_id": null,
    "tool_state": "{\"default\": 0.05, \"parameter_type\": \"float\", \"optional\": true}",
    "tool_version": null,
    "type": "parameter_input",
    "uuid": "2e5a5b38-c204-45a2-98e2-e113bce5a14b",
    "workflow_outputs": [
        {
            "label": null,
            "output_name": "output",
            "uuid": "e55d47c5-7ea0-45c4-8844-47bf29b88542"
        }
    ]
}
```
Huh, very good catch! This looks like a bug where a WF input (the step's `type` is `parameter_input`) has been turned into a WF output.
Not sure how that happened (maybe a Galaxy WF editor bug?), but it should be fixed in the iwc repo.
One solution on your side would be to ignore steps with `"type": "parameter_input"` when looking for `workflow_outputs`.
Not a bug: inputs are outputs. If you don't want to display them, it is fine to skip them.
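Skipping inputs only needs a check on the step `type` before reading its `workflow_outputs`. A minimal sketch, assuming native .ga steps; besides the `parameter_input` type discussed here, it also skips `data_input` and `data_collection_input`, the other native input step types, which is an extension of what the thread mentions. `displayable_workflow_outputs` is a hypothetical name.

```python
from typing import Any, Dict, List, Tuple

# Step types that represent workflow inputs in native .ga files.
INPUT_STEP_TYPES = {"data_input", "data_collection_input", "parameter_input"}


def displayable_workflow_outputs(ga: Dict[str, Any]) -> List[Tuple[str, Any, Any]]:
    """(step id, label, output_name) for workflow outputs not produced by input steps."""
    result = []
    for step_id, step in ga.get("steps", {}).items():
        # Inputs may legitimately be marked as outputs; skip them for display.
        if step.get("type") in INPUT_STEP_TYPES:
            continue
        for wf_out in step.get("workflow_outputs") or []:
            result.append((step_id, wf_out.get("label"), wf_out.get("output_name")))
    return result
```

Because inputs marked as outputs are valid, this is a display choice rather than a correction of the file.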
How does this look?
@fbacall Great!
Another option you may want to consider in the future would be to convert the .ga file to gxformat2 before trying to parse any information.
```
$ cat variation-reporting.gxwf.yml
class: GalaxyWorkflow
doc: This workflow takes a VCF dataset of variants produced by any of the variant
  calling workflows in https://github.com/galaxyproject/iwc/tree/main/workflows/sars-cov-2-variant-calling
  and generates tabular lists of variants by Samples and by Variant, and an overview
  plot of variants and their allele-frequencies.
label: 'COVID-19: variation analysis reporting'
tags:
- COVID-19
- covid19.galaxyproject.org
uuid: b08c744d-7c61-4b58-ac5f-4b5886c3c643
inputs:
  AF Filter:
    default: 0.05
    doc: Allele Frequency Filter. This is the minimum allele frequency required for
      variants to be included in the reports.
    optional: true
    position:
      bottom: 414.7942708333333
...
outputs:
  _anonymous_output_1:
    outputSource: AF Filter
  _anonymous_output_2:
    outputSource: DP Filter
  _anonymous_output_3:
    outputSource: DP_ALT Filter
  _anonymous_output_4:
    outputSource: Number of Clusters
  prefiltered_variants:
    outputSource: '6'
  filtered_variants:
    outputSource: '9'
  filtered_extracted_variants:
    outputSource: '10'
  filtered_and_renamed_effects:
    outputSource: 11/outfile_replace
  af_recalculated:
    outputSource: 12/out_file1
  collapsed_effects:
    outputSource: 13/out_file
  highest_impact_effects:
    outputSource: 14/outfile
  cleaned_header:
    outputSource: 15/outfile
  processed_variants_collection:
    outputSource: 16/outfile
  all_variants_all_samples:
    outputSource: 20/outfile
  variants_for_plotting:
    outputSource: 35/list_output_tab
```
So the WF inputs and outputs are nicely declared up front in that case.
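With the gxformat2 layout, each `outputSource` is either `step/output_name` or a bare step or input reference, so outputs that merely re-export an input (the `_anonymous_output_N` entries above) can be told apart by their source. A sketch under stated assumptions: the YAML has already been parsed into dicts by some YAML library, a bare reference is assumed to mean the default output named `output`, and both helper names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def split_output_source(source: str) -> Tuple[str, str]:
    """Split an outputSource such as '12/out_file1' into (step, output name)."""
    step, sep, output_name = source.rpartition("/")
    if not sep:
        # Assumption: a bare reference ('6', 'AF Filter') means the default
        # output, which Galaxy calls 'output'.
        return source, "output"
    return step, output_name


def classify_outputs(outputs: Dict[str, Dict[str, str]], input_ids: Set[str]) -> Dict[str, List[str]]:
    """Sort gxformat2 workflow outputs by whether they come from an input or a step."""
    groups: Dict[str, List[str]] = {"from_inputs": [], "from_steps": []}
    for output_id, spec in outputs.items():
        step, _ = split_output_source(spec["outputSource"])
        groups["from_inputs" if step in input_ids else "from_steps"].append(output_id)
    return groups
```

Here `input_ids` would be the keys of the parsed `inputs` mapping (e.g. `AF Filter`). Splitting on the last `/` keeps step labels that themselves contain a slash intact, as long as output names do not.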
Minor complications are:
One more related thing I just spotted on WorkflowHub:
in the steps section you're repeating the input params. As with the workflow outputs, you would probably want to ignore steps with `"type": "parameter_input"`.
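For the steps section, the same type check can partition the native steps into inputs and everything else, so inputs appear only in the inputs table. A minimal sketch, assuming native .ga step ids are numeric strings and treating `data_input`, `data_collection_input` and `parameter_input` as the input types; `split_steps` is a hypothetical helper.

```python
from typing import Any, Dict, List, Tuple

INPUT_STEP_TYPES = {"data_input", "data_collection_input", "parameter_input"}


def split_steps(ga: Dict[str, Any]) -> Tuple[List[str], List[str]]:
    """Partition native step ids into (input steps, other steps), in numeric order."""
    inputs: List[str] = []
    others: List[str] = []
    # Step ids are numeric strings, so sort numerically ("10" after "9").
    for step_id in sorted(ga.get("steps", {}), key=int):
        step = ga["steps"][step_id]
        (inputs if step.get("type") in INPUT_STEP_TYPES else others).append(step_id)
    return inputs, others
```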
Trying a different approach: converting native > gxformat2 > CWL, parsing that, and then supplementing the steps, because otherwise they're pretty bare:
ID | Name | Description | Type |
---|---|---|---|
AF Filter | n/a | Allele Frequency Filter. This is the minimum allele frequency required for variants to be included in the reports. | float |
DP Filter | n/a | Depth Filter. This is the minimum depth of all alignments at a variant site. | int |
DP_ALT Filter | n/a | Depth Filter for variant allele. This is the minimum depth of alignments supporting a variant. | int |
Number of Clusters | n/a | Number of Clusters to use in Variant Frequency Plot. | int |
Variation data to report | n/a | Variation data in VCF format. Can be the output of any of the workflows in https://github.com/galaxyproject/iwc/tree/main/workflows/sars-cov-2-variant-calling | array containing File |
gene products translations | n/a | A custom tabular file mapping NCBI RefSeq Protein identifiers as used by snpEff version 4.5covid19 to their commonly used names. Can be obtained from https://doi.org/10.5281/zenodo.4555734 | File |
ID | Name | Description |
---|---|---|
6 | SnpSift Filter | toolshed.g2.bx.psu.edu/repos/iuc/snpsift/snpSift_filter/4.3+t.galaxy1 |
7 | Compose text parameter value | toolshed.g2.bx.psu.edu/repos/iuc/compose_text_param/compose_text_param/0.1.1 |
8 | Compose text parameter value | toolshed.g2.bx.psu.edu/repos/iuc/compose_text_param/compose_text_param/0.1.1 |
9 | SnpSift Filter | toolshed.g2.bx.psu.edu/repos/iuc/snpsift/snpSift_filter/4.3+t.galaxy1 |
10 | SnpSift Extract Fields | toolshed.g2.bx.psu.edu/repos/iuc/snpsift/snpSift_extractFields/4.3+t.galaxy0 |
11 | Replace column | toolshed.g2.bx.psu.edu/repos/bgruening/replace_column_by_key_value_file/replace_column_with_key_value_file/0.2 |
12 | Compute | toolshed.g2.bx.psu.edu/repos/devteam/column_maker/Add_a_column1/1.6 |
13 | Datamash | toolshed.g2.bx.psu.edu/repos/iuc/datamash_ops/datamash_ops/1.1.0 |
14 | Replace | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_find_and_replace/1.1.3 |
15 | Replace | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_find_and_replace/1.1.3 |
16 | Replace | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_find_and_replace/1.1.3 |
17 | Collapse Collection | toolshed.g2.bx.psu.edu/repos/nml/collapse_collections/collapse_dataset/5.1.0 |
18 | Compute | toolshed.g2.bx.psu.edu/repos/devteam/column_maker/Add_a_column1/1.6 |
19 | Compute | toolshed.g2.bx.psu.edu/repos/devteam/column_maker/Add_a_column1/1.6 |
20 | Replace | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_find_and_replace/1.1.3 |
21 | Datamash | toolshed.g2.bx.psu.edu/repos/iuc/datamash_ops/datamash_ops/1.1.0 |
22 | Filter | Filter1 |
23 | Datamash | toolshed.g2.bx.psu.edu/repos/iuc/datamash_ops/datamash_ops/1.1.0 |
24 | Join | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_easyjoin_tool/1.1.2 |
25 | Datamash | toolshed.g2.bx.psu.edu/repos/iuc/datamash_ops/datamash_ops/1.1.0 |
26 | Datamash | toolshed.g2.bx.psu.edu/repos/iuc/datamash_ops/datamash_ops/1.1.0 |
27 | Datamash | toolshed.g2.bx.psu.edu/repos/iuc/datamash_ops/datamash_ops/1.1.0 |
28 | Join | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_easyjoin_tool/1.1.2 |
29 | Join | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_easyjoin_tool/1.1.2 |
30 | Cut | Cut1 |
31 | Join | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_easyjoin_tool/1.1.2 |
32 | Cut | Cut1 |
33 | Replace | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_find_and_replace/1.1.3 |
34 | Cut | Cut1 |
35 | Split file | toolshed.g2.bx.psu.edu/repos/bgruening/split_file_to_collection/split_file_to_collection/0.5.0 |
36 | Variant Frequency Plot | toolshed.g2.bx.psu.edu/repos/iuc/snpfreqplot/snpfreqplot/1.0+galaxy3 |
37 | Sort | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_sort_header_tool/1.1.1 |
38 | Sort | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_sort_header_tool/1.1.1 |
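A steps table with this shape could also be rendered straight from the native .ga file, using each step's `name` and `tool_id`, instead of going through CWL. A sketch under stated assumptions: input steps are skipped as suggested above, missing values are shown as `n/a` like in the tables here, and `steps_table` is a hypothetical helper rather than WorkflowHub's implementation.

```python
from typing import Any, Dict

INPUT_STEP_TYPES = {"data_input", "data_collection_input", "parameter_input"}


def steps_table(ga: Dict[str, Any]) -> str:
    """Render non-input steps as an 'ID | Name | Description |' Markdown table."""
    lines = ["ID | Name | Description |", "---|---|---|"]
    for step_id in sorted(ga.get("steps", {}), key=int):
        step = ga["steps"][step_id]
        if step.get("type") in INPUT_STEP_TYPES:
            continue  # inputs are already listed in the inputs table
        # Built-in tools have short ids such as "Filter1"; Tool Shed tools
        # have the full toolshed.g2.bx.psu.edu/... id.
        lines.append(f"{step_id} | {step.get('name') or 'n/a'} | {step.get('tool_id') or 'n/a'} |")
    return "\n".join(lines)
```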
ID | Name | Description | Type |
---|---|---|---|
_anonymous_output_1 | n/a | n/a | File |
_anonymous_output_2 | n/a | n/a | File |
_anonymous_output_3 | n/a | n/a | File |
_anonymous_output_4 | n/a | n/a | File |
af_recalculated | n/a | n/a | File |
all_variants_all_samples | n/a | n/a | File |
by_variant_report | n/a | n/a | File |
cleaned_header | n/a | n/a | File |
collapsed_effects | n/a | n/a | File |
combined_variant_report | n/a | n/a | File |
filtered_and_renamed_effects | n/a | n/a | File |
filtered_extracted_variants | n/a | n/a | File |
filtered_variants | n/a | n/a | File |
highest_impact_effects | n/a | n/a | File |
prefiltered_variants | n/a | n/a | File |
processed_variants_collection | n/a | n/a | File |
variant_frequency_plot | n/a | n/a | File |
variants_for_plotting | n/a | n/a | File |
See this suggestion by @wm75: https://github.com/galaxyproject/iwc/issues/96