seek4science / seek

For finding, sharing and exchanging Data, Models, Simulations and Processes in Science.
http://www.seek4science.org
BSD 3-Clause "New" or "Revised" License

Use labels for Galaxy workflow outputs #904

Closed: simleo closed 2 years ago

simleo commented 2 years ago

See this suggestion by @wm75: https://github.com/galaxyproject/iwc/issues/96

fbacall commented 2 years ago

Currently we're picking outputs from each step rather than workflow_outputs.

Is there a description of what workflow_outputs is, and whether it will always be present in a .ga file?

wm75 commented 2 years ago

@fbacall only workflow_outputs is what the name suggests. outputs is what individual tools produce, whereas workflow_outputs is what the workflow selects as relevant in its context. So for any workflow-centric approach, workflow_outputs is what should be used, imo. Would you agree, @mvdbeek?

mvdbeek commented 2 years ago

Right, those are the top-level outputs: they're available in reports and as inputs to other subworkflow steps, they're highlighted in the UI, etc.

wm75 commented 2 years ago

> Is there a description of what workflow_outputs is, and whether it will always be present in a .ga file?

Not sure about the first part, but each step in a .ga file will always have workflow_outputs. Its value might be an empty list though if that step does not have any of its outputs marked as workflow_outputs by the creator of the WF.
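A minimal sketch of how a parser might gather these entries, assuming the .ga file is JSON with a top-level `steps` object keyed by the step index as a string. The function name `collect_workflow_outputs` and the sample data are made up for illustration. It uses `.get` with a default so that a step without the key is treated like one with an empty list.

```python
import json

def collect_workflow_outputs(ga_text):
    """Return (step_id, workflow_output) pairs from a native Galaxy workflow."""
    workflow = json.loads(ga_text)
    steps = workflow.get("steps", {})
    pairs = []
    # Step keys are strings such as "0", "15"; sort them numerically.
    for step_id in sorted(steps, key=int):
        # An empty list simply contributes nothing.
        for wf_out in steps[step_id].get("workflow_outputs", []):
            pairs.append((step_id, wf_out))
    return pairs

# Hypothetical, heavily trimmed workflow for illustration only.
sample_ga = json.dumps({
    "steps": {
        "0": {"workflow_outputs": []},
        "15": {"workflow_outputs": [
            {"label": "cleaned_header", "output_name": "outfile"}
        ]},
    }
})

found = collect_workflow_outputs(sample_ga)
print(found)
```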

fbacall commented 2 years ago

What does the output_name of a workflow_output refer to?

wm75 commented 2 years ago

That's just a reference to the step's regular output, i.e. it says: the tool output with this name should become a workflow output.

wm75 commented 2 years ago

I think it's ok if that's parsed/displayed as the Name of the output in workflowhub since there isn't any readily available better alternative.

fbacall commented 2 years ago

OK so label will be the "ID", output_name will be the "Name". Do workflow_outputs ever have a type field like outputs do?

wm75 commented 2 years ago

No, that would just be redundant with the info in outputs.
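Roughly, the mapping discussed here (label as "ID", output_name as "Name", type looked up from the step's outputs) could be sketched as follows. This is only an illustration: `describe_workflow_outputs` is a hypothetical helper, the sample step is invented, and falling back to output_name when label is null is an assumption, not something settled in this thread.

```python
def describe_workflow_outputs(step):
    """Build display rows for one step's workflow_outputs."""
    # Index the step's regular outputs by name so the type can be looked up.
    types_by_name = {
        out["name"]: out.get("type") for out in step.get("outputs", [])
    }
    rows = []
    for wf_out in step.get("workflow_outputs", []):
        name = wf_out["output_name"]
        rows.append({
            # Assumption: use output_name as the ID when no label was set.
            "id": wf_out.get("label") or name,
            "name": name,
            # None when there is no matching entry under "outputs".
            "type": types_by_name.get(name),
        })
    return rows

# Hypothetical step, trimmed to the relevant keys.
sample_step = {
    "outputs": [{"name": "outfile", "type": "tabular"}],
    "workflow_outputs": [{"label": "cleaned_header", "output_name": "outfile"}],
}

rows = describe_workflow_outputs(sample_step)
print(rows)
```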

fbacall commented 2 years ago

OK I'm confused then, because it seems sometimes there is no matching entry under outputs for an entry in workflow_outputs, e.g.

{
  "annotation": "Allele Frequency Filter. This is the minimum allele frequency required for variants to be included in the reports.",
  "content_id": null,
  "errors": null,
  "id": 0,
  "input_connections": {},
  "inputs": [
    {
      "description": "Allele Frequency Filter. This is the minimum allele frequency required for variants to be included in the reports.",
      "name": "AF Filter"
    }
  ],
  "label": "AF Filter",
  "name": "Input parameter",
  "outputs": [],
  "position": {
    "bottom": 414.7942708333333,
    "height": 46.3359375,
    "left": -421.578125,
    "right": -271.578125,
    "top": 368.4583333333333,
    "width": 150,
    "x": -421.578125,
    "y": 368.4583333333333
  },
  "tool_id": null,
  "tool_state": "{\"default\": 0.05, \"parameter_type\": \"float\", \"optional\": true}",
  "tool_version": null,
  "type": "parameter_input",
  "uuid": "2e5a5b38-c204-45a2-98e2-e113bce5a14b",
  "workflow_outputs": [
    {
      "label": null,
      "output_name": "output",
      "uuid": "e55d47c5-7ea0-45c4-8844-47bf29b88542"
    }
  ]
}

wm75 commented 2 years ago

Huh, a very good catch! This looks like a bug where a WF input (the step's type is parameter_input) has gotten turned into a WF output. Not sure how that has happened (maybe a Galaxy WF editor bug?), but it should be fixed in the iwc repo. One solution on your side would be to ignore steps with "type": "parameter_input" when looking for workflow_outputs.

mvdbeek commented 2 years ago

Not a bug, inputs are outputs. If you don't want to display them it is fine to skip them.
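If the choice is to skip them, a filter along these lines might do. It is a sketch, not SEEK's actual code: `tool_workflow_outputs` is a hypothetical name, and treating `data_input` and `data_collection_input` the same way as `parameter_input` is an assumption that goes beyond what was suggested above.

```python
# Step types that represent workflow inputs. Only "parameter_input" is
# mentioned in this thread; skipping the other two is an assumption.
INPUT_STEP_TYPES = {"data_input", "data_collection_input", "parameter_input"}

def tool_workflow_outputs(steps):
    """Return (step_id, label, output_name) for non-input steps only."""
    result = []
    for step_id in sorted(steps, key=int):
        step = steps[step_id]
        if step.get("type") in INPUT_STEP_TYPES:
            continue  # inputs are outputs too, but we choose not to show them
        for wf_out in step.get("workflow_outputs", []):
            result.append((step_id, wf_out.get("label"), wf_out["output_name"]))
    return result

# Step "0" is trimmed from the example above; step "15" is illustrative.
sample_steps = {
    "0": {
        "type": "parameter_input",
        "workflow_outputs": [{"label": None, "output_name": "output"}],
    },
    "15": {
        "type": "tool",
        "workflow_outputs": [{"label": "cleaned_header", "output_name": "outfile"}],
    },
}

visible = tool_workflow_outputs(sample_steps)
print(visible)
```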

fbacall commented 2 years ago

how does this look?

[image]

wm75 commented 2 years ago

@fbacall Great!

Another option you may want to consider in the future would be to convert the .ga file to gxformat2 before trying to parse any information.

cat variation-reporting.gxwf.yml

class: GalaxyWorkflow
doc: This workflow takes a VCF dataset of variants produced by any of the variant
  calling workflows in https://github.com/galaxyproject/iwc/tree/main/workflows/sars-cov-2-variant-calling
  and generates tabular lists of variants by Samples and by Variant, and an overview
  plot of variants and their allele-frequencies.
label: 'COVID-19: variation analysis reporting'
tags:
- COVID-19
- covid19.galaxyproject.org
uuid: b08c744d-7c61-4b58-ac5f-4b5886c3c643
inputs:
  AF Filter:
    default: 0.05
    doc: Allele Frequency Filter. This is the minimum allele frequency required for
      variants to be included in the reports.
    optional: true
    position:
      bottom: 414.7942708333333

...

outputs:
  _anonymous_output_1:
    outputSource: AF Filter
  _anonymous_output_2:
    outputSource: DP Filter
  _anonymous_output_3:
    outputSource: DP_ALT Filter
  _anonymous_output_4:
    outputSource: Number of Clusters
  prefiltered_variants:
    outputSource: '6'
  filtered_variants:
    outputSource: '9'
  filtered_extracted_variants:
    outputSource: '10'
  filtered_and_renamed_effects:
    outputSource: 11/outfile_replace
  af_recalculated:
    outputSource: 12/out_file1
  collapsed_effects:
    outputSource: 13/out_file
  highest_impact_effects:
    outputSource: 14/outfile
  cleaned_header:
    outputSource: 15/outfile
  processed_variants_collection:
    outputSource: 16/outfile
  all_variants_all_samples:
    outputSource: 20/outfile
  variants_for_plotting:
    outputSource: 35/list_output_tab

So the WF inputs and outputs are nicely declared up front in that case.
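As a rough illustration of working with that `outputs` section once it has been loaded into a dict (e.g. with a YAML parser), the `outputSource` values can be split into a step reference and an optional output name. The helper names are hypothetical, and the sketch assumes step and input labels never contain a `/`, which may not hold in general.

```python
def split_output_source(output_source):
    """Split 'step/output' into (step, output); output is None if absent."""
    step, sep, output_name = output_source.partition("/")
    return step, (output_name if sep else None)

def passthrough_outputs(outputs, input_labels):
    """IDs of workflow outputs whose source is a workflow input."""
    return sorted(
        out_id
        for out_id, spec in outputs.items()
        if split_output_source(spec["outputSource"])[0] in input_labels
    )

# A few entries copied from the excerpt above.
outputs = {
    "_anonymous_output_1": {"outputSource": "AF Filter"},
    "prefiltered_variants": {"outputSource": "6"},
    "filtered_and_renamed_effects": {"outputSource": "11/outfile_replace"},
}
input_labels = {"AF Filter", "DP Filter", "DP_ALT Filter", "Number of Clusters"}

print(split_output_source("11/outfile_replace"))
print(passthrough_outputs(outputs, input_labels))
```

This also gives a simple way to spot the inputs that were marked as outputs: they are the entries whose source is one of the input labels.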

wm75 commented 2 years ago

Minor complications are:

wm75 commented 2 years ago

One more related thing I just spotted on workflowhub:

[Screenshot from 2022-02-18 13-19-51]

In the steps section you're repeating the Input Params. So like for the workflow outputs, you would probably want to ignore steps with "type": "parameter_input".

fbacall commented 2 years ago

Trying a different approach of converting native > gxformat2 > CWL, parsing that, then supplementing the steps because otherwise they're pretty bare:

Inputs

| ID | Name | Description | Type |
|---|---|---|---|
| AF Filter | n/a | Allele Frequency Filter. This is the minimum allele frequency required for variants to be included in the reports. | float |
| DP Filter | n/a | Depth Filter. This is the minimum depth of all alignments at a variant site. | int |
| DP_ALT Filter | n/a | Depth Filter for variant allele. This is the minimum depth of alignments supporting a variant. | int |
| Number of Clusters | n/a | Number of Clusters to use in Variant Frequency Plot. | int |
| Variation data to report | n/a | Variation data in VCF format. Can be the output of any of the workflows in https://github.com/galaxyproject/iwc/tree/main/workflows/sars-cov-2-variant-calling | array containing File |
| gene products translations | n/a | A custom tabular file mapping NCBI RefSeq Protein identifiers as used by snpEff version 4.5covid19 to their commonly used names. Can be obtained from https://doi.org/10.5281/zenodo.4555734 | File |

Steps

| ID | Name | Description |
|---|---|---|
| 6 | SnpSift Filter | toolshed.g2.bx.psu.edu/repos/iuc/snpsift/snpSift_filter/4.3+t.galaxy1 |
| 7 | Compose text parameter value | toolshed.g2.bx.psu.edu/repos/iuc/compose_text_param/compose_text_param/0.1.1 |
| 8 | Compose text parameter value | toolshed.g2.bx.psu.edu/repos/iuc/compose_text_param/compose_text_param/0.1.1 |
| 9 | SnpSift Filter | toolshed.g2.bx.psu.edu/repos/iuc/snpsift/snpSift_filter/4.3+t.galaxy1 |
| 10 | SnpSift Extract Fields | toolshed.g2.bx.psu.edu/repos/iuc/snpsift/snpSift_extractFields/4.3+t.galaxy0 |
| 11 | Replace column | toolshed.g2.bx.psu.edu/repos/bgruening/replace_column_by_key_value_file/replace_column_with_key_value_file/0.2 |
| 12 | Compute | toolshed.g2.bx.psu.edu/repos/devteam/column_maker/Add_a_column1/1.6 |
| 13 | Datamash | toolshed.g2.bx.psu.edu/repos/iuc/datamash_ops/datamash_ops/1.1.0 |
| 14 | Replace | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_find_and_replace/1.1.3 |
| 15 | Replace | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_find_and_replace/1.1.3 |
| 16 | Replace | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_find_and_replace/1.1.3 |
| 17 | Collapse Collection | toolshed.g2.bx.psu.edu/repos/nml/collapse_collections/collapse_dataset/5.1.0 |
| 18 | Compute | toolshed.g2.bx.psu.edu/repos/devteam/column_maker/Add_a_column1/1.6 |
| 19 | Compute | toolshed.g2.bx.psu.edu/repos/devteam/column_maker/Add_a_column1/1.6 |
| 20 | Replace | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_find_and_replace/1.1.3 |
| 21 | Datamash | toolshed.g2.bx.psu.edu/repos/iuc/datamash_ops/datamash_ops/1.1.0 |
| 22 | Filter | Filter1 |
| 23 | Datamash | toolshed.g2.bx.psu.edu/repos/iuc/datamash_ops/datamash_ops/1.1.0 |
| 24 | Join | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_easyjoin_tool/1.1.2 |
| 25 | Datamash | toolshed.g2.bx.psu.edu/repos/iuc/datamash_ops/datamash_ops/1.1.0 |
| 26 | Datamash | toolshed.g2.bx.psu.edu/repos/iuc/datamash_ops/datamash_ops/1.1.0 |
| 27 | Datamash | toolshed.g2.bx.psu.edu/repos/iuc/datamash_ops/datamash_ops/1.1.0 |
| 28 | Join | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_easyjoin_tool/1.1.2 |
| 29 | Join | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_easyjoin_tool/1.1.2 |
| 30 | Cut | Cut1 |
| 31 | Join | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_easyjoin_tool/1.1.2 |
| 32 | Cut | Cut1 |
| 33 | Replace | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_find_and_replace/1.1.3 |
| 34 | Cut | Cut1 |
| 35 | Split file | toolshed.g2.bx.psu.edu/repos/bgruening/split_file_to_collection/split_file_to_collection/0.5.0 |
| 36 | Variant Frequency Plot | toolshed.g2.bx.psu.edu/repos/iuc/snpfreqplot/snpfreqplot/1.0+galaxy3 |
| 37 | Sort | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_sort_header_tool/1.1.1 |
| 38 | Sort | toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_sort_header_tool/1.1.1 |

Outputs

| ID | Name | Description | Type |
|---|---|---|---|
| _anonymous_output_1 | n/a | n/a | File |
| _anonymous_output_2 | n/a | n/a | File |
| _anonymous_output_3 | n/a | n/a | File |
| _anonymous_output_4 | n/a | n/a | File |
| af_recalculated | n/a | n/a | File |
| all_variants_all_samples | n/a | n/a | File |
| by_variant_report | n/a | n/a | File |
| cleaned_header | n/a | n/a | File |
| collapsed_effects | n/a | n/a | File |
| combined_variant_report | n/a | n/a | File |
| filtered_and_renamed_effects | n/a | n/a | File |
| filtered_extracted_variants | n/a | n/a | File |
| filtered_variants | n/a | n/a | File |
| highest_impact_effects | n/a | n/a | File |
| prefiltered_variants | n/a | n/a | File |
| processed_variants_collection | n/a | n/a | File |
| variant_frequency_plot | n/a | n/a | File |
| variants_for_plotting | n/a | n/a | File |
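The "supplementing the steps" part of this approach might look something like the sketch below. It assumes the step IDs that survive the conversion match the step keys in the native .ga file, and that the Name and Description columns are filled from the native step's `name` and `tool_id`. Both are guesses based on the table above, not a description of the actual SEEK code, and `supplement_steps` is a hypothetical helper.

```python
def supplement_steps(bare_step_ids, native_steps):
    """Fill in name/description for steps that came out of the conversion bare."""
    rows = []
    for step_id in bare_step_ids:
        # Assumption: converted step IDs equal the native step keys.
        native = native_steps.get(step_id, {})
        rows.append({
            "id": step_id,
            "name": native.get("name"),
            "description": native.get("tool_id"),
        })
    return rows

# Two native steps, with values taken from the Steps table above.
native_steps = {
    "6": {
        "name": "SnpSift Filter",
        "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/snpsift/snpSift_filter/4.3+t.galaxy1",
    },
    "22": {"name": "Filter", "tool_id": "Filter1"},
}

step_rows = supplement_steps(["6", "22"], native_steps)
for row in step_rows:
    print(row["id"], row["name"], row["description"])
```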