roocs / rook

A Web Processing Service for roocs: remote operations on climate simulations.
https://rook-wps.readthedocs.io/en/latest/
Apache License 2.0

Create workflow example using wind speed #5

Closed agstephens closed 4 years ago

agstephens commented 4 years ago

Expand workflow example to calculate wind speed using a function tree.

Example 1: the JSON file could look like this:

{
  "notes": [
    "def add(a, b): return a + b",
    "math.pow(add(math.pow(u, 2), math.pow(v, 2)), 0.5)",
    "should 'workflow' section be a list or dictionary at top-level?"
  ],
  "workflow": {
    "inputs": {
      "data_refs": {
        "input1": "cmip5.output1.MOHC.HadGEM2-ES.rcp85.mon.atmos.Amon.r1i1p1.latest.uas",
        "input2": "cmip5.output1.MOHC.HadGEM2-ES.rcp85.mon.atmos.Amon.r1i1p1.latest.vas"
      }
    },
    "workflow": [
      {
        "id": "math.pow",
        "inputs": [
          {
            "id": "add",
            "inputs": [
              {
                "id": "math.pow",
                "inputs": [
                  "@input1"
                ],
                "arguments": [
                  2.0
                ]
              },
              {
                "id": "math.pow",
                "inputs": [
                  "@input2"
                ],
                "arguments": [
                  2.0
                ]
              }
            ],
            "arguments": []
          }
        ],
        "arguments": [
          0.5
        ]
      }
    ]
  }
}
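
To illustrate how such a function tree might be evaluated, here is a minimal Python sketch. It is not the rook implementation: the `OPERATIONS` registry and the `evaluate` helper are hypothetical names, and plain floats stand in for the `uas` and `vas` datasets.

```python
import math


def add(a, b):
    return a + b


# Hypothetical whitelist of callable operations (an assumption for this sketch).
OPERATIONS = {"math.pow": math.pow, "add": add}


def evaluate(node, data):
    """Recursively evaluate one node of the function tree."""
    # A string such as "@input1" refers to an entry in the workflow inputs.
    if isinstance(node, str) and node.startswith("@"):
        return data[node[1:]]
    func = OPERATIONS[node["id"]]
    # Evaluate child nodes first, then append the literal arguments.
    inputs = [evaluate(child, data) for child in node["inputs"]]
    return func(*inputs, *node.get("arguments", []))


# The tree from the "workflow" list above, reduced to its single root node.
tree = {
    "id": "math.pow",
    "inputs": [
        {
            "id": "add",
            "inputs": [
                {"id": "math.pow", "inputs": ["@input1"], "arguments": [2.0]},
                {"id": "math.pow", "inputs": ["@input2"], "arguments": [2.0]},
            ],
            "arguments": [],
        }
    ],
    "arguments": [0.5],
}

# Floats stand in for the two wind components.
wind_speed = evaluate(tree, {"input1": 3.0, "input2": 4.0})
```

Looking functions up in a fixed registry, rather than resolving arbitrary names, keeps the set of callable operations under the service's control.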

cehbrecht commented 4 years ago

@agstephens do you want to allow arbitrary function calls in the workflow (like math.pow), or is that just for the tree example? For now I would restrict the workflow to the functions provided by daops.

agstephens commented 4 years ago

@cehbrecht, I have used math.pow in this example to demonstrate how it might work.

I think the actual steps should always be a call to daops or clisops.

cehbrecht commented 4 years ago

Example 2: Here is a pseudo CWL workflow description:

{
    "cwlVersion": "v1.0",
    "class": "Workflow",
    "doc": "pow(add(pow(u, 2), pow(v, 2)), 0.5)",
    "inputs": {
        "data_ref_u": ["cmip5.output1.MOHC.HadGEM2-ES.rcp85.mon.atmos.Amon.r1i1p1.latest.uas"],
        "data_ref_v": ["cmip5.output1.MOHC.HadGEM2-ES.rcp85.mon.atmos.Amon.r1i1p1.latest.vas"]
    },
    "outputs": {
        "output": {
            "type": "File",
            "outputSource": "windspeed/output"
        }
    },
    "steps": {
        "pow_u": {
            "run": "pow",
            "in": {
                "data_ref": "data_ref_u",
                "power": "2.0"
            },
            "out": ["output"]
        },
        "pow_v": {
            "run": "pow",
            "in": {
                "data_ref": "data_ref_v",
                "power": "2.0"
            },
            "out": ["output"]
        },
        "add": {
            "run": "add",
            "in": {
                "a": "pow_u/output",
                "b": "pow_v/output"
            },
            "out": ["output"]
        },
        "windspeed": {
            "run": "pow",
            "in": {
                "data_ref": "add/output",
                "power": "0.5"
            },
            "out": ["output"]
        }
    }
}

The workflow is easier to read without the recursion.
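
As a rough sketch of how a flat "steps" section like the one in Example 2 could be executed, the Python below orders the steps by their `name/output` references and runs them one after another. This is an assumption-laden illustration, not rook or CWL behaviour: `OPERATIONS` and `run_steps` are hypothetical, floats stand in for datasets, and `graphlib` requires Python 3.9 or later.

```python
import math
from graphlib import TopologicalSorter

# Hypothetical operations keyed by the "run" name used in the steps.
OPERATIONS = {
    "pow": lambda data_ref, power: math.pow(data_ref, float(power)),
    "add": lambda a, b: a + b,
}

STEPS = {
    "pow_u": {"run": "pow", "in": {"data_ref": "data_ref_u", "power": "2.0"}},
    "pow_v": {"run": "pow", "in": {"data_ref": "data_ref_v", "power": "2.0"}},
    "add": {"run": "add", "in": {"a": "pow_u/output", "b": "pow_v/output"}},
    "windspeed": {"run": "pow", "in": {"data_ref": "add/output", "power": "0.5"}},
}


def is_step_ref(value):
    """True for references of the form "<step>/output"."""
    return isinstance(value, str) and value.endswith("/output")


def run_steps(steps, inputs):
    # For each step, collect the names of the steps it depends on.
    deps = {
        name: {v.split("/")[0] for v in step["in"].values() if is_step_ref(v)}
        for name, step in steps.items()
    }
    results = {}
    # static_order() yields every step after all of its dependencies.
    for name in TopologicalSorter(deps).static_order():
        step = steps[name]
        kwargs = {}
        for key, value in step["in"].items():
            if is_step_ref(value):
                kwargs[key] = results[value.split("/")[0]]
            elif value in inputs:
                kwargs[key] = inputs[value]  # workflow-level input
            else:
                kwargs[key] = value  # literal parameter such as "2.0"
        results[name] = OPERATIONS[step["run"]](**kwargs)
    return results


results = run_steps(STEPS, {"data_ref_u": 3.0, "data_ref_v": 4.0})
```

Because the dependencies are explicit in the references, the steps can be listed in any order in the JSON document.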

cehbrecht commented 4 years ago

Example 3: a slimmed-down version of the pseudo CWL workflow:

{
    "doc": "pow(add(pow(u, 2), pow(v, 2)), 0.5)",
    "inputs": {
        "data_ref_u": ["cmip5.output1.MOHC.HadGEM2-ES.rcp85.mon.atmos.Amon.r1i1p1.latest.uas"],
        "data_ref_v": ["cmip5.output1.MOHC.HadGEM2-ES.rcp85.mon.atmos.Amon.r1i1p1.latest.vas"]
    },
    "outputs": {
        "windspeed": "windspeed/output"
    },
    "steps": {
        "pow_u": {
            "run": "pow",
            "in": {
                "data_ref": "inputs/data_ref_u",
                "power": "2.0"
            }
        },
        "pow_v": {
            "run": "pow",
            "in": {
                "data_ref": "inputs/data_ref_v",
                "power": "2.0"
            }
        },
        "add": {
            "run": "add",
            "in": {
                "a": "pow_u/output",
                "b": "pow_v/output"
            }
        },
        "windspeed": {
            "run": "pow",
            "in": {
                "data_ref": "add/output",
                "power": "0.5"
            }
        }
    }
}

cehbrecht commented 4 years ago

Example 4: an updated version of Example 1 that keeps the recursion.

{
  "doc": "pow(add(pow(u, 2), pow(v, 2)), 0.5)",
  "inputs": {
      "data_ref_u": ["cmip5.output1.MOHC.HadGEM2-ES.rcp85.mon.atmos.Amon.r1i1p1.latest.uas"],
      "data_ref_v": ["cmip5.output1.MOHC.HadGEM2-ES.rcp85.mon.atmos.Amon.r1i1p1.latest.vas"]
  },
  "outputs": {
    "windspeed": {
      "run": "pow",
      "in": {
        "data_ref": {
          "run": "add",
          "in": {
            "a": {
              "run": "pow",
              "in": {
                "data_ref": "@data_ref_u",
                "power": "2.0"
              }
            },
            "b": {
              "run": "pow",
              "in": {
                "data_ref": "@data_ref_v",
                "power": "2.0"
              }
            }
          }
        },
        "power": "0.5"
      }
    }
  }
}

cehbrecht commented 4 years ago

@agstephens In our current daops, functions have only one input parameter for data references (data_ref) and one output parameter. To get a tree instead of a simple chain, should we merge the outputs of several steps into the data_ref input?

"average_tas": {
    "run": "average",
    "in": {
        "data_ref": ["subset_tas_1/output", "subset_tas_2/output"],
        "axes": "time"
    }
}
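
One way to picture the merge is a resolver that accepts either a single step reference or a list of them for any "in" value. The Python below is only a sketch under that assumption: `resolve` is a hypothetical helper, and short lists of floats stand in for the outputs of the two subset steps.

```python
def resolve(value, results):
    """Resolve one "in" value: a list of references, a single reference, or a literal."""
    if isinstance(value, list):
        # A list of references merges the outputs of several steps.
        return [resolve(item, results) for item in value]
    if isinstance(value, str) and value.endswith("/output"):
        return results[value.split("/")[0]]
    return value


# Placeholder outputs of two earlier steps (made-up values for illustration).
results = {"subset_tas_1": [280.0, 282.0], "subset_tas_2": [284.0, 286.0]}

step_in = {
    "data_ref": ["subset_tas_1/output", "subset_tas_2/output"],
    "axes": "time",
}

resolved = {key: resolve(value, results) for key, value in step_in.items()}
```

With this shape, a function still takes one data_ref parameter; it simply receives a list when several upstream steps feed into it.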

agstephens commented 4 years ago

Outdated: superseded by the diff example.