Hi folks,

I am suggesting this because I kind of thought it was already there, but I could not find it documented anywhere.

What would you like changed/added and why?

In pydra, shell task specifications are mostly text-based, with dictionaries and lists sprinkled all over the place. This could very easily be serialized into something like YAML or JSON (I prefer the former). In fact, something like this already looks very similar to JSON syntax. So why not abstract away most of the boilerplate code and allow users to write task specifications in YAML instead?

Example (`FSL bet`)

Adapting the example for FSL bet from the docs, this could look somewhat like this:

my_bet: 
  bases: pydra.engine.ShellCommandTask
  executable: bet
  input_spec: bet_input_spec

bet_input_spec: 
  bases: pydra.engine.ShellSpec
  name: Input
  fields:
    in_file:
      pydra.File:
        help_string: "input file ..."
        position: 1
        mandatory: True
    out_file:
      str: 
        help_string: "name of output ..."
        position: 2
        output_file_template: "{in_file}_br"
    mask:
      bool:
        help_string: "create binary mask"
        argstr: "-m"

Or even nested, like this:

my_bet: 
  bases: pydra.engine.ShellCommandTask
  executable: bet
  input_spec: 
    bases: pydra.engine.ShellSpec
    name: Input
    fields: # [...]

Of course there are many ways to do this and some discussion would be needed to iron out a proper specification.
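
To make the idea a bit more concrete, here is a minimal sketch of how the flat variant above could be turned into per-field tuples. It assumes a (name, type, metadata) layout for each field and that the YAML has already been parsed into plain dictionaries (written out by hand here, so nothing depends on a particular YAML library). `fields_from_mapping` is a hypothetical helper, not part of pydra, and type names are kept as strings rather than resolved.

```python
# Hypothetical helper, not part of pydra. The dict below is what a YAML
# parser would be expected to return for the `bet_input_spec` entry above.
bet_input_spec = {
    "bases": "pydra.engine.ShellSpec",
    "name": "Input",
    "fields": {
        "in_file": {
            "pydra.File": {
                "help_string": "input file ...",
                "position": 1,
                "mandatory": True,
            }
        },
        "out_file": {
            "str": {
                "help_string": "name of output ...",
                "position": 2,
                "output_file_template": "{in_file}_br",
            }
        },
        "mask": {"bool": {"help_string": "create binary mask", "argstr": "-m"}},
    },
}


def fields_from_mapping(spec):
    """Return a list of (field_name, type_name, metadata) tuples."""
    fields = []
    for field_name, typed in spec["fields"].items():
        if len(typed) != 1:
            raise ValueError(f"field {field_name!r} must declare exactly one type")
        # The single key is the type name, its value is the metadata mapping.
        [(type_name, metadata)] = typed.items()
        fields.append((field_name, type_name, dict(metadata)))
    return fields


fields = fields_from_mapping(bet_input_spec)
for field in fields:
    print(field)
```

Keeping the type as a plain string at this stage leaves open how (and how safely) it gets mapped to an actual Python type later.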

What would be the benefit? Does the change make something easier to use?

That's the whole point: ease of use. You could simply load these specifications with pydra and run them, possibly without a single line of (Python) code. Of course, there might be more advanced usage where you want to either directly include Python code or reference it from the spec, but that can be done as well. This should only be an extension to the actual Python API: it should not replace it, and it should be kept as close to it as possible.

Other projects with similar approaches

- The GUI framework kivy is a good example with its kv language. They actually do a lot of cool stuff with this approach.
- We are doing something similar in our project PyRates to create templates for neural network simulations from simple-to-read YAML files, e.g. this Jansen-Rit template. Of course this is not meant to be self-promotion, but you see where I am coming from...

When I first saw pydra I immediately thought you would do this, but then I found no mention of it. It would be simple enough to write my own parser for this, but a proper specification would be better. So what do you think about this idea?

PS: YAML has support for referencing actual types from program code, but I find this concept too complex for simple use cases and especially for new users.

@dafrose - thanks for opening the issue! Yes, we should add this option.

I'm using yml format for building a converter (from nipype to pydra task, example for fsl is here), but pydra should be able to read the spec from yml.

Hi @djarecka, I have a few questions regarding your yml spec:

- I don't see an explicit input spec in your yaml files, only conditions that might reflect optional inputs, but not mandatory ones. How do you define inputs with this spec?
- What is the distinction between filename and cmd?

Otherwise, I like the flexibility that your usage of filename templates offers.

With regard to a future specification for writing specs in yml: would you prefer to restrict these specs to command line interfaces only? If that's the case, there does not need to be a clear distinction between different types of interfaces (e.g. function vs. CLI). But it might still be worth including something like a base attribute to allow inheritance from some common structure (as previously in nipype for all FSL tools, all MRtrix3 tools, and so on).

In PyRates, we decided to use slash (`/`) notation for absolute or relative system paths and dot (`.`) notation for things that Python can find with its import architecture. Both could refer to either Python code or other yaml specs. This could then look like this:

MySpec:
  base: MyBase  # referencing something in the same file

MySpec2:
  base: ../../path/to/file/MyBase2  # relative path

MySpec3:
  base: /drive/path/to/file/MyBase3  # absolute path

MySpec4:
  base: path.to.file.MyBase4  # python path

Regarding file endings, we decided to look for files that end with .py, .yaml, or .yml, in that order (or rather, try to import first and then look for yaml files). Would you like to include the base attribute or rather keep it out to simplify the specification?
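
For illustration, the slash/dot convention could be sketched roughly as follows. This is not how PyRates implements it; `classify_base` and `candidate_files` are hypothetical names, and the sketch only classifies a reference and lists candidate files, without importing or opening anything.

```python
from pathlib import PurePosixPath

# Extensions to try for path references, in the order given above.
CANDIDATE_SUFFIXES = (".py", ".yaml", ".yml")


def classify_base(ref):
    """Classify a `base` reference following the slash/dot convention.

    Returns a (kind, location, name) tuple; `location` is None for
    references to something in the same file.
    """
    if "/" in ref:
        path = PurePosixPath(ref)
        kind = "absolute_path" if path.is_absolute() else "relative_path"
        # Everything before the last slash is the file (without extension),
        # the last component is the name of the spec inside that file.
        return kind, str(path.parent), path.name
    if "." in ref:
        # Dotted names would be left to Python's import machinery.
        module, _, name = ref.rpartition(".")
        return "python_path", module, name
    return "same_file", None, ref


def candidate_files(location):
    """List the files to probe for a path reference, in lookup order."""
    return [location + suffix for suffix in CANDIDATE_SUFFIXES]


for ref in ("MyBase", "../../path/to/file/MyBase2",
            "/drive/path/to/file/MyBase3", "path.to.file.MyBase4"):
    print(ref, "->", classify_base(ref))
```

Checking for a slash before checking for a dot matters here, because relative paths such as `../../path` contain dots as well.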

My notes from the call today (regarding this issue):

Way forward

- I will try to draft a spec and converters to/from yaml as I go (and find time)
- The spec should be close to pydra's syntax
- The spec may contain anchors

New suggestion based on the above spec, e.g.:

bet_input_spec: &bet_input_spec  # anchor on a named key; could also just nest it down there
  bases: pydra.engine.ShellSpec
  name: Input
  fields:
    - name: in_file
      type: pydra.File
      metadata:
        help_string: "input file ..."
        position: 1
        mandatory: True
    - name: out_file
      type: str
      metadata:
        help_string: "name of output ..."
        position: 2
        output_file_template: "{in_file}_br"
    - name: mask
      type: bool
      metadata:
        help_string: "create binary mask"
        argstr: "-m"

MyBet:
  bases: pydra.engine.ShellCommandTask
  executable: bet
  input_spec: *bet_input_spec
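
As a rough sketch of what the converters to/from this layout might look like, assuming fields are represented in Python as (name, type, metadata) tuples and that the `fields` list above has already been parsed into plain Python objects (written out by hand here). Both function names are hypothetical, and type names stay as strings.

```python
def fields_to_tuples(field_list):
    """List-of-mappings layout -> (name, type, metadata) tuples."""
    return [
        (item["name"], item["type"], dict(item.get("metadata", {})))
        for item in field_list
    ]


def tuples_to_fields(field_tuples):
    """(name, type, metadata) tuples -> list-of-mappings layout."""
    return [
        {"name": name, "type": type_name, "metadata": dict(metadata)}
        for name, type_name, metadata in field_tuples
    ]


# What a YAML parser would be expected to return for the `fields` list above.
yaml_fields = [
    {"name": "in_file", "type": "pydra.File",
     "metadata": {"help_string": "input file ...", "position": 1,
                  "mandatory": True}},
    {"name": "out_file", "type": "str",
     "metadata": {"help_string": "name of output ...", "position": 2,
                  "output_file_template": "{in_file}_br"}},
    {"name": "mask", "type": "bool",
     "metadata": {"help_string": "create binary mask", "argstr": "-m"}},
]

tuples = fields_to_tuples(yaml_fields)
# Converting back should reproduce the parsed structure unchanged.
assert tuples_to_fields(tuples) == yaml_fields
```

Because every field is an explicit name/type/metadata mapping, the conversion is a one-to-one reshuffle in both directions, which is an argument for the list-based layout over the nested one.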

Open Questions

Do we want to allow (unsafe) arbitrary code execution from yaml files?
- Python objects can be directly referenced using !!python/... notation.
- This would make the parser easier, because standard YAML parsers already know how to treat these
- Would also make the YAML file uglier (more complex/difficult to read)
- Using !!python/object/apply opens the door to arbitrary code execution and is therefore considered "unsafe". If pydra is expected to run in secure environments with only trusted code, this could become problematic, especially for unaware users.
- Since nipype/pydra is meant to work with arbitrary user code, this might actually not be an issue.
Which library would you prefer, PyYAML or ruamel.yaml?
- PyYAML is maybe considered more "standard" but is restricted to YAML 1.1 (2005)
- ruamel.yaml is a modernized fork of PyYAML that supports the most recent revision, YAML 1.2 (2009)
- ruamel.yaml also finds widespread usage
- I personally prefer to use ruamel.yaml
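
One possible middle ground for the first question, sketched under assumptions: keep type names as plain strings in the YAML file and resolve them through an explicit allowlist, so that a safe loader (e.g. PyYAML's `yaml.safe_load`) is sufficient and no `!!python/...` tags are needed. The `ALLOWED_TYPES` table and `resolve_type` are hypothetical, and `pydra.File` is mapped to `pathlib.Path` purely as a stand-in so the sketch runs without pydra installed.

```python
import pathlib

# Hypothetical allowlist: the only type names a spec file may use.
# `pydra.File` maps to pathlib.Path only as a stand-in for this sketch.
ALLOWED_TYPES = {
    "str": str,
    "bool": bool,
    "int": int,
    "float": float,
    "pydra.File": pathlib.Path,
}


def resolve_type(type_name):
    """Look up a type by name without importing or evaluating anything."""
    try:
        return ALLOWED_TYPES[type_name]
    except KeyError:
        raise ValueError(
            f"type {type_name!r} is not allowed in a spec file"
        ) from None


print(resolve_type("bool"))

# Anything outside the allowlist is rejected instead of being imported.
try:
    resolve_type("os.system")
except ValueError as err:
    print(err)
```

The trade-off is that user-defined types would need some registration step, which is less flexible than `!!python/...` tags but keeps spec files inert.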

Do you have comments on my notes or my questions, @djarecka @effigies @PeerHerholz @satra @oesteban?

nipype / pydra

Parse specs from yaml files (introducing a simple domain specific language for pydra) #367
