nipype / pydra

Pydra Dataflow Engine
https://nipype.github.io/pydra/

Parse specs from yaml files (introducing a simple domain specific language for pydra) #367

Open dafrose opened 3 years ago

dafrose commented 3 years ago

Hi folks,

I am suggesting this because I thought it already existed but could not find it documented anywhere.

What would you like changed/added and why?

In pydra, shell task specifications are mostly text-based, with dictionaries and lists sprinkled all over the place. This could very easily be serialized into something like yaml or json (I prefer the former). In fact, when looking at something like this, it looks very similar to json syntax. So why not abstract away most of the boilerplate code and allow users to write task specifications in yaml instead?

Example (FSL bet)

Adapting the example for FSL bet from the docs, this could look somewhat like this:

my_bet: 
  bases: pydra.engine.ShellCommandTask
  executable: bet
  input_spec: bet_input_spec

bet_input_spec: 
  bases: pydra.engine.ShellSpec
  name: Input
  fields:
    in_file:
      pydra.File:
        help_string: "input file ..."
        position: 1
        mandatory: True
    out_file:
      str: 
        help_string: "name of output ..."
        position: 2
        output_file_template: "{in_file}_br"
    mask:
      bool:
        help_string: "create binary mask"
        argstr: "-m"

or even nested like this

my_bet: 
  bases: pydra.engine.ShellCommandTask
  executable: bet
  input_spec: 
    bases: pydra.engine.ShellSpec
    name: Input
    fields: # [...]

Of course there are many ways to do this and some discussion would be needed to iron out a proper specification.
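As a rough illustration of how little glue this would need, here is a minimal sketch that turns the mapping a YAML loader would return for `bet_input_spec` into `(name, type, metadata)` tuples, similar in shape to the field lists in the Python example from the docs. The helper name `fields_to_tuples` is hypothetical and not part of pydra, type names are kept as plain strings rather than resolved to classes, and the parsed mapping is written out as a literal so that no YAML library is required.

```python
from typing import Any, Dict, List, Tuple

# What a YAML loader would return for the `bet_input_spec` entry above
# (written as a literal here; this is an assumption, not pydra output).
parsed_spec: Dict[str, Any] = {
    "bases": "pydra.engine.ShellSpec",
    "name": "Input",
    "fields": {
        "in_file": {
            "pydra.File": {
                "help_string": "input file ...",
                "position": 1,
                "mandatory": True,
            }
        },
        "out_file": {
            "str": {
                "help_string": "name of output ...",
                "position": 2,
                "output_file_template": "{in_file}_br",
            }
        },
        "mask": {"bool": {"help_string": "create binary mask", "argstr": "-m"}},
    },
}


def fields_to_tuples(spec: Dict[str, Any]) -> List[Tuple[str, str, Dict[str, Any]]]:
    """Hypothetical helper: flatten `fields` into (name, type_name, metadata)."""
    result = []
    for field_name, typed in spec["fields"].items():
        # Each field maps exactly one type name to its metadata.
        if len(typed) != 1:
            raise ValueError(f"field {field_name!r} must declare exactly one type")
        ((type_name, metadata),) = typed.items()
        result.append((field_name, type_name, dict(metadata or {})))
    return result


field_tuples = fields_to_tuples(parsed_spec)
for entry in field_tuples:
    print(entry)
```

Resolving the type strings (for example `pydra.File`) to actual classes is deliberately left out, since that is exactly the part a proper specification would have to pin down.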

What would be the benefit? Does the change make something easier to use?

That's the whole point: ease of use. You could simply load these specifications with pydra and run them, possibly without a single line of (Python) code. There might be more advanced usage where you want to either include Python code directly or reference it from the spec, but that can be done as well. This should only be an extension of the actual Python API: it should not replace it, and it should be kept as close to it as possible.

Other projects with similar approaches


When I first saw pydra I immediately thought you would do this, but then I found no mention of it. It would be simple enough to write my own parser for this, but a proper specification would be better. So what do you think about this idea?

PS: YAML has support for referencing actual types from program code, but I find this concept too complex for simple use cases and especially for new users.

djarecka commented 3 years ago

@dafrose - thanks for opening the issue! Yes, we should add this option.

I'm using yml format for building a converter (from nipype to pydra task, example for fsl is here), but pydra should be able to read the spec from yml.

dafrose commented 3 years ago

Hi @djarecka , I have a few questions regarding your yml-spec:

  1. I don't see an explicit input spec in your yaml files, only conditions that might reflect on optional inputs, but not on mandatory. How do you define inputs with this spec?

  2. What is the distinction between filename and cmd?

Otherwise, I like the flexibility that your usage of filename templates offers.

  3. With regard to a future specification for writing specs in yml: Would you prefer to restrict these specs to command line interfaces only? If that's the case, there does not need to be a clear distinction between different types of interfaces (e.g. function vs. CLI). But it might still be worth including something like a base attribute to allow inheritance from some common structure (like previously in nipype for all FSL tools or all MRTRIX3 tools and so on...).

In PyRates, we decided to use slash / notation for absolute or relative system paths and dot . notation for things that Python can find with its import architecture. Both could refer to either Python code or other yaml-specs. This could then look like this:

MySpec:
  base: MyBase  # referencing something in the same file

MySpec2:
  base: ../../path/to/file/MyBase2  # relative path

MySpec3:
  base: /drive/path/to/file/MyBase3  # absolute path

MySpec4:
  base: path.to.file.MyBase4  # python path

Regarding file endings, we decided to look for files that end with .py, .yaml, or .yml, in that order (or rather, try to import first and then look for yaml files). Would you like to include the base attribute or rather keep it out to simplify the specification?
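To make the notation concrete, here is a minimal sketch of how such a base reference could be classified. It is not PyRates or pydra code: the names `BaseRef`, `classify_base` and `candidate_files` are hypothetical, and the sketch only splits the string and lists candidate file names without touching the file system or the import machinery.

```python
from typing import List, NamedTuple


class BaseRef(NamedTuple):
    kind: str      # "local", "relative_path", "absolute_path" or "python_path"
    location: str  # file path without suffix, or dotted module path
    name: str      # name of the referenced spec


# Suffixes tried for path-style references, in the order given above.
SUFFIXES = (".py", ".yaml", ".yml")


def classify_base(ref: str) -> BaseRef:
    """Split a base reference into its kind, location and spec name."""
    if "/" in ref:
        # Slash notation: a file system path, checked before dots
        # because relative paths such as "../x" also contain dots.
        location, _, name = ref.rpartition("/")
        kind = "absolute_path" if ref.startswith("/") else "relative_path"
        return BaseRef(kind, location, name)
    if "." in ref:
        # Dot notation: something Python can find via its import system.
        location, _, name = ref.rpartition(".")
        return BaseRef("python_path", location, name)
    # No separator: a spec defined in the same file.
    return BaseRef("local", "", ref)


def candidate_files(ref: BaseRef) -> List[str]:
    """List the files to try for a path-style reference, in lookup order."""
    if ref.kind not in ("relative_path", "absolute_path"):
        return []
    return [ref.location + suffix for suffix in SUFFIXES]


for example in ("MyBase", "../../path/to/file/MyBase2",
                "/drive/path/to/file/MyBase3", "path.to.file.MyBase4"):
    print(example, "->", classify_base(example))
```

Checking for a slash before a dot is the one design choice that matters here; everything else follows directly from the four examples.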

dafrose commented 3 years ago

My notes from the call today (regarding this issue):

Way forward

Open Questions

Do you have any comments on my notes or my questions, @djarecka @effigies @PeerHerholz @satra @oesteban?