nipype / pydra

Pydra Dataflow Engine
https://nipype.github.io/pydra/
Other
119 stars 57 forks source link

How to implement type-checking of split tasks #660

Open tclose opened 1 year ago

tclose commented 1 year ago

How should a type-checker handle the following case in the unittests?


    filename = [str(filename_1), str(filename_2)]

    my_input_spec = SpecInfo(
        name="Input",
        fields=[
            (
                "file",
                attr.ib(
                    type=File,
                    metadata={
                        "mandatory": True,
                        "position": 1,
                        "argstr": "",
                        "help_string": "input file",
                    },
                ),
            )
        ],
        bases=(DockerSpec,),
    )

    docky = DockerTask(
        name="docky",
        image="busybox",
        executable=cmd,
        file=filename,
        input_spec=my_input_spec,
        strip=True,
    ).split("file")

Because the task is split over "file" then it is provided a list of files to run (whereas each task operates on a single file), but type-checking fails on task creation since a list of files are passed to "file" instead of a single file.

Would syntax like this work instead?


    docky = DockerTask(
        name="docky",
        image="busybox",
        executable=cmd,
        file=Splitter(filename),  # The `Splitter` object could be treated as a special case by the type checker, and detected when the task is run
        input_spec=my_input_spec,
        strip=True,
    )
djarecka commented 1 year ago

hmm, not sure if I like the syntax.... we would still have to have a separate splitter for more complex cases.

Perhaps the type-checker could be modified to check also inside list, if list is provided and only give warnings in this case?

tclose commented 1 year ago

I have added a new special class called "gathered", which is just a minimal subclass of list. The split syntax stays the same, the type checker just checks the types of the gathered list items instead of the gathered object itself.

I initially tried relaxing the type checker so it could handle lists of the type, but when dealing with complex splitting and combinations you can get lists of lists and it wouldn't be possible to handle all cases (combine states return "gathered" objects instead of lists).

Does this seem reasonable? I think it makes the code more explici

tclose commented 1 year ago

Also, the type-checking code is inseparable from the type-coercing-to-File code we discussed in the last dev meeting (possibly after you left the call), which is required for @effigies hash dispatch PR to work.

I'm not sure about the "gathered" nomenclature though and was toying with "nodespan" or just "array" to designate a value which is to be split across multiple nodes. Any ideas?