nipype / pydra

Pydra Dataflow Engine
https://nipype.github.io/pydra/
Other
119 stars 57 forks source link

Issue with composing tasks with output_file_template in a workflow #641

Closed ghisvail closed 11 months ago

ghisvail commented 1 year ago

I am attempting to build workflows which compose shell command tasks together, some of which providing default output names via output_file_template. However, these trigger an exception at runtime if they are not explicitly set when running the workflow, which defeats their purpose in my opinion.

The following minimal example defines a toy backup function wrapping the cp command and using a templated output for the target file. The code works fine on the task alone, but fails when wrapped with a worfklow with:

future: <Task finished name='Task-2' coro=<ConcurrentFuturesWorker.exec_as_coro() done, defined at .../pydra/engine/workers.py:169> exception=Exception("target_file has to be str or bool, but LF('backup', 'target_file') set")>

I would expect both runs to be successful and yield the same result modulo the temporary directory prefix.

import os
import pathlib
import tempfile

import attrs
import pydra

class BackupFile(pydra.engine.ShellCommandTask):
    """Example task performing a file backup with cp.

    If not target file is provided, it uses a default file suffixed by copy.
    """

    @attrs.define(kw_only=True)
    class InputSpec(pydra.specs.ShellSpec):
        source_file: os.PathLike = attrs.field(
            metadata={"help_string": "source file", "mandatory": True, "argstr": ""}
        )

        target_file: str = attrs.field(
            metadata={
                "help_string": "target file",
                "argstr": "",
                "output_file_template": "{source_file}_copy",
            }
        )

    input_spec = pydra.specs.SpecInfo(name="Input", bases=(InputSpec,))

    executable = "cp"

def backup(name="backup", **kwargs) -> pydra.Workflow:

    wf = pydra.Workflow(input_spec=["source_file", "target_file"], name=name, **kwargs)

    wf.add(
        BackupFile(
            name="backup_file",
            source_file=wf.lzin.source_file,
            target_file=wf.lzin.target_file,
        )
    )

    wf.set_output({"target_file": wf.backup_file.lzout.target_file})

    return wf

if __name__ == "__main__":

    # Execute with standalone backup task.
    with tempfile.TemporaryDirectory() as td:
        source_file = (pathlib.Path(td) / "source1.file")
        source_file.touch()

        task = BackupFile(source_file=source_file)
        result = task()

        print(result.output.target_file)

    # Execute backup task wrapped in a workflow.
    with tempfile.TemporaryDirectory() as td:
        source_file = (pathlib.Path(td) / "source2.file")
        source_file.touch()

        task = backup(source_file=source_file)
        result = task()

        print(result.output.target_file)
ghisvail commented 1 year ago

The issue is here.

Looking at the following code extraction:

def template_update_single(...):
    ...
    if spec_type == "input":
        if field.type not in [str, ty.Union[str, bool]]:
            raise Exception(
                "fields with output_file_template"
                "has to be a string or Union[str, bool]"
            )
        inp_val_set = inputs_dict_st[field.name]
        if inp_val_set is not attr.NOTHING and not isinstance(inp_val_set, (str, bool)):
            raise Exception(
                f"{field.name} has to be str or bool, but {inp_val_set} set"
            )
        if isinstance(inp_val_set, bool) and field.type is str:
            raise Exception(
                f"type of {field.name} is str, consider using Union[str, bool]"
            )

Since type(inp_val_set) is LazyField in the workflow case above, it cannot satisfy the checks for str or bool.

ghisvail commented 1 year ago

One possible solution would be for LazyField to become a covariant typed variable like LazyField[str] or LazyField[bool], which could then satisfy the isinstance checks.

But I believe it would fail further down at the formatting stage where the actual value needs to be accessed. It looks like LazyField.get_value requires a workflow argument, which is not available in these helper functions.

Unless I missed something?

ghisvail commented 11 months ago

Fixed by #662