nipype / pydra

Pydra Dataflow Engine
https://nipype.github.io/pydra/

Cannot use attrs post init or factory to compute a fallback value to a field #640

Open ghisvail opened 1 year ago

ghisvail commented 1 year ago

EDIT: See the post below for an alternative solution using the attrs @<field>.default decorator.

I am trying to compute a fallback value for an output_basename field based on an input_image.

@attrs.define(kw_only=True)
class MySpec(pydra.specs.ShellSpec):
    input_image: os.PathLike = attrs.field(...)  # mandatory

    output_basename: str = attrs.field(...)      # optional with fallback

Putting pydra aside, the documented way to achieve this with attrs is the post-init hook, i.e.

...
    def __attrs_post_init__(self):
        # Fallback if output basename is unset
        self.output_basename = (
            self.output_basename or pathlib.PurePath(self.input_image).stem
        )

However, this triggers an AttributeError: 'MySpecInput' object has no attribute 'files_hash' when the corresponding SpecInfo object is instantiated.

So far, the only workaround I have found is to put the fallback logic in a formatter, but the same logic would then need to be replicated in the output spec for each field that requires output_basename to compute the path to an output file.

Or did I miss a pydra feature that would let me express this logic with less redundancy? To me, it is unfortunate that we can't use this convenient attrs feature.

yibeichan commented 1 year ago

Hello, @djarecka and I discussed a similar problem and had some thoughts. For output_basename, could we do something like this: if output_basename is not specified, we automatically attach the function name to the input_image. For example, in pydra-fsl, the output_basename for Split(infile=test.nii.gz) would automatically become test_split.nii.gz if we don't specify it in the input_fields. (Hope I'm making sense here.)
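The naming rule described above could be sketched roughly as follows. The helper name default_output_name and its signature are hypothetical, not an existing pydra or pydra-fsl function; the sketch only shows one way to insert a suffix before a multi-part extension such as .nii.gz.

```python
from pathlib import PurePath


def default_output_name(input_image: str, suffix: str) -> str:
    """Hypothetical helper: insert `_suffix` before the file extensions."""
    name = PurePath(input_image).name
    # Split at the first dot so that ".nii.gz" is kept as a single unit.
    stem, dot, extensions = name.partition(".")
    return f"{stem}_{suffix}{dot}{extensions}"


print(default_output_name("test.nii.gz", "split"))  # test_split.nii.gz
```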

ghisvail commented 1 year ago

Indeed, FSLSplit is a good example. Reading through attrs docs on initialization, it sounds like the right solution would be to use the @field.default decorator like:

class FSLSplitSpec(pydra.specs.ShellSpec):
    """Specifications for fslsplit."""

    input_image: os.PathLike = attrs.field(
        metadata={
            "help_string": "input image",
            "mandatory": True,
            "argstr": "",
        }
    )

    output_basename: str = attrs.field(metadata={"help_string": "output basename", "argstr": ""})

    @output_basename.default
    def _output_basename_factory(self) -> str:
        return PurePath(self.input_image).name.split(".", maxsplit=1)[0]

    direction: str = attrs.field(
        default="t",
        metadata={
            "help_string": "split direction (x, y, z or t)",
            "argstr": "-{direction}",
            "allowed_values": {"x", "y", "z", "t"},
        },
    )

With this declaration, tasks instantiated with an explicit output_basename still work fine.

>>> task = FSLSplit(input_image="volume.nii.gz", output_basename="slice", direction="z")
>>> task.cmdline
'fslsplit volume.nii.gz slice -z'

But tasks that rely on the default output_basename produce unexpected results:

>>> task = FSLSplit(input_image="input.nii.gz")
>>> task.cmdline
Expected:
    'fslsplit input.nii.gz input -t'
Got:
    'fslsplit input.nii.gz Factory(factory=<function FSLSplitSpec._output_basename_factory at 0x104680ea0>, takes_self=True) -t'

I like this approach better than __attrs_post_init__. I still need to figure out why the factory is not called when the command line is generated.