nipype / pydra

Pydra Dataflow Engine
https://nipype.github.io/pydra/
Other
119 stars 57 forks source link

Support for dataclasses at a surface level #634

Open ghisvail opened 1 year ago

ghisvail commented 1 year ago

I am a big advocate for class-based definition of input and output task specifications over dynamic constructs via pydra.specs.SpecInfo(..., fields=[...]).

One annoying thing though is that class-based specifications aren't working with built-in dataclasses although they could from a pure semantic viewpoint. I understand the rationales (mostly pickling) for the transition at the implementation level, but I was wondering whether we could keep support for dataclasses at the surface API level.

For instance, let's take the following interface for fslreorder2std:

@attrs.define(slots=False, kw_only=True)
class FSLReorient2StdSpec(pydra.specs.ShellSpec):
    """Specificiations for fslreorient2std."""

    input_image: os.PathLike = attrs.field(
        metadata={
            "help_string": "input image",
            "mandatory": True,
            "argstr": "",
            "position": -2,
        }
    )

    output_image: str = attrs.field(
        metadata={
            "help_string": "output reoriented image",
            "argstr": "",
            "position": -1,
            "output_file_template": "{input_image}_r2std",
        }
    )

    output_matrix: str = attrs.field(
        metadata={
            "help_string": "output transformation matrix",
            "argstr": "-m",
            "output_file_template": "{input_image}_r2std.mat",
            "keep_extension": False,
        }
    )

class FSLReorient2Std(pydra.engine.ShellCommandTask):
    """Task definition for fslreorient2std."""

    executable = "fslreorient2std"

    input_spec = pydra.specs.SpecInfo(
        name="FSLReorient2StdInput", bases=(FSLReorient2StdSpec,)
    )

There really is nothing specific to attrs which is not covered by dataclasses:

diff --git a/pydra/tasks/fsl/fslreorient2std.py b/pydra/tasks/fsl/fslreorient2std.py
index c35a914..28b1724 100644
--- a/pydra/tasks/fsl/fslreorient2std.py
+++ b/pydra/tasks/fsl/fslreorient2std.py
@@ -15,18 +15,18 @@ Examples

 __all__ = ["FSLReorient2Std"]

-import os

-import attrs
+import dataclasses
+import os

 import pydra

-@attrs.define(slots=False, kw_only=True)
+@dataclasses.dataclass(kw_only=True)
 class FSLReorient2StdSpec(pydra.specs.ShellSpec):
     """Specificiations for fslreorient2std."""

-    input_image: os.PathLike = attrs.field(
+    input_image: os.PathLike = dataclasses.field(
         metadata={
             "help_string": "input image",
             "mandatory": True,
@@ -35,7 +35,7 @@ class FSLReorient2StdSpec(pydra.specs.ShellSpec):
         }
     )

-    output_image: str = attrs.field(
+    output_image: str = dataclasses.field(
         metadata={
             "help_string": "output reoriented image",
             "argstr": "",
@@ -44,7 +44,7 @@ class FSLReorient2StdSpec(pydra.specs.ShellSpec):
         }
     )

-    output_matrix: str = attrs.field(
+    output_matrix: str = dataclasses.field(
         metadata={
             "help_string": "output transformation matrix",
             "argstr": "-m",

As you can see the semantics are equivalent (not suprising considering their historical synergies), yet dataclasses has the benefits of being built-in and having its documentation right there with the rest of Python. Perhaps by transforming the dataclasses-based declaration into an attrs representation internally?

ghisvail commented 1 year ago

Keep in mind that I might be missing some relevant context. I am reporting this issue from the viewpoint of a task contributor and a novice developer to the project. I know quite a few project which made the jump to attrs (for instance pytest lately if I am not mistaken), yet the surface-level API does not leak any attrs specific features or limitations to my knowledge.

satra commented 1 year ago

@ghisvail - this is a path we took after careful consideration of all the issues we ran into with dataclasses. we realized we were building attrs when working with dataclasses. having two different object types makes it a lot more complicated, even if it's hidden away internally.

hence, i would suggest going down this road only when the following aspects have been considered/demonstrated for dataclasses:

pydra is not just for CLI tools. we wanted pydra to be flexible enough to handle python functions, also in other libraries, as well and to be able to use them in workflows. in fact the pydra-ml library that we built doesn't have any command line tools for interfaces.

we have even considered how to simplify the user specification, but have repeatedly come back to the notion that providing a bit more syntactic sugar doesn't help significantly beyond the current construction possibilities of specs.

personally, i am always open to reconsidering options, but in this case given all the other aspects of pydra we have, we should really consider what we loose by switching back to dataclasses.

ghisvail commented 1 year ago

Hey @satra, thanks for taking the time to provide some more context to this 👍

@ghisvail - this is a path we took after careful consideration of all the issues we ran into with dataclasses. we realized we were building attrs when working with dataclasses. having two different object types makes it a lot more complicated, even if it's hidden away internally.

Just to be sure, I am not suggesting to replace attrs by dataclasses everywhere within pydra, but having at least simpler use cases which are fully expressible with dataclasses (like the example) not require attrs.

hence, i would suggest going down this road only when the following aspects have been considered/demonstrated for dataclasses:

* validation

* dynamic inheritance

* pickling of dynamic classes

Those are for Pydra internals, no? I am only talking about the surface level API here.

pydra is not just for CLI tools. we wanted pydra to be flexible enough to handle python functions, also in other libraries, as well and to be able to use them in workflows. in fact the pydra-ml library that we built doesn't have any command line tools for interfaces.

Indeed. Though I see no class-based definitions of specifications in pydra-ml, only decorator-style task definitions. I am not proposing to touch those, they are totally fine and don't expose attrs.

we have even considered how to simplify the user specification, but have repeatedly come back to the notion that providing a bit more syntactic sugar doesn't help significantly beyond the current construction possibilities of specs.

I personnaly find most of the UX for task definition, be it via pydra.mark decorators or pydra.specs.SpecInfo, really good. None of it exposes nor requires attrs specific knowledge, which is a sign of good encapsulation. I just wish static style definitions would not leak attrs as a requirement.

personally, i am always open to reconsidering options, but in this case given all the other aspects of pydra we have, we should really consider what we loose by switching back to dataclasses.

Once again, I am not proposing to switch Pydra's internals from attrs to dataclasses, only suggesting to also allow usage of dataclasses for simple static task definitions.

It might not be feasible for reasons I have overlooked, hence my motivation for opening this issue. Perhaps it's a stupid idea actually 😅