Open ghisvail opened 1 year ago
Keep in mind that I might be missing some relevant context. I am reporting this issue from the viewpoint of a task contributor and a novice developer to the project. I know quite a few project which made the jump to attrs
(for instance pytest
lately if I am not mistaken), yet the surface-level API does not leak any attrs
specific features or limitations to my knowledge.
@ghisvail - this is a path we took after careful consideration of all the issues we ran into with dataclasses. we realized we were building attrs when working with dataclasses. having two different object types makes it a lot more complicated, even if it's hidden away internally.
hence, i would suggest going down this road only when the following aspects have been considered/demonstrated for dataclasses:
pydra is not just for CLI tools. we wanted pydra to be flexible enough to handle python functions, also in other libraries, as well and to be able to use them in workflows. in fact the pydra-ml library that we built doesn't have any command line tools for interfaces.
we have even considered how to simplify the user specification, but have repeatedly come back to the notion that providing a bit more syntactic sugar doesn't help significantly beyond the current construction possibilities of specs.
personally, i am always open to reconsidering options, but in this case given all the other aspects of pydra we have, we should really consider what we loose by switching back to dataclasses.
Hey @satra, thanks for taking the time to provide some more context to this 👍
@ghisvail - this is a path we took after careful consideration of all the issues we ran into with dataclasses. we realized we were building attrs when working with dataclasses. having two different object types makes it a lot more complicated, even if it's hidden away internally.
Just to be sure, I am not suggesting to replace attrs
by dataclasses
everywhere within pydra
, but having at least simpler use cases which are fully expressible with dataclasses
(like the example) not require attrs
.
hence, i would suggest going down this road only when the following aspects have been considered/demonstrated for dataclasses:
* validation * dynamic inheritance * pickling of dynamic classes
Those are for Pydra internals, no? I am only talking about the surface level API here.
pydra is not just for CLI tools. we wanted pydra to be flexible enough to handle python functions, also in other libraries, as well and to be able to use them in workflows. in fact the pydra-ml library that we built doesn't have any command line tools for interfaces.
Indeed. Though I see no class-based definitions of specifications in pydra-ml
, only decorator-style task definitions. I am not proposing to touch those, they are totally fine and don't expose attrs
.
we have even considered how to simplify the user specification, but have repeatedly come back to the notion that providing a bit more syntactic sugar doesn't help significantly beyond the current construction possibilities of specs.
I personnaly find most of the UX for task definition, be it via pydra.mark
decorators or pydra.specs.SpecInfo
, really good. None of it exposes nor requires attrs
specific knowledge, which is a sign of good encapsulation. I just wish static style definitions would not leak attrs
as a requirement.
personally, i am always open to reconsidering options, but in this case given all the other aspects of pydra we have, we should really consider what we loose by switching back to dataclasses.
Once again, I am not proposing to switch Pydra's internals from attrs
to dataclasses
, only suggesting to also allow usage of dataclasses
for simple static task definitions.
It might not be feasible for reasons I have overlooked, hence my motivation for opening this issue. Perhaps it's a stupid idea actually 😅
I am a big advocate for class-based definition of input and output task specifications over dynamic constructs via
pydra.specs.SpecInfo(..., fields=[...])
.One annoying thing though is that class-based specifications aren't working with built-in
dataclasses
although they could from a pure semantic viewpoint. I understand the rationales (mostly pickling) for the transition at the implementation level, but I was wondering whether we could keep support fordataclasses
at the surface API level.For instance, let's take the following interface for
fslreorder2std
:There really is nothing specific to
attrs
which is not covered bydataclasses
:As you can see the semantics are equivalent (not suprising considering their historical synergies), yet
dataclasses
has the benefits of being built-in and having its documentation right there with the rest of Python. Perhaps by transforming the dataclasses-based declaration into anattrs
representation internally?