Open tclose opened 7 months ago
I don't think we should create full-on tasks that can be cached. If these operations are non-trivial, people can write tasks.
If you want to do this, I would start with an explicit model before making it implicit. I think we could provide a similar lazy interface as a LazyField that is a partial application of a function on LazyFields and/or values.
class LazyValue(Protocol):
...
class LazyField(LazyValue):
...
class Partial(LazyValue):
def __init__(self, func, *args, **kwargs):
self.func, self.args, self.kwargs = func, args, kwargs
def apply(self):
# Turn any LazyValues into values
args, kwargs = resolve(self.args, self.kwargs)
return self.func(*args, **kwargs)
...
The above examples would become:
import operator as op
output_prefix = Partial(op.add, wf.lzin.subject_name, "_")
json_file = Partial(op.getattr, wf.lzin.input_niftix, "json_file")
You could start adding to LazyField
with something like:
class LazyField:
def __add__(self, value):
return Partial(op.add, self, value)
def __getattr__(self, key):
return Partial(op.getattr, self, key)
But where do you stop and why? Also, that __getattr__
makes me pretty nervous. And I can just see a linter-driven "cleanup" converting example one to output_prefix = f"{wf.lzin.subject_name}_"
and nobody catching it because __str__
is always valid to call, and get output_prefix = "<LazyField at 0xABad1dea>_"
.
IMO explicit is better than implicit, here, and I think a clear model of these lightweight linking ops could be good. We could even have helpers like:
from functools import partial
Add = partial(Partial, op.add)
GetAttr = partial(Partial, op.getattr)
Yes, I was thinking that it wouldn't need to be a full blown task after I posted this. I like your Partial suggestion.
I was thinking to restrict __getattr__
calls to attributes/properties of the lazy-field type so they can be checked at construction time. Likewise we could only allow operators for field types that implement __add__
, __mul__
, ....
IMO explicit is better than implicit, here, and I think a clear model of these lightweight linking ops could be good. We could even have helpers like:
from functools import partial Add = partial(Partial, op.add) GetAttr = partial(Partial, op.getattr)
Not sure I get this, what is the explicit case in this example, using partial
instead of an operator? One of the benefits as I see it is to align with user's expectations, and I have seen multiple new users expect to be able to do some basic operations on lazy fields
I'm not sure whether allowing __getattr__
on methods is a good idea or not. I can see a use case for some string manipulation methods, e.g.
wf.add(
ATask(
id=wf.lzin.a_field.replace('-', '_')
...
but agree that it will probably open a can of worms
GetAttr = partial(Partial, op.getattr)
Not sure I get this, what is the explicit case in this example, using
partial
instead of an operator? One of the benefits as I see it is to align with user's expectations, and I have seen multiple new users expect to be able to do some basic operations on lazy fields
This just shortens the boilerplate. If you're going to use this feature a lot, you might not want to use Partial(op.XYZ, ...)
all the time.
output_prefix = Add(wf.lzin.subject_name, "_")
json_file = GetAttr(wf.lzin.input_niftix, "json_file")
Incidentally, I now think I'd prefer "Delayed" or something else besides "Partial", since we are waiting for arguments to be ready, not waiting for additional arguments.
I'm not sure whether allowing
__getattr__
on methods is a good idea or not. I can see a use case for some string manipulation methods, e.g.wf.add( ATask( id=wf.lzin.a_field.replace('-', '_') ...
I wonder if you could actually do this 100% by hacking __getattr__
and __call__
, and let __add__
and replace
come for free. You might end up with Delayed(Delayed(op.getattr, wf.lzin.a_field, replace), '-', '_')
.
but agree that it will probably open a can of worms
What if instead of doing it implicitly, we at least required people to acknowledge that they're not doing a normal expression: Magic(wf.lzin.a_field).replace('-', '_')
. (Could do something like Connect
or Delay
.)
what about wf.lzin.a_field.delayed_value.replace('-', '_')
?
IMO that's harder to parse and understand where the magic is happening. If you'll accept extra syntax, I don't see why a weird attribute would be better than a weird function call.
Fair enough, would be happy with Delay
I think.
This discussion has made me think that calling them "lazy" fields might be slight misnomer, as I would typically expect lazy fields to be populated when I access them, not at some specified later time. Not sure what I would have called them though
What would you like changed/added and why?
When connecting a lazy-out field to a downstream node, there are often minor operations to be applied to the field value, e.g.
output_prefix = wf.lzin.subject_name + "_"
orjson_file=wf.lzin.input_niftix.json_file
(see #723). When learning how to write Pydra workflows, users often intuitively expect that such operations should work and instead of needing to create custom function task.I would like Pydra in such cases to automatically create a function task to perform the operation and insert it into the workflow transparently to the user.
What would be the benefit? Does the change make something easier to use?
The changes would be more