scipp / sciline

Build scientific pipelines for your data
https://scipp.github.io/sciline/
BSD 3-Clause "New" or "Revised" License
10 stars 2 forks source link

Support adding decorators for providers #123

Open SimonHeybrock opened 9 months ago

SimonHeybrock commented 9 months ago

Provider functions and parameters in a pipeline have recently been wrapped in the Provider class. This makes it very easy to add decorators that wrap all calls (you can try with just a few lines of changes). This is useful for a variety of purposes:

I don't feel any of the above should necessarily be supported by Sciline itself. Instead, I suggest to add a method to sciline.Pipeline that allow adding decorators. These would be used to decorate https://github.com/scipp/sciline/blob/95ef23e93b561d476de87b129f9d0092ad83adfc/src/sciline/_provider.py#L104-L107, allowing the user to add one more of the above.

I suggest making this a method of Pipeline, something like

pipeline.set_decorators((my_logger, print_timings))
jl-wynen commented 9 months ago

Why does this need support from sciline? Can't you just put a decorator on the function itself?

SimonHeybrock commented 9 months ago

This is about decorating all providers, not just individual ones. I suppose one can achieve something similar, decorating all functions (but not param access), but it needs modification in more places? Also, it requires that the decorators all use functools.wraps, otherwise the type hints are not in place for Sciline to do its job.

providers = mylib.providers
providers = tuple(mydecorator(func) for func in providers)  # breaks unless decorates uses functools.wraps
pipeline = sciline.Pipeline(providers)
pipeline.insert(mydecorator(func1))
pipeline.insert(mydecorator(func2))

as opposed to

providers = mylib.providers
pipeline = sciline.Pipeline(providers)
pipeline.insert(func1)
pipeline.insert(func2)
pipeline.decorate(mydecorator)  # works with any decorator

which changes only a single line and also decorates calls to non-function providers.

jl-wynen commented 9 months ago

Also, it requires that the decorators all use functools.wraps, otherwise the type hints are not in place for Sciline to do its job.

Sure, that's just good decorator hygiene and not really our concern.

Applying a decorator globally is a bit hard to manage for anything nontrivial. But I guess it could be useful. Does it make more sense to think of it as hooks instead of decorators? You don't so much produce new functions or classes but want to run some code when a certain event occurs (provider called). This would remove the option to modify the providers. But that seems like a good thing. Having someone rewrite a pipeline just by adding a decorator to anything makes the code much harder to reason about.

SimonHeybrock commented 9 months ago

Yes, I suppose hooks might be a better description.

This would remove the option to modify the providers. But that seems like a good thing. Having someone rewrite a pipeline just by adding a decorator to anything makes the code much harder to reason about.

Not sure I understand. How and why would you rewrite a pipeline with a decorator?

jl-wynen commented 9 months ago

Decorators can change the function you apply them to. So they can replace it with a different one. When you apply them to all providers, they can potentially change the pipeline into something completely different. It would look the same as before as far as the graph is concerned but do a different computation. Or course, that is just bad practice. But a smaller scale version of that could happen where the behaviour of one or more providers is changed. Again, bad practice. But we would invite that kind of usage by using decorators.

Concerning hooks, there are typically three types: before, after, around. Some use case can be done with before or after hooks. But timing and potentially debugging might require around hooks. We have to be careful not to turn those into decorators. I.e.

def around_hook(provider):
  def impl(*args, **kwargs):
      # do something
      res = provider(*args, **kwargs)
      # to something else
      return res
  return impl

would be like a decorator. But

def around_hook(provider, *args, **kwargs):
    # do something
    res = yield  # runs the provider
    # do something else

would prevent the user from changing the provider. It would also prevent them from changing the arguments and return value which may or may not be what we want.

SimonHeybrock commented 9 months ago

I am not sure I share your concerns. If people want to break something they always can. If you can set a decorator you can also change the pipeline in other ways. If you use bad practices, things will break. There is nothing we can do about that, and pinning down an interface against that is hard and often counter productive. What is needed is preventing accidental misuse.

Not sure about your around_hook. It cannot access the inputs or outputs of the provider, right? But that may be essential for logging, how would you intend to implement that?

jl-wynen commented 9 months ago

Not sure about your around_hook. It cannot access the inputs or outputs of the provider, right? But that may be essential for logging, how would you intend to implement that?

It can access them (args, kwargs, res). But it cannot modify them.

I agree with your general argument. But the normal use case for decorators is modifying functions. This can be a simple addition that doesn't change the business logic like logging. But it can do much more. Essentially, it transforms one function into another. And at least for me, that is my mental model of decorators. So I would see applying decorators to a pipeline as transforming that pipeline. So my concern is not with making it possible to modify the pipeline, but that we tell users 'Here is a tool to modify providers, knock yourselves out!'. And maybe in the docs 'By the way, don't misuse this!'

This is not necessarily bad. But how do we ensure that visualize and serialize are still meaningful? When the decorator uses functools.wraps, the transformed pipeline looks exactly like the original.

SimonHeybrock commented 9 months ago

I suppose we could

Not sure about your around_hook. It cannot access the inputs or outputs of the provider, right? But that may be essential for logging, how would you intend to implement that?

It can access them (args, kwargs, res). But it cannot modify them.

Do you mean it cannot modify the list or the dict, but only the values? And I suppose res can always be modified, unless it is an immutable type?

Would you provide a Hook base class, where to user only implement some methods such as before() and after()?

jl-wynen commented 9 months ago

Do you mean it cannot modify the list or the dict, but only the values? And I suppose res can always be modified, unless it is an immutable type?

True, it's Python, so you can modify the contents of most objects.

Would you provide a Hook base class, where to user only implement some methods such as before() and after()?

No, I'd just use functions with signatures

def before_hook(provider: Provider, <args-and-kwargs-of-provider>) -> None:
def after_hook(provider: Provider, res: <Return-type-of-provider>) -> None:
def around_hook(provider: Provider, <args-and-kwargs-of-provider>) -> Generator[None, <Return-type-of-provider>, None]:

and register those with dedicated methods: Pipeline.add_{before,after,around}_hook

Before and after are straight forward functions, but around need to be a generator like this:

def around_provider(provider, args, kwargs) -> Generator[None, R, None]:
    # before provider is called
    res: R = yield
    # after provider is called

# When calling the provider:
gen = around_provider(provider, args, kwargs)
next(gen)
res = provider.call()
try:
    gen.send(res)
except StopIteration:
    pass

Of course, I'm not claiming that this is the best way to do it. Can all your use cases be implemented with this?

SimonHeybrock commented 9 months ago

This looks quite complicated!

jl-wynen commented 9 months ago

Why should hooks know about the providers, a user might just want to use some existing code (e.g., for timing)?

They need to know which provider they are hooking into. Otherwise, all timing providers could only report, 'a provider ran in time x'.

Users have to learn a special syntax for the around_hooks, rather than just writing a simple decorator, which they may have done many times.

Yes, the syntax is uncommon. But apart from getting a result from yield, it is similar to pytest fixtures. And getting a decorator right is also nontrivial. Plus, the majority of the complication is in sciline itself.

SimonHeybrock commented 9 months ago

Why should hooks know about the providers, a user might just want to use some existing code (e.g., for timing)?

They need to know which provider they are hooking into. Otherwise, all timing providers could only report, 'a provider ran in time x'.

You can print the name of the function, that's that I did. But yes, having the provider may be useful.