liesel-devs / liesel

A probabilistic programming framework
https://liesel-project.org
MIT License
38 stars 2 forks source link

Try to update calc on init #92

Closed jobrachem closed 7 months ago

jobrachem commented 1 year ago

This is my proposal for #79

In this PR, a calculator will try to update its value during initialization. If the update fails, a warning with the title of the error is logged. Also, the full traceback of the error is logged to the debug logger.

Example 1:

>>> import liesel.model as lsl
>>> 
>>> a = lsl.Data(1.0)
>>> b = lsl.Calc(lambda x: x / 0, a)
liesel.model.nodes - WARNING - Calc(name="") was not updated during initialization, because the following exception occured: RuntimeError('Error while updating Calc(name="").'). See debug log for the full traceback.

Example 2:

>>> import logging
>>> import liesel.model as lsl
>>> 
>>> logger = logging.getLogger("liesel")
>>> logger.handlers[0].setLevel(logging.DEBUG)
>>> 
>>> a = lsl.Data(1.0)
>>> b = lsl.Calc(lambda x: x / 0, a)
liesel.model.nodes - WARNING - Calc(name="") was not updated during initialization, because the following exception occured: RuntimeError('Error while updating Calc(name="").'). See debug log for the full traceback.
liesel.model.nodes - DEBUG - Calc(name="") was not updated during initialization, because the following exception occured:
Traceback (most recent call last):
  File "/Users/johannesbrachem/Documents/git/liesel/liesel/model/nodes.py", line 526, in update
    self._value = self.function(*args, **kwargs)
  File "<string>", line 1, in <lambda>
ZeroDivisionError: float division by zero

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/johannesbrachem/Documents/git/liesel/liesel/model/nodes.py", line 500, in __init__
    self.update()
  File "/Users/johannesbrachem/Documents/git/liesel/liesel/model/nodes.py", line 528, in update
    raise RuntimeError(f"Error while updating {self}.") from e
RuntimeError: Error while updating Calc(name="").
jobrachem commented 1 year ago

@wiep this is something we discussed briefly during the weekly on August 30. You had some reservations about the general idea to try to update a calculator on initialization - so to enrich the conversation about this proposal, I created this PR which shows how I think the update could be implemented without breaking the calculator's ability to be initialized without a working update. What do you think?

jobrachem commented 12 months ago

Converted back to draft, because I just saw that with this implementation, we get the following warnings during model initialization:

liesel.model.nodes - WARNING - Calc(name="_model_log_prior") was not updated during initialization, because the following exception occured: RuntimeError('Error while updating Calc(name="_model_log_prior").'). See debug log for the full traceback.
liesel.model.nodes - WARNING - Calc(name="_model_log_prob") was not updated during initialization, because the following exception occured: RuntimeError('Error while updating Calc(name="_model_log_prob").'). See debug log for the full traceback.

This is not pretty, because it is just intended behavior and I think probably users should not be warned about this. Before this PR is ready for review, I would like to come up with a nicer solution for this case.

wiep commented 12 months ago

i'm still not super happy with the default warnings. so far, we say liesel you can build a partial graph but that would trigger warnings what is not really what should happen. usually i do not like this magic, but here it might be justified: a context manager that turns the updates on (or off). i think that is better than fiddling with the log levels.

with lsl.instant_evaluation():
     a = lsl.Data(1.0)
     b = lsl.Calc(lambda x: x / 0, a)
# warnings
a = lsl.Data(1.0)
b = lsl.Calc(lambda x: x / 0, a)
# no warnings
jobrachem commented 12 months ago

I don't really think the magic context manager would bring us a lot of joy here. Manually, the desired functionality can already be achieved by

a = lsl.Data(1.0)
b = lsl.Calc(lambda x: x / 0, a).update()

This makes it more obvious what is happening than the context manager.

The idea of this PR is mostly to make it easier for users to spot errors in their code early without introducing additional manual work. If you don't know the internals of how, when and why exactly a calculator gets updated, it is currently easy to get tripped up. At least that is what happened to me.

I think an alternative to the work this PR could lie in documentation.

i think that is better than fiddling with the log levels.

You only need to change the log levels if you have no other way of accessing the original traceback. When you are working interactively, say in quarto, you can just run b.update() manually one time to get the error message. If you can't do that, activating the "debug" log-level for debugging is reasonable I think.

What is true is that this code looks a little fiddly:

logger = logging.getLogger("liesel")
logger.handlers[0].setLevel(logging.DEBUG)

This is a result of how we set up our logging. I think we could (and should) change the setup such that you can set the log level like this:

logger = logging.getLogger("liesel")
logger.setLevel(logging.DEBUG)

But that is a different issue.

wiep commented 11 months ago

Building a non-initialized model is possible without warnings is imho a crucial aspect of Liesel. However, the proposed implementation may require users to turn off logging, build the non-initialized model, and then turn the logging back on. I find that not ideal. Checking the calculations during initialization is often beneficial from a user's perspective, though. Therefore, I think, it may be necessary to have two different modes to cater to both scenarios.

One possible solution is to enable updates by default and disable them only when a particular setting is altered. For example, a ctx manager could be used to adjust the settings temporarily.

lsl.update_on_construction = False
# build model

# or

with lsl.disable_updates_on_construction():
    # build model

We can discuss this further during one of our upcoming weekly meetings.

jobrachem commented 10 months ago

We could also simply add an init parameter to calculators:

a = lsl.Data(1.0)
b = lsl.Calc(lambda x: x / 0, a, update_on_init=False)

This could be set to True by default.

wiep commented 10 months ago

do you think that would be more straightforward to the user than a context manager?

jobrachem commented 10 months ago

Yep, I think it would be both easier to use and also easier to find, because it would be right there in the api documentation.

wiep commented 10 months ago

i like this definitely more than the logging. compared to a context manager it's just very cumbersome if you have many nodes that require no initialization.

jobrachem commented 10 months ago

Fair point. Why not both, the init argument for everyday use and the context manager as an option for bulk operations by advanced users? Or, another option: an alternative constructor like Calc.lazy()?

wiep commented 10 months ago

Also, a good idea. However, I would always include the keyword new to indicate that the static method is a constructor Calc.new_lazy(...).

So far, we do not use ctx manager anywhere because when using them it might be less obvious how some things work.

Maybe we should have an implementation to try things out.

Signature:

Calc(..., update_value: Optional[bool] = None)

Meaning:

jobrachem commented 10 months ago

Quite the elaborate setup, but I guess this can make us both happy :)

wiep commented 10 months ago

we do not keep all of it, was more like an signature for an implementation to try things out. if that's too much work, we can also implement only one version and see if it has short comings.

jobrachem commented 10 months ago

I guess at least the combination of context manager and init argument would make sense, because then we have both use cases covered, i.e. a) general default of update on init with b) the option to deactivate the update for a bunch of variables at a time.

jobrachem commented 7 months ago

We will go with the init argument, which defaults to True. We can add a context manager later, if it turns out we want one.

jobrachem commented 7 months ago

@wiep if you got a notification for this PR, you can ignore it. @GianmarcoCallegher is doing the review.

GianmarcoCallegher commented 7 months ago

LGTM