greta-dev / greta

simple and scalable statistical modelling in R
https://greta-stats.org
Other
527 stars 63 forks source link

Detect and "awaken" dead pointers so greta can work with targets #595

Open njtierney opened 10 months ago

njtierney commented 10 months ago

Greta currently doesn't play well with targets because it uses uses reticulate to talk to Python, and when the R session ends the connection to Python objects is lost (they become NULL externalptr objects). Some details here and full discussion with @goldingn and @smwindecker here.

As @goldingn said:

The pointers to python/TF objects are only created when model() is called on them, and sometimes again when mcmc() and other inference methods are run. When future is used to run models in parallel, there's a need to redefine models and pointers in the same way as described here: https://github.com/greta-dev/greta/blob/tf2-poke-tf-fun/R/inference_class.R#L346-L354

The issue comes down to redefining the pointers in a new environment, which is something that is done when greta works in parallel, at the link just above.

Golding later says:

instead of explicitly recreating those pointers when in parallel, why not add a step to check for dead pointers and redefine them, whenever any of those TF objects are accessed? The we can delete the parallel-specific code, and this should work in targets.

This has the benefit of greta natively working in parallel, and natively working with targets. It would be a huge win for the user experience.

Paraphrasing @goldingn, the goal is:

Whenever greta needs to evaluate one of these pointers (e.g., a model, or mcmc object, something that initialises a python/TF object), we should add a wrapper function which does the following:

  1. Checks if the object exists, and if it does that it isn't a dead pointer
  2. If the pointer isn't there, execute the code to create it
  3. Check the pointer now exists, and error if not.

So for the log prob function, before it's called we'd call something like: self$maybe_define_tf_log_prob_function() before doing: self$tf_log_prob_function(parameters)

Or even better, wrap those into another function: self$evaluate_log_prob_function(parameters), which does both of those steps.

We might need the same pattern for the handful of other tf pointers that are only defined once then reused (maybe the internal MCMC draws object when using extra_samples()?). But we could also delete code in other places that defines these pointers

Overall this would be a really nice win for greta and greta users, but will require some light tinkering with the internals.

For those who want to see an example of targets using greta with a workaround this solution, see Saras's example here: https://github.com/idem-lab/targets-pkg-greta/blob/main/_targets.R.

njtierney commented 1 month ago

Some other notes: https://github.com/idem-lab/example-greta-targets?tab=readme-ov-file#what-you-need-to-add-to-make-greta-work-with-targets