integrated-inferences / CausalQueries

Bayesian inference from binary causal models

memory issue with new summary approach #364

Open macartan opened 1 month ago

macartan commented 1 month ago

the new summary approach generates a lot of large objects on the fly, like the parameter matrix and the ambiguity matrix. other code avoids generating these, creating them only when they are needed

this will be unmanageable with larger models:

> model <- make_model("A -> Y <- B; C-> Y", add_causal_types = FALSE)
> summary <- summary(model)
> object.size(model)
82256 bytes
> object.size(summary)
79637008 bytes

it seems that when a summary is called, everything gets piled into this object, including all posterior distributions, stan objects and so on

can we revert to generating these only when they are explicitly requested?

gerasy1987 commented 1 month ago

@macartan, thanks for pointing this out. I can work on this next week. Do you have a preference for what objects should be in the summary by default beyond what is in the causal_model it is called on?

macartan commented 1 month ago

Thanks Gosha

My instinct would be to keep the minimum as default

There are a lot of big objects: ambiguities, the parameter matrix, the distributions

I imagine two approaches

  1. Have a small summary and shift code to the grab.inspect function. Feels like going backwards a little
  2. Have an include argument in summary that gets passed to print? So summary only includes extra objects as needed
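
A minimal sketch of option 2 in base R, under stated assumptions: `toy_model`, `toy_builders` and the `include` argument below are hypothetical stand-ins for illustration, not CausalQueries functions or its actual summary method. The summary stays small by default and builds an expensive object only when it is named in `include`.

```r
# Hypothetical toy class for illustration; not the CausalQueries implementation.
new_toy_model <- function(n_nodes) {
  structure(list(n_nodes = n_nodes), class = "toy_model")
}

# Builders for the expensive pieces; each is called only when requested.
toy_builders <- list(
  parameter_matrix = function(m) matrix(0, nrow = 2^m$n_nodes, ncol = 2^m$n_nodes),
  ambiguities      = function(m) matrix(0L, nrow = 2^m$n_nodes, ncol = m$n_nodes)
)

# summary() keeps only cheap fields unless `include` names extra objects.
summary.toy_model <- function(object, include = NULL, ...) {
  unknown <- setdiff(include, names(toy_builders))
  if (length(unknown) > 0) {
    stop("unknown object(s): ", paste(unknown, collapse = ", "))
  }
  extras <- lapply(toy_builders[include], function(build) build(object))
  structure(list(n_nodes = object$n_nodes, extras = extras),
            class = "summary.toy_model")
}

# print() reports whatever extras the summary happens to carry.
print.summary.toy_model <- function(x, ...) {
  cat("Toy model with", x$n_nodes, "nodes\n")
  for (nm in names(x$extras)) {
    cat(nm, ":", paste(dim(x$extras[[nm]]), collapse = " x "), "\n")
  }
  invisible(x)
}

m     <- new_toy_model(8)
small <- summary(m)                                 # default: no large objects
large <- summary(m, include = "parameter_matrix")   # built on explicit request
object.size(small) < object.size(large)
```

With this shape the default summary carries no large objects, and the cost of a large object is paid only by callers who ask for it.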

gerasy1987 commented 1 month ago

@macartan, a proposed fix is in #366 and is ready for review

gerasy1987 commented 1 month ago

addressed by #366

macartan commented 2 weeks ago

Sorry to reopen -- we cannot run all the code in the paper because of the memory requirements of the summary approach

we want to do this:

make_model("A -> E <- B; C-> E <- D", add_causal_types = FALSE) |>
  grab("parameters") |> 
  length()

but grabbing requires creating causal types and other very large objects

seems we are back to the same issue: why generate all these things on the fly when they are neither needed nor saved?

note this is still fast:

make_model("A -> E <- B; C-> E <- D", add_causal_types = FALSE) |>
  CausalQueries:::get_parameters() |>
  length()

grab and inspect really target particular objects, so it seems wise to create only the objects targeted and nothing else.
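
A rough sketch of that idea in base R, assuming hypothetical helpers (`toy_grab`, `build_parameters`, `build_causal_types` are illustrative names, not the package's internals). Because `switch()` evaluates only the matching branch, asking for parameters never triggers the causal types builder; the call log is there just to make that visible.

```r
# Hypothetical stand-ins for illustration; not CausalQueries internals.
calls <- new.env()
calls$log <- character(0)

# Cheap object: one value per parameter (two per node in this toy setup).
build_parameters <- function(model) {
  calls$log <- c(calls$log, "parameters")
  rep(0.5, 2 * model$n_nodes)
}

# Expensive object: grows exponentially with the number of nodes.
build_causal_types <- function(model) {
  calls$log <- c(calls$log, "causal_types")
  expand.grid(rep(list(0:1), model$n_nodes))
}

# Dispatcher: builds the targeted object and nothing else.
toy_grab <- function(model, what) {
  switch(what,
    parameters   = build_parameters(model),
    causal_types = build_causal_types(model),
    stop("unknown object: ", what)
  )
}

model <- list(n_nodes = 5)
n <- length(toy_grab(model, "parameters"))
calls$log   # records which builders actually ran
```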