Closed gerasy1987 closed 2 months ago
Hi @gerasy1987 this makes a lot of sense. Having objects that are infact data.frames inherit from that class was basically a convenience to just leverage their print method. Perhaps we can define methods at the causal_model
class level to kill two birds with one stone (since many of the attached objects don't strictly speaking need their own class) --> the summary & print methods for causal_model
already address this to some degree, as they show condensed output for many of those classes
If we want to keep methods around at the object level we can define super classes with the structure @macartan suggested:
model$model_structure
contains the "statement" "dag" "nodes" "parents_df" "nodal_types", "parameters_matrix"
model$model_parameters
contains the "parameters_df", "prior_distribution", "posterior_distribution"
model$stan_objects
contains stan objects
also parmap would go into structure; and we might get rid of dag
On Mon, Aug 5, 2024 at 1:26 PM Till Tietz @.***> wrote:
If we want to keep methods around at the object level we can define super classes with the structure @macartan https://github.com/macartan suggested:
model$model_structure contains the "statement" "dag" "nodes" "parents_df" "nodal_types", "parameters_matrix"
model$model_parameters contains the "parameters_df", "prior_distribution", "posterior_distribution"
model$stan_objects contains stan objects
— Reply to this email directly, view it on GitHub https://github.com/integrated-inferences/CausalQueries/issues/346#issuecomment-2268841722, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADBE57KZRMZUMS6J4FWSW7DZP5OMTAVCNFSM6AAAAABL7NHQROVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDENRYHA2DCNZSGI . You are receiving this because you were mentioned.Message ID: @.***>
@macartan just about to push the fix replacing dag with parent_list. Do we want to change statement to dag and attach a plot method to that?
my instinct is not to rename for backwards compatibility
if we add a plot method I think it should be using the same function, plot_model; plot_model would then have to figure out if it is receiving a model or a statement; something ilke:
if(!any(c("statement", "causal_model") %in% class(x))
stop("first argument should be a causal model or a causal statement)
statement <- ifelse("model" %in% class(x)), grab(x, "causal_statement"), x)
the function would go through OK with a statement, except it wouldn;t have access to nodes; however that's only relevant when you are using the function and providing coordinates; so one could move nodes <- model$nodes into the if:
# nodes <- model$nodes
dagitty_statement <- paste("dag{", model$statement, "}") %>%
dagitty::dagitty()
if (!is.null(x_coord) & !is.null(y_coord)) {
names(y_coord) <- names(x_coord) <- grab(model, "nodes")
dagitty::coordinates(dagitty_statement) <- list(x = x_coord,
y = y_coord) %>%
coords2df() %>% coords2list()
}
On Mon, Aug 5, 2024 at 2:10 PM Till Tietz @.***> wrote:
@macartan https://github.com/macartan just about to push the fix replacing dag with parent_list. Do we want to change statement to dag and attach a plot method to that?
— Reply to this email directly, view it on GitHub https://github.com/integrated-inferences/CausalQueries/issues/346#issuecomment-2268921050, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADBE57LDTXSDWGSGDV5SVLTZP5TSPAVCNFSM6AAAAABL7NHQROVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDENRYHEZDCMBVGA . You are receiving this because you were mentioned.Message ID: @.***>
maybe better: in case people would ever want to use a statement and x_coordinates, y_coordinates: lets make nodes - NULL an argument and require that if x_coordinates, y_coordinates, are provided that we either have:
x is a causal_model nodes is not NULL or x_coordinates, y_coordinates are named
I actually I had a use case for this where I wanted to use plot_model but the model was so large it couldn't be created easily; this way we can bypass model creation and just do:
plot_model("X->Y")
I pushed a version in branch "plot"; have a look this could then be used as a plotting method for a model or for a statement if that is provided as the first argument
(slightly awkwardly the first argument is called model; this makes it backwards compatible and consistent with other core functions; but it means that you can pass a dagitty statement to this argument)
On Mon, Aug 5, 2024 at 3:40 PM WZB Institutions and Political Inequality Group @.***> wrote:
maybe better: in case people would ever want to use a statement and x_coordinates, y_coordinates: lets make nodes - NULL an argument and require that if x_coordinates, y_coordinates, are provided that we either have:
x is a causal_model nodes is not NULL or x_coordinates, y_coordinates are named
I actually I had a use case for this where I wanted to use plot_model but the model was so large it couldn't be created easily; this way we can bypass model creation and just do:
plot_model("X->Y")
— Reply to this email directly, view it on GitHub https://github.com/integrated-inferences/CausalQueries/issues/346#issuecomment-2269107097, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADBE57MPVRVE55ISY6PPEDLZP56EJAVCNFSM6AAAAABL7NHQROVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDENRZGEYDOMBZG4 . You are receiving this because you were mentioned.Message ID: @.***>
Amazing; thank you @macartan ... will review the branch and merge + squash both belonging to the dag issue
If we want to keep methods around at the object level we can define super classes with the structure @macartan suggested:
model$model_structure contains the "statement" "dag" "nodes" "parents_df" "nodal_types", "parameters_matrix" model$model_parameters contains the "parameters_df", "prior_distribution", "posterior_distribution" model$stan_objects contains stan objects
@till-tietz , so the idea would be that each of those large list objects in the model will have a special class instead of each individual object? or am I missing something here. Regardless happy to help with implementation of some of this
@gerasy1987 that was the idea. Trying to think through if we had a complaint in the earlier reviewer comments that the internal objects had no methods --> if that's not the case perhaps we should try to implement as many S3 methods as possible at the 'causal_model' class level
I'm not sure I understand the plan exactly
I do like that right now when you grab(model, X) it always has a print method for the object you have accessed and so not sure how to reduce all these objects?
I think we can push the print functionality to grab directly, and avoid having a ton of new classes that do not have properly defined S3 methods besides print
Ok thanks!
On Mon 26. Aug 2024 at 17:56, Georgiy (Gosha) Syunyaev < @.***> wrote:
I think we can push the print functionality to grab directly, and avoid having a ton of new classes that do not have properly defined S3 methods besides print
— Reply to this email directly, view it on GitHub https://github.com/integrated-inferences/CausalQueries/issues/346#issuecomment-2310540287, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADBE57MAX6L45XHCFKMLHPLZTNF3PAVCNFSM6AAAAABL7NHQROVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGMJQGU2DAMRYG4 . You are receiving this because you were mentioned.Message ID: @.***>
@macartan @till-tietz I have been working on reducing the number of S3 classes we have. One thing I realized is that nested structure mentioned in
If we want to keep methods around at the object level we can define super classes with the structure @macartan suggested:
model$model_structure contains the "statement" "dag" "nodes" "parents_df" "nodal_types", "parameters_matrix" model$model_parameters contains the "parameters_df", "prior_distribution", "posterior_distribution" model$stan_objects contains stan objects
Will entail quite a lot of rewrite of other functions as we refer to internal objects of causal_model objects throughout the package. It's doable but will be prone to errors without any direct benefit. Instead, I think what we can get away with is keeping all those objects as elements of causal_model object without nesting and just removing the declaration of classes for them, keeping them declared as a secondary class, e.g., class(model$statement)
will be "character"
and class(model$dag)
will be "data.frame"
.
This is not an uncommon way of dealing with it. For example, the lm()
function produces an object of class lm
that is a list of many objects, none of which have unique classes assigned to them.
This will mean that most of the print functionality will have to go somewhere, and I think the natural place for that would be a summary
method for causal_model
objects (instead of grab
). For example, summary.causal_model
instead of the current very basic version can take an additional argument, like what=
, which will subset and summarize the object and print nice output. We can keep grab
with similar functionality if you think it would be more explicit.
Let me know if you think this sounds okay, and I will go ahead and finish implementing this (I already started to do this at a local branch).
Sounds good to me
Simple and robust are keywords
I
On Mon 2. Sep 2024 at 17:57, Georgiy (Gosha) Syunyaev < @.***> wrote:
@macartan https://github.com/macartan @till-tietz https://github.com/till-tietz I have been working on reducing the number of S3 classes we have. One thing I realized is that nested structure mentioned in
If we want to keep methods around at the object level we can define super classes with the structure @macartan https://github.com/macartan suggested:
model$model_structure contains the "statement" "dag" "nodes" "parents_df" "nodal_types", "parameters_matrix"
model$model_parameters contains the "parameters_df", "prior_distribution", "posterior_distribution"
model$stan_objects contains stan objects
Will entail quite a lot of rewrite of other functions as we refer to internal objects of causal_model objects throughout the package. It's doable but will be prone to errors without any direct benefit. Instead, I think what we can get away with is keeping all those objects as elements of causal_model object without nesting and just removing the declaration of classes for them, keeping them declared as a secondary class, e.g., class(model$statement) will be "character" and class(model$dag) will be "data.frame".
This is not an uncommon way of dealing with it. For example, the lm() function produces an object of class lm that is a list of many objects, none of which have unique classes assigned to them.
This will mean that most of the print functionality will have to go somewhere, and I think the natural place for that would be a summary method for causal_model objects (instead of grab). For example, summary.causal_model instead of the current very basic version can take an additional argument, like what=, which will subset and summarize the object and print nice output. We can keep grab with similar functionality if you think it would be more explicit.
Let me know if you think this sounds okay, and I will go ahead and finish implementing this (I already started to do this at a local branch).
— Reply to this email directly, view it on GitHub https://github.com/integrated-inferences/CausalQueries/issues/346#issuecomment-2325028239, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADBE57LJV6Y5GSTG2TSKJHDZUSDGXAVCNFSM6AAAAABL7NHQROVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGMRVGAZDQMRTHE . You are receiving this because you were mentioned.Message ID: @.***>
@gerasy1987 sounds good to me ... a lot of the 'separate' print functions for those objects are just repeated in the summary method for causal_model anyways. Thanks for thinking through this!!!
@till-tietz @macartan just a note that I am still working on this: Some initial commits are on
methods-optimization branch trying to move all print
functionality into print.summary.causal_model()
and removing (many) unneeded classes. There might be work left with tests after I am done with initial implementation.
@gerasy1987 thanks for pushing this; will review tests etc.
Thanks Gosha ! Say when!
Best
On Tue 10. Sep 2024 at 04:15, Georgiy (Gosha) Syunyaev < @.***> wrote:
@till-tietz https://github.com/till-tietz @macartan https://github.com/macartan just a note that I am still working on this: Some initial commits are on methods-optimization branch trying to move all print functionality into print.summary.causal_model() and removing (many) unneeded classes. There might be work left with tests after I am done with initial implementation.
— Reply to this email directly, view it on GitHub https://github.com/integrated-inferences/CausalQueries/issues/346#issuecomment-2339474454, or unsubscribe https://github.com/notifications/unsubscribe-auth/APIIDCSK6NFIEYFLGZKZMQDZVZI3PAVCNFSM6AAAAABL7NHQROVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGMZZGQ3TINBVGQ . You are receiving this because you commented.Message ID: @.***>
Pull request with significant rewrite of methods is in #357 and ready for rewview
Amazing; I merged and I I think I dealt with conflicts; @lilymedina you might want to make sure your documentation edits are still good!
@gerasy1987 some comments below:
Can we: type_distribution -> type_posterior for consistency with type_prior
inspect(model, object = "huh") Length Class Mode 1 character character
Possible instead to have "'huh' is not part of a Causal Model. Please see options for the what
argument in inspect?
inspect(model, what = "statement") Model does not contain posterior_distribution or stan_objects; to include it update model
This seems unnecessary when the correct object is returned; could it be removed
inspect(model, what = "dag") Model does not contain posterior_distribution or stan_objects; to include it update model
Causal statement: X -> Y
Dag:
parent children 1 X Y
inspect(model, what = "type_distribution") Model does not contain posterior_distribution or stan_objects; to include it update model
Causal statement: X -> Y
model <- make_model("X->Y") |> update_model()
a <- inspect(model, what = "stanfit")
this prints the stan summary but returns null
heads up: I removed the dag object from model; this was confusing folks since its not really a dag note that plotting uses make_dag() directly and doesn't need dag I replaced the get_parents function for one that doesn;t need the dag object
@macartan , to your points:
To your question about removed options to inspect()
and also print.summary.causal_model()
causal_types_interpretation
- this used to print the output of interpret_type()
before so I am not sure how this is different from nodal_types
. Am I missing something here?event_probabilities
-- this was changed to prior_event_probabilities
which more explicitly separates it from posterior_event_probabilities
that can also be requested and refers to another object within causal_model
objectparameters
-- this should be in and works for me on the main branch nowstan_summary
-- I am not sure what would be the use of returning the stan_summary
printout itself (which is just a vector of character strings that comprise the summary) for users. Right now the summary is saved in stan_objects
and is printed when stanfit
or stan_objects
are requested via inspect()
. I would say if we want to allow users to retrieve the stan_summary
object it has to be something more meaningful.On point 1 about printout when invalid what =
options are provided, I agree, we need a more transparent message there and a failsafe, I can work on it next week
On points 2 and 3 about excessive messaging when standard objects like statement
are requested, and default printing of statement
I thought this might be useful for the user, especially when using summary()
on the causal_model
object (which I viewed as a function used to summarize the object rather than pull one part of it). I think you are right that it probably does not make sense with inspect
so I can tweak that
On point 4, I agree that we should probably improve the message there to point out the error more clearly. I went with a minimalist approach there where we just print the default message at the top that points out missing stanfit and posterior information. Happy to work on improving this too
On point 5, I think we can just tweak the message there to point out that this is just the summary and no stanfit
was saved. The reason why I am hesitant to keep stan_summary
as an option as I said above is that it is not clear what inspect(model, "stan_summary")
is supposed to return as an object invisibly, since all other options return meaningful objects. Again happy to do that next week
Let me know what you think about these responses and I will work on implementation
This all makes sense to me though I think making it easy to access the Stan summary directly is useful. I realise it’s just strings but the reviewers wanted a method to see the diagnostics and this is a quick way to do this; we demonstrate the use of this in the paper
Thanks for this great work
On Sun 15. Sep 2024 at 03:48, Georgiy (Gosha) Syunyaev < @.***> wrote:
@macartan https://github.com/macartan , to your points:
-
To your question about removed options to inspect() and also print.summary.causal_model()
- causal_types_interpretation - this used to print the output of interpret_type() before so I am not sure how this is different from nodal_types. Am I missing something here?
- event_probabilities -- this was changed to prior_event_probabilities which more explicitly separates it from posterior_event_probabilities that can also be requested and refers to another object within causal_model object
- parameters -- this should be in and works for me on the main branch now
- stan_summary -- I am not sure what would be the use of returning the stan_summary printout itself (which is just a vector of character strings that comprise the summary) for users. Right now the summary is saved in stan_objects and is printed when stanfit or stan_objects are requested via inspect(). I would say if we want to allow users to retrieve the stan_summary object it has to be something more meaningful.
On point 1 about printout when invalid what = options are provided, I agree, we need a more transparent message there and a failsafe, I can work on it next week
On points 2 and 3 about excessive messaging when standard objects like statement are requested, and default printing of statement I thought this might be useful for the user, especially when using summary() on the causal_model object (which I viewed as a function used to summarize the object rather than pull one part of it). I think you are right that it probably does not make sense with inspect so I can tweak that
On point 4, I agree that we should probably improve the message there to point out the error more clearly. I went with a minimalist approach there where we just print the default message at the top that points out missing stanfit and posterior information. Happy to work on improving this too
On point 5, I think we can just tweak the message there to point out that this is just the summary and no stanfit was saved. The reason why I am hesitant to keep stan_summary as an option as I said above is that it is not clear what inspect(model, "stan_summary") is supposed to return as an object invisibly, since all other options return meaningful objects. Again happy to do that next week
Let me know what you think about these responses and I will work on implementation
— Reply to this email directly, view it on GitHub https://github.com/integrated-inferences/CausalQueries/issues/346#issuecomment-2351281128, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADBE57M4BGAY2OKN5CQLCFLZWTRPBAVCNFSM6AAAAABL7NHQROVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGNJRGI4DCMJSHA . You are receiving this because you were mentioned.Message ID: @.***>
@macartan updated summary.causal_model()
and inspect()
implementations addressing issues you raised is on methods-optimization branch. I will double check it does not break anything and create a pull request later today.
Amazing I can pill everything together tomorrow
Any open issues ?
@till if curves not ready we can treat as an improvement
The arrows overlapping with nodes bugs me though and I might try to resolve in the morning
On Mon 23. Sep 2024 at 20:17, Gosha Syunyaev @.***> wrote:
@macartan https://github.com/macartan updated summary.causal_model() and inspect() implementations addressing issues you raised is on methods-optimization branch. I will double check it does not break anything and create a pull request later today.
— Reply to this email directly, view it on GitHub https://github.com/integrated-inferences/CausalQueries/issues/346#issuecomment-2369031064, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADBE57JKGZW3QSEFCKISB7LZYBLKTAVCNFSM6AAAAABL7NHQROVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGNRZGAZTCMBWGQ . You are receiving this because you were mentioned.Message ID: @.***>
@macartan will have a crack at wrapping up the plotting tonight. Let me know once you feel happy with the integrated changes. We can then run a full set of tests and push this to CRAN as a major release as it has non backwards compatible changes.
sounds good
on plotting: I looked more and ran into a wall again
suggestion: how about using the following from ggdag:
ggdag:::edge_type_switch("link_arc")()
that does the arrows and the curves and I think it does not rely on dagitty;
MWE:
statement <- "X -> M -> Y <-> X" nodes <- c("X", "M", "Y")
dag <- CausalQueries:::make_dag(statement)
coords <- CausalQueries:::position_nodes(dag, nodes) |> dplyr::mutate(x = x - mean(x), y = y - mean(y))
dag <- dag |> dplyr::left_join(coords, by = list(x = "w", y = "node")) |> magrittr::set_colnames(c("name","to","direction","xend","yend"))
dag_plot <- data.frame( name = nodes ) |> dplyr::left_join(coords, by = list(x = "name", y = "node")) |> dplyr::left_join(dag, "name") |> mutate(circular = FALSE)
dag_plot |> ggplot(aes(x = x, y = y, xend = xend, yend = yend)) + geom_point(size = 16) + ggdag:::edge_type_switch("link_arc")()
On Tue, Sep 24, 2024 at 9:47 AM Till Tietz @.***> wrote:
@macartan https://github.com/macartan will have a crack at wrapping up the plotting tonight. Let me know once you feel happy with the integrated changes. We can then run a full set of tests and push this to CRAN as a major release as it has non backwards compatible changes.
— Reply to this email directly, view it on GitHub https://github.com/integrated-inferences/CausalQueries/issues/346#issuecomment-2370452949, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADBE57IGJLBLX4UXHHXXLODZYEKKVAVCNFSM6AAAAABL7NHQROVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGNZQGQ2TEOJUHE . You are receiving this because you were mentioned.Message ID: @.***>
after a little more digging; this seems all to be based on functionalityy in ggraph https://www.data-imaginist.com/posts/2017-02-16-ggraph-introduction-edges/
and it might make sense to use that directly
Introduction to ggraph: Edges – Data Imaginist (data-imaginist.com) https://www.data-imaginist.com/posts/2017-02-16-ggraph-introduction-edges/
On Tue, Sep 24, 2024 at 9:52 AM Macartan Humphreys @.***> wrote:
sounds good
on plotting: I looked more and ran into a wall again
suggestion: how about using the following from ggdag:
ggdag:::edge_type_switch("link_arc")()
that does the arrows and the curves and I think it does not rely on dagitty;
MWE:
statement <- "X -> M -> Y <-> X" nodes <- c("X", "M", "Y")
dag <- CausalQueries:::make_dag(statement)
coords <- CausalQueries:::position_nodes(dag, nodes) |> dplyr::mutate(x = x - mean(x), y = y - mean(y))
dag <- dag |> dplyr::left_join(coords, by = list(x = "w", y = "node")) |> magrittr::set_colnames(c("name","to","direction","xend","yend"))
dag_plot <- data.frame( name = nodes ) |> dplyr::left_join(coords, by = list(x = "name", y = "node")) |> dplyr::left_join(dag, "name") |> mutate(circular = FALSE)
dag_plot |> ggplot(aes(x = x, y = y, xend = xend, yend = yend)) + geom_point(size = 16) + ggdag:::edge_type_switch("link_arc")()
On Tue, Sep 24, 2024 at 9:47 AM Till Tietz @.***> wrote:
@macartan https://github.com/macartan will have a crack at wrapping up the plotting tonight. Let me know once you feel happy with the integrated changes. We can then run a full set of tests and push this to CRAN as a major release as it has non backwards compatible changes.
— Reply to this email directly, view it on GitHub < https://github.com/integrated-inferences/CausalQueries/issues/346#issuecomment-2370452949>,
or unsubscribe < https://github.com/notifications/unsubscribe-auth/ADBE57IGJLBLX4UXHHXXLODZYEKKVAVCNFSM6AAAAABL7NHQROVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGNZQGQ2TEOJUHE>
. You are receiving this because you were mentioned.Message ID: @.***>
— Reply to this email directly, view it on GitHub https://github.com/integrated-inferences/CausalQueries/issues/346#issuecomment-2370462308, or unsubscribe https://github.com/notifications/unsubscribe-auth/APIIDCTETWK2PZ5CAGWOTETZYEK3PAVCNFSM6AAAAABL7NHQROVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGNZQGQ3DEMZQHA . You are receiving this because you commented.Message ID: @.***>
This might have been discussed before (in which case feel free to close this issue), but I have a hunch on what the reviewer of the package did not like about internal objects. It is indeed the case that objects of class
causal_model
have a lot of internal objects that have their own classes. E.g.:Most of the new classes indeed only have
print
methods defined for them:This means calling
summary
orplot
on most of these objects will defer to the second class of the object, i.e.,data.frame
, and will produce a non-standard error. E.g., see the difference in error outputs between the following two:and similarly, the outputs of the following two lines differ, but also none of them makes sense:
conclusion:
I think this comment about exposing internal objects is a bit unfair, and of course, the
data.frame
class has many methods defined for it, so we cannot possibly define or prevent all those methods from producing all sorts of inconsistent outputs. But we can at least prevent that from happening for most standard methods. Below are two proposalsproposal 1 (easy but possibly impossible):
data.frame
class from as many of the internal objects ofcausal_model
class as possibleproposal 2:
summary
and maybeplot
for most of those classes