Closed michael-franke closed 1 year ago
I agree, we should change that. Should we address this before CRAN submission?
I suggest that we include an option for "extract_cell_draws" (e.g., parameter "group = 'all'") to return the full list. Alternatively, we could make that the default and return the draws from the grand mean for something like parameter "group = "grand_mean").
Sounds good. We could have a third option "custom" for specifications that fall in neither category, e.g., when averaging over some factor or when using operators !=
or |
.
I agree, we should change that. Should we address this before CRAN submission?
It seems a common enough use-case, so that this should likely be included from the start, in my opinion.
We could have a third option "custom" for specifications that fall in neither category, e.g., when averaging over some factor or when using operators
!=
or|
.
Hm, I don't fully understand, sorry. Use of !=
and |
in the group
option is already possible, AFAICT.
Additional operations like mean
, log
which only require one "cell" (one set of draws) can be easily computed in a second step.
To generalize the extraction of cell draws to allow for any kind of operations on multiple cells may be very useful, but would also (I think) be an effortful step beyond the current functionality, would it not? If that's so, I'm inclined to think that this is a nice feature for after a first functional release. (Or am I missing something?)
Hm, I don't fully understand, sorry. Use of
!=
and|
in thegroup
option is already possible, AFAICT.
Currently, it is possible. But if we change the parameter to group = c('all', 'grand_mean')
, we lose the option to pass a custom specification. To still keep this option, I suggest having group = c('all', 'grand_mean', 'custom')
and an additional parameter (maybe something like group_spec
?) to be set when group = 'custom'
.
Say we have a $2 \times 2$ factorial design with factors "gender" (F, M) and "context" (inf, pol). extract_cell_draws
could then give us:
extract_cell_draws(fit, group = 'all')
--> draws for cells F & inf, F & pol, M & inf, M & pol
extract_cell_draws(fit, group = 'grand_mean')
--> draws for the mean of cells F & inf, F & pol, M & inf, M & pol
extract_cell_draws(fit, group = 'custom', group_spec = context != 'pol')
--> draws for whatever is specified in group_spec
- here: draws for context "inf", averaged over gender
To generalize the extraction of cell draws to allow for any kind of operations on multiple cells may be very useful, but would also (I think) be an effortful step beyond the current functionality, would it not? If that's so, I'm inclined to think that this is a nice feature for after a first functional release. (Or am I missing something?)
I'm not sure if I understood correctly. Do you mean extract multiple cells and then perform some operation on them? If so, it might make sense to introduce a new function that does this, since it goes beyond just extracting draws from the model fit. And what operations did you have in mind?
It should be technically possible to just have one parameter group
, is it not? E.g., code like this:
f <- function(x) { if (x == "huhu") {
} else { group_spec <- dplyr::enquo(x)
} }
It may be unpleasant or dispreferred for reasons of good style, though.
Alternatively, maybe there should really be two functions: one which gets the draws for all cells (roughly: up to line 144 (draws_for_cells <- Y %*% t(X)
) in faintr_functions.R
; and another which then extracts the desired cell or mean of cells. The second would likely want to call the former.
I'm not sure if I understood correctly. Do you mean extract multiple cells and then perform some operation on them? If so, it might make sense to introduce a new function that does this, since it goes beyond just extracting draws from the model fit. And what operations did you have in mind?
Researchers might want to do all sort of stuff, e.g., looking at means of logs, at weighted means, at maxima, SDs or whatever other summary statistics. I don't think that we should be bothered by these -arguably: much more marginal- use cases, though. It it is easy to retrieve samples for each cell by a single function call, the users will have enough flexibility to do as they please.
Technically, it should be possible. However, to keep it safe in terms of good style and usability, I think it's better to have one input type per parameter (but your call!)
Alternatively, maybe there should really be two functions: one which gets the draws for all cells (roughly: up to line 144 (
draws_for_cells <- Y %*% t(X)
) infaintr_functions.R
; and another which then extracts the desired cell or mean of cells. The second would likely want to call the former.
Sounds good to me! Separate functions could also make maintenance easier as we break down the code into smaller chunks.
Researchers might want to do all sort of stuff, e.g., looking at means of logs, at weighted means, at maxima, SDs or whatever other summary statistics. I don't think that we should be bothered by these -arguably: much more marginal- use cases, though. It it is easy to retrieve samples for each cell by a single function call, the users will have enough flexibility to do as they please.
Thanks for clarifying. I agree. If at all, this is something we might look into in the (distant) future.
Separate functions could also make maintenance easier as we break down the code into smaller chunks.
Great! Let's do that then.
I might be overlooking something, but I don't think that there is a simple way to obtain draws for ALL cells, while this is probably a rather common use case (e.g., for plotting / global inspection). As far as I can see, the user would have to call "extract_cell_draws" for every single cell, despite the fact that "extract_cell_draws" internally computes draws for ALL cells on each call. That's also rather inefficient.
I suggest that we include an option for "extract_cell_draws" (e.g., parameter "group = 'all'") to return the full list. Alternatively, we could make that the default and return the draws from the grand mean for something like parameter "group = "grand_mean"). I'd imagine that wanting ALL samples is more frequent than wanting samples for the grand mean, so the former would be better for the user.