My preferences are 1-3-2. I find approach 1 the most consistent. More frequent `NULL` returns have a disadvantage, of course, but can be self-explanatory for users if paired with well-formulated `warning()` messages.
Regarding names: `output` and subclasses seem fine. `tdcmm` could be more explicit, though longer?
Agreed with option 1 as the easiest to use/understand/maintain. Also in agreement with @LKobilke (if I understood that correctly) that the `NULL`-returning accessor functions could still be implemented as empty stubs that throw talkative/telling errors.
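To sketch that idea (the names below are placeholders, not actual package code): the accessor would exist for every output object, but would warn and return `NULL` when nothing was estimated.

```r
# Sketch only: an accessor that always exists for tdcmm objects but warns
# and returns NULL when no model was estimated for this output type.
model <- function(x, ...) UseMethod("model")

model.tdcmm <- function(x, ...) {
  m <- attr(x, "model")
  if (is.null(m)) {
    warning("No model was estimated for this output; returning NULL.",
            call. = FALSE)
    return(invisible(NULL))
  }
  m
}
```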
I have started a very basic implementation of option 1, see branch add_tdcmm_classes (aa94627, etc.):

- `tdcmm`, a first accessor method, and some basic tests in `tdcmm_class.R` / `test-tdcmm_class.R`
- `tdcmm_chi2` added to `categorical.R`
```
> crosstab(WoJ, reach, employment, chi_square = TRUE) %>% model()

	Pearson's Chi-squared test

data:  .
X-squared = 16.005, df = 6, p-value = 0.01373
```
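For reference, a minimal constructor sketch of how such classes could be attached; `new_tdcmm()` and the attribute layout below are assumptions for illustration, not necessarily what the branch does:

```r
# Hypothetical constructor sketch: wrap the result tibble, attach the
# estimated model as an attribute, and prepend an output-specific subclass.
new_tdcmm <- function(x, model = NULL, subclass = character()) {
  stopifnot(tibble::is_tibble(x))
  structure(
    x,
    model = model,
    class = c(subclass, "tdcmm", class(x))
  )
}

# A chi-square crosstab could then be wrapped along these lines:
# new_tdcmm(cross_tab, model = chisq_result, subclass = "tdcmm_chi2")
```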
So the next steps would be to update all other output functions as well, unless you find any problems with this approach now.
On a side note, using subclasses per output type also allows us to, where necessary, change the print format of specific outputs. For example, I have removed the "test values" message from `crosstab(chi_square = TRUE)` and changed the printed tibble to include this information instead. This has the added benefit of not printing out this info when the result of `crosstab()` is assigned to an object:
```
> crosstab(WoJ, reach, employment, chi_square = TRUE)
# A tibble: 3 × 5
  employment Local Regional National Transnational
* <chr>      <dbl>    <dbl>    <dbl>         <dbl>
1 Freelancer    23       36      104             9
2 Full-time    111      287      438            66
3 Part-time     15       32       75             4
# Chi-square = 16.005, df = 6, p = 0.014, V = 0.082
```
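In case it helps the discussion, here is a rough sketch of how such a subclass-specific print method could append the test statistics; this is not the actual code in the branch (pillar's footer hooks would be another route), and Cramér's V is omitted for brevity:

```r
# Sketch of a subclass-specific print method: print the tibble as usual,
# then append the stored test statistics as a footer line.
print.tdcmm_chi2 <- function(x, ...) {
  NextMethod()                      # default tibble printing
  chi2 <- attr(x, "model")          # assumed to hold the htest object
  if (!is.null(chi2)) {
    cat(sprintf("# Chi-square = %.3f, df = %g, p = %.3f\n",
                unname(chi2$statistic), unname(chi2$parameter), chi2$p.value))
  }
  invisible(x)
}
```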
I love it!
A couple of minor notes/questions:
- … (`categorical.R` / `crosstab()`)
- The `model()` function could issue a warning if no model is found.
- `model()` would, according to the current code, return a list of models, right? What about adding optional parameters to the `model()` function to specify this in a more "speaking" manner?

Thanks for the first two notes, I'll update this when I add proper documentation.
Regarding the third note/question, yes, with multiple models a list of models is returned. I'm not sure what optional parameters you suggest - specifying which model(s) to return?
That was my thought, yes. Something like `which = "..."`. But if we are only talking about multiple models in the sense of one per pair of variables (e.g., in `t_test()`), then a list should be most feasible.
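To make that concrete, a minimal sketch of what a `which` argument on the accessor could look like; the semantics here (selection by name from a named list of models) are just an assumption for illustration:

```r
# Sketch only: model() returns the single model directly, the full (named)
# list when several models were estimated, or one selected model via `which`.
model.tdcmm <- function(x, which = NULL, ...) {
  models <- attr(x, "model")
  if (is.null(which)) {
    return(models)  # single model, or list of models (e.g., one per variable)
  }
  # assumes several models are stored as a named plain list
  if (!is.character(which) || !which %in% names(models)) {
    stop("No model named '", which, "' found.", call. = FALSE)
  }
  models[[which]]
}
```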
Quick question on the progress: Do we aim for
I think it makes a bit more sense to go for (1) in terms of sparing resources. However, if we do that, a bit more work rests on the shoulders of @joon-e which tempts me to rather opt for (2) instead ...
Ideally, we could get an early version up and running (in this repo, not on CRAN, obviously) sometime around the start of this semester.
While my initial idea was (1) as well, due to time constraints and workloads on my end I'm now leaning more towards (2). I hope to have the initial model functions PR-ready by the beginning of next week.
Now that we decided on path (2), I'd be thankful if we could clarify the further working process: that is, are we fine with one review per pull/merge request? Also, in particular, does @joon-e want to have a final say on every PR?
Other than that, from my point of view, the next steps would be:
Any thoughts on this?
I'm totally fine with one review per PR (which is how it's currently set up anyway, I guess); I'd like to have a final look before devel gets merged into master, at least for now, but again, that's how it's set up right now anyway.
Several functions, including `crosstab()`, `t_test()`, etc., estimate statistical models. These model objects should be available in the result objects, which so far are standard tibbles. We can pass the model objects along by adding them as an attribute to a custom class and then providing a generic accessor (e.g., `model()`). We may also want to include further information, e.g. objects containing assumption checks or a `visualize()` method (see #23).

However, the package contains functions...
- that do not estimate any models (e.g., `describe()`),
- that estimate exactly one model (e.g., `crosstab()`, `regress()`),
- that may estimate several models (e.g., `t_test()`).

(A `model()` accessor function should thus return the model object if one model was estimated, or a list of model objects if more models were estimated.)

Furthermore, not all of them need, or at least, perform assumption checks (citation needed?); `visualize()` should provide different plots for different types of models.

Thus, we have (at least) the following options:
1. One "superclass" (e.g.,
tdc.output
) with individual subclasses per model typeThe superclass would provide all necessary attributes (e.g.,
model
,checks
) which default toNULL
if not feasible for a specific type. An individual subclass per model/output type (e.g.,tdc.t_test
) would be used forvisualize()
/plot()
generics. Easy to implement and expand, however, accessor functions would often returnNULL
2. Individual classes (e.g., `tdc.t_test`, `tdc.regression`) per model/output type

   Flexible, but potential for a lot of redundancies and future inconsistencies.
3. Superclasses per output "category" (e.g., `tdc.univariate`, `tdc.bivariate`) and individual subclasses per model/output type

   For example, only outputs based on model objects would have a `model` attribute, thus fewer `NULL` attributes than in case 1. However, this may also lead to redundancies and inconsistencies.

I'm leaning towards option 1, but am open to suggestions/feedback (including better class names).
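To make option 1 a bit more concrete, here is a rough, non-authoritative sketch of how the superclass defaults and subclass-specific `visualize()` dispatch could fit together (all names below are placeholders):

```r
# Rough sketch of option 1: a superclass carrying the shared attributes
# (defaulting to NULL), plus per-output subclasses that mainly differ in
# their visualize()/plot() methods.
new_tdc_output <- function(x, model = NULL, checks = NULL,
                           subclass = character()) {
  structure(x, model = model, checks = checks,
            class = c(subclass, "tdc.output", class(x)))
}

visualize <- function(x, ...) UseMethod("visualize")

# Fallback for output types without a meaningful plot
visualize.tdc.output <- function(x, ...) {
  warning("No visualization is defined for this output type.", call. = FALSE)
  invisible(NULL)
}

# Subclass-specific method, e.g. for t-test outputs
visualize.tdc.t_test <- function(x, ...) {
  # a real method would build a ggplot from the stored model(s)
  message("Plotting mean differences for t-test output ...")
  invisible(x)
}
```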