Open mitchelloharawild opened 5 years ago
Those seem pretty reasonable but I think that it would be good to have the generic for estimate
only involve the second argument to be less restrictive.
We also have the convention to make the first argument x
whenever possible. That might seem like we are being uptight, but these generics are meant to be used in a broad context. For example:
estimate <- function(.data, .model){
UseMethod("estimate")
}
I would want to use this on an existing model object (where the data have already been consumed). For example, I wrote an S package for Shewhart charts a long time ago and wanted a generic that would return estimates of the process mean and variance (mean
and var
weren't generic back then). This signature would preclude something like that.
Are you doing double dispatch on these two objects? If not, you wouldn't lose anything by using a single argument.
Completely agree with the usage of estimate()
. Actually I changed this for the same reason last week in https://github.com/tidyverts/fablelite/commit/d34f1c7c33e716d58fc4b3d773eb3b97f975a72a.
The only reason it was so restrictive, was because I had originally planned to use this functionality only internally.
I don't feel too strongly about the name of the first argument, however I think using .data
will restrict the context of the generic in a beneficial way. In what scenarios do you anticipate estimate()
to be used without data as the first argument? Consistency for the usage of this generic may result in less cognitive load for the users.
In what scenarios do you anticipate estimate() to be used without data as the first argument
One thing that I'd use it for is unsupervised methods. So if I have an object with PCA loadings, I'd use estimate(object, new_data = df)
to get projections for new data points.
Sure, x
or object
is fine. So generics for model(x, ...)
and estimate(x, ...)
?
How general do you think the documentation of their functionality should be? Should it distinguish functionality between these generics, or should the methods have flexibility to use them inconsistently.
For example, in fable these generics would be used as follows:
model.tbl_ts(x, ...)
trains multiple model definitions to data, where x
is a tsibble, and ...
are the model definitions.
estimate.tbl_ts(x, .model, ...)
trains a single model definition to data, where x
is a tsibble and .model
is the model definition. ...
is unused.
Having some recommended usage of these verbs would make it easier for users to learn their functionality, it would also make them less flexible.
I'd suggest x
for both. I wouldn't really get too specific about how we think that these should be used. I think that the doc files can give examples of what existing methods do.
Sounds reasonable. I'll work on this a bit and make a PR.
I think it's most important that you give some thought to the type signature of the generic — i.e. what does it return? Does it return a data frame? A tibble? An object of the same type as x
?
The implementation in the fable series of packages is:
model(.data, ...)
.data
(keys+groups in a tsibble
)....
.estimate()
with the appropriate data split and model definition.summarise
-esque semantics. Respects groups and reduces data into summary statistics (model fit parameters).estimate(.data, .model, ...)
edit: Using estimate()
is discouraged, but is exported to allow users access to the lower level objects if they're particularly inquisitive. It also makes the nest-map-unnest workflow better if they're uncomfortable with using model()
.
These functions dispatch on a data object, and so if a similar approach is supported for cross-sectional modelling there would not be many more methods required. So I think the purpose of this generic may be less about consistent functionality, but more about avoiding namespace conflicts.
You could also argue that estimate()
should dispatch on .model
rather than .data
, which could make it easier to define model training methods. Currently fable
keeps the model's training method in the R6 class for the model definition.
These generics are used within
fable
models for producing a single model (estimate()
) or a collection of models (model()
) from model definitions.To avoid "namespace grabbing", these generics should probably be moved to generics before the first CRAN release of the
fable
packages.I'm also happy to open a dialogue for how these generics should be used, with regards to their default arguments.
Relevant discussion detailing the usage of
model()
, and to a lesser extentestimate()
: https://github.com/tidyverts/fable/issues/91