This is a great idea. We need a few tests for evaluation, the Wilcoxon signed-rank test and a couple of others, and I prefer to have bespoke implementations.
Just pinging this here, as it is dependent on hypothesis tests, which we can work into this package: https://github.com/alan-turing-institute/sktime/issues/1186
Abandoned and superseded by the parameter estimator module, which follows similar ideas but uses `get_fitted_params` instead of `report`.
**Is your feature request related to a problem? Please describe.**
Statistical tests have common use cases in timeseries analysis, including inspecting the properties of a timeseries (e.g. stationarity testing, checking normality) to guide modeling decisions, and also evaluating model output (including evaluating the quality of forecasts).

Adding an interface for statistical tests will allow Sktime to add relevant functionality in this area. In addition to the tests themselves, this will help enable conditional transformers (think differencing a series if it is non-stationary, or applying a BoxCox transform based on the results of a normality or conditional variance test) and post-hoc forecast evaluation/benchmarking (Diebold-Mariano and other tests of one set of forecasts against another).
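As one concrete illustration of the conditional-transformer idea, a series could be differenced only when a unit root test fails to reject non-stationarity. A minimal sketch, assuming statsmodels' `adfuller` as the test (the function name and the alpha threshold are my own hypothetical choices, not part of this proposal):

```python
import pandas as pd
from statsmodels.tsa.stattools import adfuller

# Hypothetical helper: difference the series only if an ADF test cannot
# reject the unit root null (i.e. the series looks non-stationary).
def difference_if_nonstationary(y: pd.Series, alpha: float = 0.05) -> pd.Series:
    _, pvalue, *_ = adfuller(y)
    if pvalue >= alpha:  # failed to reject unit root: treat as non-stationary
        return y.diff().dropna()
    return y
```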
**Describe the solution you'd like**
An interface and module for statistical tests in Sktime. The module's `base` would include a class that will be the basis for all tests. My thoughts on the interface are generally:

- Tests should have a `fit` method rather than `transform` or `predict`.
- They should have a `report` method that returns the test results: whether the null hypothesis was rejected, the test statistic, and the p-value.
- `report` should take a `report_detail` parameter that defaults to True and reports all three items; if it is set to False, only whether the null was rejected would be reported. Note that if a test doesn't have a p-value or test statistic, that part of the return will be None for that test.

A proposed `BaseStatisticalTest` is presented below. Note that I'm open on design details, particularly the naming conventions (I don't have strong feelings about the use of `report`, `results`, or something else), and likewise, if we want to call this something other than statistical tests, that is fine too.
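A minimal sketch of what the base class could look like, assuming the `fit`/`report` design above (the fitted attribute names `null_rejected_`, `statistic_`, and `pvalue_` are illustrative assumptions, not settled names):

```python
# Sketch of the proposed base class; attribute names are illustrative.
class BaseStatisticalTest:
    """Base class that all statistical tests would inherit from."""

    def fit(self, Y, X=None):
        """Run the test on Y, optionally using exogenous data X."""
        raise NotImplementedError("abstract method")

    def report(self, report_detail=True):
        """Return test results.

        If report_detail is True, return (null_rejected, statistic, pvalue);
        otherwise return only whether the null was rejected. Items a test
        does not produce are None.
        """
        if report_detail:
            return self.null_rejected_, self.statistic_, self.pvalue_
        return self.null_rejected_
```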
The main outstanding questions (other than general feedback) revolve around the interface for accepting different types of input that works across a range of tests. This needs to cover:

- a single timeseries "Y" (univariate or multivariate), optionally with exogenous data "X", for diagnostic tests
- a second set of data for post-hoc tests that compare one set of forecasts against another
Initial thoughts to solve this would be for `fit` to accept either a pd.Series, pd.DataFrame, or NumPy array "Y" and optionally accept exogenous data "X" (some tests will use this, others won't), and determine how to proceed based on the type of test.

This leaves the last piece, which is how to accept the data for post-hoc tests. Note that these tests can often be applied to univariate data, while an extension allows them to be applied to multivariate data. I'd propose we don't want separate classes based on that distinction. Instead, I propose the following logic:

- `fit` optionally accepts a second set of data, "Y_other" (kind of like how we handle y_train in performance metrics). If Y_other is received, we check its dimensions against Y and assume we are doing a post-hoc comparison of Y against Y_other.

Note that I will edit this comment later to add a list of tests that can be interfaced (primarily from statsmodels) and a set of tests we'd need to write ourselves.
Plan would be to chunk this out in phases:
**Describe alternatives you've considered**
An alternative I've considered is to import and use tests from other packages (statsmodels) when available. But there are tests not in statsmodels that we should add (post-hoc forecast evaluation ones in particular). Having a common interface that can be used both to adapt statsmodels tests to our format and for our own statistical tests seems like the way to go for uniformity.
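As an illustration of such an adapter, one could wrap statsmodels' Augmented Dickey-Fuller test in the proposed interface roughly as follows (the `ADFTest` class and its `alpha` parameter are hypothetical; only `adfuller` is real statsmodels API):

```python
from statsmodels.tsa.stattools import adfuller

# Hypothetical adapter wrapping a statsmodels test in the proposed
# interface; the class name and alpha parameter are illustrative.
class ADFTest(BaseStatisticalTest):
    """Augmented Dickey-Fuller unit root test, adapted from statsmodels."""

    def __init__(self, alpha=0.05):
        self.alpha = alpha

    def fit(self, Y, X=None):
        # adfuller returns (statistic, pvalue, usedlag, nobs, ...)
        statistic, pvalue, *_ = adfuller(Y)
        self.statistic_ = statistic
        self.pvalue_ = pvalue
        self.null_rejected_ = pvalue < self.alpha
        return self
```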
Note that in terms of the interface, one consideration I've had is whether to have separate base classes for post-hoc tests and diagnostic tests. The main difference is the interface for `fit`, as the diagnostic tests don't need to worry about "Y_other".