tidyverts / fabletools

General fable features useful for extension packages
http://fabletools.tidyverts.org/

Reconciliation interface design #366

Open mitchelloharawild opened 1 year ago

mitchelloharawild commented 1 year ago

User-defined control parameters.

  1. Construction

    • [ ] Projection
    • [ ] Structural
    • [ ] ERM (low-priority)
  2. Weight matrix (typically requires access to model object and varies with data structure)

    • [ ] OLS
    • [ ] WLS
    • [ ] Structural
    • [ ] Sample
    • [ ] Shrinkage
    • [ ] More common types...
    • [ ] Time varying (maybe?)
    • [ ] Custom matrix
  3. Optimisation technique

    • [ ] Regular minimisation
    • [ ] Non-negative (LP, Heuristic)
    • [ ] Constraint matrix LP
  4. Data structure

    • [ ] Cross-sectional (Hierarchical & Grouped)
    • [ ] Temporal (Hierarchical & Grouped)
    • [ ] Cross-temporal
    • [ ] Arbitrary acyclical graphs (maybe?)
    • [ ] Disjoint
  5. Combination method/type

    • [ ] Additive
    • [ ] Linear combination

Are there more things that can be customised here?
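The construction and weight-matrix items can be made concrete with a minimal base R sketch of structural reconciliation, y_tilde = S (S' W^-1 S)^-1 S' W^-1 y_hat. Only the weight matrix W changes between methods. This is not fabletools code. reconcile_structural(), weight_ols() and weight_struct() are hypothetical names, and the two-series hierarchy is made up for illustration.

```r
# Structural reconciliation: y_tilde = S (S' W^-1 S)^-1 S' W^-1 y_hat
reconcile_structural <- function(y_hat, S, W) {
  W_inv <- solve(W)
  # G maps base forecasts of all nodes to reconciled bottom-level forecasts
  G <- solve(t(S) %*% W_inv %*% S, t(S) %*% W_inv)
  drop(S %*% G %*% y_hat)
}

# Hypothetical weight-matrix constructors (names are illustrative only)
weight_ols    <- function(S) diag(nrow(S))        # identity: OLS
weight_struct <- function(S) diag(rowSums(S))     # WLS with structural scaling

# Toy hierarchy Total = A + B; rows are nodes, columns are bottom-level series
S <- rbind(Total = c(1, 1), A = c(1, 0), B = c(0, 1))
y_hat <- c(Total = 10, A = 3, B = 5)              # incoherent: 3 + 5 != 10

ols <- reconcile_structural(y_hat, S, weight_ols(S))    # approx. 9.33, 3.67, 5.67
wls <- reconcile_structural(y_hat, S, weight_struct(S)) # 9.0, 3.5, 5.5
```

Sample and shrinkage weights would slot in the same way, as functions that return an n x n covariance estimate. Nothing else in the calculation changes.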


User interface

Data structure and combination method/type

Data structure and combination method are passed in via data attributes created at the aggregate_*() step. Allow the user to directly impose data structure constraints, for example by defining a pre-existing aggregation structure from the data. This can also be used to remove aggregation structure to create disjoint hierarchies. For example, you may have a cross-temporal structure but only want it to be temporally coherent; to achieve this, you can remove the key aggregation constraints.

Hold onto the aggregation structure in <tsibble> and <mdl_lst> objects.
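Imposing or removing constraints can be illustrated with a base R sketch. It writes a small grouped structure as a zero-constraint matrix C: bottom series AX, AY, BX and BY are aggregated both by A/B and by X/Y, and coherent data satisfy C y = 0. Dropping rows of C removes constraints. Here the X/Y grouping is dropped so only the A/B hierarchy is enforced, which is a grouped analogue of keeping only the temporal constraints in a cross-temporal structure. The names and data are hypothetical, and this is not how aggregate_*() stores structure.

```r
# Node order: aggregates first, then the bottom level
nodes  <- c("Total", "A", "B", "X", "Y", "AX", "AY", "BX", "BY")
bottom <- c("AX", "AY", "BX", "BY")

# Each aggregate written as a sum of bottom-level series
A_agg <- rbind(
  Total = c(1, 1, 1, 1),
  A     = c(1, 1, 0, 0),
  B     = c(0, 0, 1, 1),
  X     = c(1, 0, 1, 0),
  Y     = c(0, 1, 0, 1)
)
colnames(A_agg) <- bottom

# Zero-constraint matrix C = [I, -A_agg]: coherent y satisfies C %*% y == 0
C_full <- cbind(diag(nrow(A_agg)), -A_agg)
dimnames(C_full) <- list(rownames(A_agg), nodes)

# OLS projection of base values onto the set satisfying C y = 0
project_coherent <- function(y_hat, C) {
  y_hat - drop(t(C) %*% solve(C %*% t(C), C %*% y_hat))
}

y_hat <- c(Total = 100, A = 55, B = 40, X = 48, Y = 50,
           AX = 30, AY = 22, BX = 19, BY = 23)

# All constraints: both groupings are enforced
full <- project_coherent(y_hat, C_full)
# Keep only the A/B rows: X and Y no longer appear in any constraint
ab_only <- project_coherent(y_hat, C_full[c("Total", "A", "B"), ])
```

In the A/B-only case, X and Y keep their base values, because their columns in the reduced C are all zero.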

Code

Allow reconciliation of mables, fitted models, and model definitions.

Option A - reconcile() on model with all params as args

reconcile(<mbl_df>, lm = min_trace(lm, ...), ...) # as before, maybe soft-deprecated?

mutate(<mbl_df>, lm = reconcile(lm, ...), ...)
mutate(<mbl_df>, lm_ols = reconcile(lm, weights = weight_ols), lm_shr = reconcile(lm, weights = weight_shr), ...)

reconcile(<mdl_lst>, ???) 
reconcile(<mdl_def>, ???)

reconcile(object, weights = weight_fn, construction = constr_fn, opt_method = opt_fn)

Option B - reconcile() on mable with opt function as reconcile input fn

reconcile(<mbl_df>, lm = gls(lm, weights = weight_fn, ...), ...)
reconcile(<mbl_df>, lm = nn(lm, weights = weight_fn, ...), ...)
reconcile(<mbl_df>, lm = lp_constrained(lm, weights = weight_fn, ...), ...)

reconcile(<mdl_lst>, opt_fn = gls, weights = weight_fn, ... ) #??? 
reconcile(<mdl_def>, ???)
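For the non-negative optimisation option (an nn()-style optimiser in Option B), one of the simplest heuristics works in three steps. It computes the usual OLS solution, sets any negative bottom-level values to zero, and re-aggregates. The sketch below is base R under that assumption. It is a heuristic, not the optimal non-negative solution an LP or quadratic programming solver would give, and the function names are hypothetical.

```r
# OLS reconciliation, returning bottom-level values only
ols_bottom <- function(y_hat, S) drop(solve(crossprod(S), crossprod(S, y_hat)))

# Heuristic: clip negative bottom-level values to zero, then aggregate with S
nonneg_heuristic <- function(y_hat, S) {
  b <- pmax(ols_bottom(y_hat, S), 0)
  drop(S %*% b)
}

S <- rbind(Total = c(1, 1), A = c(1, 0), B = c(0, 1))
y_hat <- c(Total = 1, A = 10, B = 0.5)          # all base forecasts non-negative

ols <- drop(S %*% ols_bottom(y_hat, S))         # B becomes about -2.67
nn  <- nonneg_heuristic(y_hat, S)               # about 6.83, 6.83, 0
```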

Option C - reconcile() on mable with construction function as reconcile input fn

reconcile(<mbl_df>, lm = proj(lm, weights = weight_fn, ...), ...)
reconcile(<mbl_df>, lm = struc(lm, weights = weight_fn, ...), ...)

reconcile(<mdl_lst>, opt_fn = gls, weights = weight_fn, ... ) #??? 
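Option C is organised around the construction. A small base R check with assumed inputs shows that the choice may matter more for implementation than for results. The projection form y_hat - W C' (C W C')^-1 C y_hat and the structural form S (S' W^-1 S)^-1 S' W^-1 y_hat give the same reconciled forecasts when C = [I, -A] and S = [A; I] describe the same aggregation. proj_reconcile() and struc_reconcile() are illustrative names, not the proposed proj() and struc().

```r
# Aggregation structure: Total = A + B (aggregates first, then bottom level)
A_agg <- matrix(c(1, 1), nrow = 1)
S <- rbind(A_agg, diag(2))          # structural (summing) matrix
C <- cbind(diag(1), -A_agg)         # zero-constraint matrix, C %*% S == 0

W <- diag(c(4, 1, 2))               # assumed (diagonal) forecast error covariance
y_hat <- c(10, 3, 5)

proj_reconcile <- function(y_hat, C, W) {
  y_hat - drop(W %*% t(C) %*% solve(C %*% W %*% t(C), C %*% y_hat))
}
struc_reconcile <- function(y_hat, S, W) {
  W_inv <- solve(W)
  drop(S %*% solve(t(S) %*% W_inv %*% S, t(S) %*% W_inv %*% y_hat))
}

p <- proj_reconcile(y_hat, C, W)    # approx. 8.857, 3.286, 5.571
s <- struc_reconcile(y_hat, S, W)   # same values
```

One practical difference is that the projection form never inverts W, so it still works when W is singular. The structural form inverts a matrix of the size of the bottom level, while the projection form inverts one of the size of the constraint set.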

Option D - reconcile() on mable with node utilisation function as reconcile input fn

reconcile(<mbl_df>, lm = top_down(lm, weights = weight_fn, optimiser = opt_fn, ...), ...)
reconcile(<mbl_df>, lm = middle_out(lm, weights = weight_fn, optimiser = opt_fn, ...), ...)
reconcile(<mbl_df>, lm = bottom_up(lm, weights = weight_fn, optimiser = opt_fn, ...), ...)
reconcile(<mbl_df>, lm = all_nodes(lm, weights = weight_fn, optimiser = opt_fn, ...), ...)
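Option D describes each method by how forecasts across the graph are used. One common way to express that is y_tilde = S G y_hat, where G selects or combines base forecasts. This is an assumption about how it might be implemented, not a description of fabletools internals. Bottom-up keeps only the bottom level, top-down splits the total by proportions, and an all-nodes (min_trace-style) method uses G = (S'S)^-1 S' with identity weights. The g_*() helpers and the proportions are hypothetical. middle_out needs at least three levels, so it is left out of this toy.

```r
S <- rbind(Total = c(1, 1), A = c(1, 0), B = c(0, 1))
y_hat <- c(Total = 10, A = 3, B = 5)

# Bottom-up: ignore aggregates, keep bottom-level base forecasts
g_bottom_up <- function(S) {
  n_agg <- nrow(S) - ncol(S)
  cbind(matrix(0, ncol(S), n_agg), diag(ncol(S)))
}
# Top-down: split the top-level forecast using proportions p (summing to 1)
g_top_down <- function(S, p) {
  cbind(p, matrix(0, ncol(S), nrow(S) - 1))
}
# All nodes: OLS combination of every node, G = (S'S)^-1 S'
g_all_nodes <- function(S) solve(crossprod(S), t(S))

bu <- drop(S %*% g_bottom_up(S) %*% y_hat)               # 8, 3, 5
td <- drop(S %*% g_top_down(S, c(0.4, 0.6)) %*% y_hat)   # 10, 4, 6
an <- drop(S %*% g_all_nodes(S) %*% y_hat)               # approx. 9.33, 3.67, 5.67
```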

Attention: @danigiro, @robjhyndman, @GeorgeAthana

FinYang commented 1 year ago

Do people get to vote on it ;)? I like options A and B - isn't it possible to implement both of them (or A and C for that matter) if reconcile is S3? (Not sure if implementing both is a good design choice)

danigiro commented 1 year ago

One step back

After talking with Tommy, it seems the role of reconciliation is unclear. In this framework, we are doing:

data |>
  ... |>
  model(...) |>
  reconcile(...) |>
  forecast(...)

However, the real strength of reconciliation is that it is based on forecasts, not models. For example, the previous structure does not fit judgmental forecasts. In such situations, we need something like this:

data |>
  ... |>
  forecast(...) |>
  reconcile(...)

However, with this configuration it is still unclear where the residuals for the covariance matrix would come from. We need to talk more about that.
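On the residual question: if in-sample one-step residuals are available, however they are obtained, the covariance for a shrinkage weight matrix can be estimated directly from them. The sketch below shrinks the sample covariance toward its diagonal with a Schäfer-Strimmer-type intensity, following the recipe commonly used for the MinT shrinkage estimator. It is a base R illustration with simulated residuals, and shrink_cov() is a hypothetical name.

```r
# Shrink the sample covariance of residuals toward its diagonal.
# res: T x n matrix of in-sample one-step residuals (assumed to have mean ~0)
shrink_cov <- function(res) {
  n_obs  <- nrow(res)
  covm   <- crossprod(res) / n_obs                # uncentred sample covariance
  target <- diag(diag(covm))                      # diagonal shrinkage target
  corm   <- cov2cor(covm)
  xs     <- scale(res, center = FALSE, scale = sqrt(diag(covm)))
  # Estimated variances of the sample correlations
  v <- (crossprod(xs^2) - crossprod(xs)^2 / n_obs) / (n_obs * (n_obs - 1))
  diag(v) <- 0
  # Squared distance of the correlations from the target's (identity) correlations
  d <- (corm - diag(ncol(res)))^2
  lambda <- max(min(sum(v) / sum(d), 1), 0)       # shrinkage intensity in [0, 1]
  list(cov = lambda * target + (1 - lambda) * covm, lambda = lambda)
}

set.seed(1)
res <- matrix(rnorm(50 * 4), ncol = 4)            # simulated residuals for 4 nodes
W_shr <- shrink_cov(res)
W_shr$lambda
```

Because the target shares the sample variances, shrinkage only pulls the off-diagonal covariances toward zero.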

mitchelloharawild commented 1 year ago

Yes, welcoming votes and discussion. I just chatted with @robjhyndman and we came up with option D, where the function describes how forecasts are utilised across the graph. This is my current preference, and it is the most similar to our current interface (min_trace() -> all_nodes(), or something similar).

It's possible to implement all of the above at the same time, but that could be confusing as many functions give the same result.

mitchelloharawild commented 1 year ago

We can also have a reconcile() method for <fable> classes if it is really needed, but I don't see why this is required yet. Could you elaborate on the judgemental forecast reconciliation a bit more?

danigiro commented 1 year ago


Yes, sure. The judgemental forecasting (e.g. the Delphi method) I referred to is just one example where reconciliation should be applicable even though no model object is available.

For example, forecasts may not come from the fable package at all; perhaps they come from computationally intensive machine learning models in Python or C++, are stored in a CSV file, and are loaded into R as a fable object. Reconciliation should still be possible, since reconciliation depends on the forecasts themselves (and the covariance matrix), not on the models that generated them.

If fable contained every possible forecasting model, so that forecasts always came from fable, then building the reconcile function only on top of mable objects might be reasonable, but that is probably too strong an assumption to make.
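A minimal base R sketch shows how forecasts produced elsewhere could be reconciled. It assumes only a CSV with one row per node and one column per horizon, and no model object is involved. Since no residuals are available here, it falls back to identity (OLS) weights. The column layout and node names are made up for illustration.

```r
# Base forecasts produced outside fable, e.g. exported by another tool to CSV
csv_text <- "node,h1,h2
Total,100,104
A,58,60
B,39,41"
fc <- read.csv(text = csv_text)

y_hat <- as.matrix(fc[, c("h1", "h2")])   # one column per forecast horizon
rownames(y_hat) <- as.character(fc$node)

# Assumed structure Total = A + B, rows aligned with the CSV node order
S <- rbind(Total = c(1, 1), A = c(1, 0), B = c(0, 1))[rownames(y_hat), ]

# OLS reconciliation of all horizons at once (no residuals, so identity weights)
rec <- S %*% solve(crossprod(S), crossprod(S, y_hat))
```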

FinYang commented 1 year ago

I agree with @daniGiro that reconciliation should be independent of the models. When I first read how fable implements reconciliation (the current interface), I assumed reconcile() came before forecast() for practical reasons, since some information is only accessible in mables (e.g. the covariance matrix) and the actual reconciliation is done inside the forecast method anyway, but

data |>
  ... |>
  forecast(...) |>
  reconcile(...)

is really how I think of reconciliation.

mitchelloharawild commented 1 year ago

I agree that it should be possible to reconcile a <fable>, but also think that it should be possible to impose reconciliation constraints on a list of models in a <mable>. I think we should support both, but reconciling a <fable> will require more inputs (such as its weights / residuals / response / etc.)

The current interface of reconciling a mable is part practical and part conceptual. Broadly speaking, I think reconciliation (or producing coherent forecasts) means satisfying some additional constraints on the model. If these constraints are imposed, they should also hold for in-sample fitted values and residuals. In the future I plan for fitted() to optionally provide a <fable> output which can be (or already is) coherent.
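The point about in-sample coherence can be illustrated directly. If reconciliation is a fixed linear map M applied to base values, the same M can be applied to fitted values at every time point, and residuals against coherent observed data then satisfy the constraints too. The base R sketch below uses a toy hierarchy, an assumed WLS weight matrix and simulated data. It is not how fitted() would implement this.

```r
S <- rbind(Total = c(1, 1), A = c(1, 0), B = c(0, 1))
W <- diag(c(2, 1, 1))                               # assumed weight matrix
W_inv <- solve(W)
M <- S %*% solve(t(S) %*% W_inv %*% S, t(S) %*% W_inv)   # n x n reconciliation map

set.seed(42)
bottom <- matrix(rpois(10, 20), ncol = 2)           # 5 periods of A and B
y <- bottom %*% t(S)                                # coherent observed data (T x n)
fitted_base <- y + matrix(rnorm(15), ncol = 3)      # incoherent base fitted values
fitted_coh <- fitted_base %*% t(M)                  # reconcile every time point
resid_coh <- y - fitted_coh                         # residuals are coherent too
```

M is a projection (M %*% M equals M), so reconciling already-coherent values leaves them unchanged.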

cynthiahqy commented 1 year ago

From my discussions with @mitchelloharawild today, there seems to be some overlap between "reconciliation of forecasts" and "reconciliation of data" more generally -- i.e. users might need to do "reconciliation" on imported data before any modelling.

In general, it could be useful to make something like top_down() in OPTION D applicable to more than just the forecasts. I'm working on a functional approach (in the "matrices as a map" sense) to data harmonisation in {conformr} that could be extended to facilitate reconciliation/coherency of data.
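The "matrices as a map" idea can also be applied to data rather than forecasts. In the rough sketch below, a concordance matrix whose rows sum to one splits each source category across target categories, so the total is preserved by construction. This is plain base R with hypothetical categories and shares. It does not use {conformr}, whose interface is not described here.

```r
# Hypothetical concordance: 3 source categories mapped onto 2 target categories
B <- rbind(
  s1 = c(t1 = 1.0, t2 = 0.0),
  s2 = c(t1 = 0.3, t2 = 0.7),
  s3 = c(t1 = 0.0, t2 = 1.0)
)
stopifnot(all(abs(rowSums(B) - 1) < 1e-12))   # each source fully allocated

source_data <- c(s1 = 10, s2 = 20, s3 = 5)
target_data <- drop(t(B) %*% source_data)       # t1 = 16, t2 = 19
```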

mitchelloharawild commented 1 year ago

Had another discussion with @robjhyndman today, mostly about graph reconciliation data structures.

We have also discussed functions to impose aggregation constraints on a tsibble. A weights column could be used for defining linear combination weights across nodes, but for arbitrary graphs an edge-linked weights matrix might be needed.
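An edge list with a weights column could be turned into constraints for an arbitrary graph in the following way, which is an assumption rather than a settled design. Each parent node gets one row of a zero-constraint matrix C stating parent - sum(weight * child) = 0. constraints_from_edges() and the edge list below are hypothetical.

```r
# Hypothetical edge list: parent = sum over its edges of weight * child
edges <- data.frame(
  parent = c("Total", "Total", "Index", "Index"),
  child  = c("A", "B", "A", "B"),
  weight = c(1, 1, 0.5, 0.5),
  stringsAsFactors = FALSE
)
nodes <- c("Total", "Index", "A", "B")

constraints_from_edges <- function(edges, nodes) {
  parents <- unique(edges$parent)
  C <- matrix(0, length(parents), length(nodes),
              dimnames = list(parents, nodes))
  for (i in seq_len(nrow(edges))) {
    p <- edges$parent[i]
    child <- edges$child[i]
    C[p, p] <- 1                                    # parent enters with +1
    C[p, child] <- C[p, child] - edges$weight[i]    # each child enters with -weight
  }
  C
}

C <- constraints_from_edges(edges, nodes)
# A coherent set of values satisfies C %*% y == 0
y <- c(Total = 10, Index = 5, A = 4, B = 6)
```

Because node A feeds both Total and Index, this also covers the case where a node has more than one parent, which a single hierarchy cannot express.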

Option D is the interface we're leaning toward: the choice of function drastically changes the output, and it is simple to learn. All other parameters can then be arguments with suitable defaults.

Some practical examples of graph coherency would be useful before finalising an interface for this. As far as I can imagine, graphs are usually best suited to being the only aggregation column, but it is theoretically possible to nest and cross these graph hierarchies.

I'm struggling to think of a graph linear combination reconciliation problem that isn't adequately represented with grouped hierarchies. The closest I've come is this toy example:

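[Image: toy graph in which apples and oranges feed into total weight and total sales, which combine into a measure of value]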

The number of apples and oranges sold determine both the total weight of produce and the total price/sales over time. Then these metrics are combined to give some measure of value. :shrug:
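The toy example can be written as an ordinary linear-combination structure with apples and oranges as the bottom level, which suggests such graphs may not need special machinery. The sketch below uses made-up coefficients: kg per fruit, price per fruit, and a cost per kg that defines value = sales - cost_per_kg * weight. It reconciles hypothetical base forecasts with OLS weights in base R.

```r
# Hypothetical coefficients: kg per fruit, price per fruit, and a cost per kg
kg    <- c(apples = 0.20, oranges = 0.15)
price <- c(apples = 0.50, oranges = 0.80)
cost_per_kg <- 2

# Every node as a linear combination of apples and oranges
S <- rbind(
  value   = price - cost_per_kg * kg,   # value = sales - cost_per_kg * weight
  weight  = kg,
  sales   = price,
  apples  = c(1, 0),
  oranges = c(0, 1)
)
y_hat <- c(value = 20, weight = 30, sales = 90, apples = 100, oranges = 50)

# OLS reconciliation: rec satisfies every linear relation encoded in S
rec <- drop(S %*% solve(crossprod(S), crossprod(S, y_hat)))
```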

I'm going to continue trying to think of useful graph reconciliation problems, but it may not be something necessary to incorporate into the interface. If it is incorporated I think the graphs would be represented via a single key column with some extra attributes that describe the relations between nodes.

mitchelloharawild commented 1 year ago

I've thought about this more from a data structures perspective and I think it is neatest to store a graph of the constraints in the tsibble/mable/fable objects.

https://arxiv.org/abs/2204.09231 provides some details on how to keep some nodes immutable, which I think is how we should handle zero-variance nodes.
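The zero-variance idea can be shown in a small base R sketch, which is not the method of the linked paper. In the projection form y_hat - W C' (C W C')^-1 C y_hat, a node whose row of W is zero receives no adjustment. Its forecast stays fixed while the remaining nodes absorb the incoherence. This needs C W C' to be invertible (roughly, every constraint must involve at least one mutable node), and the structural form cannot be used because W has no inverse. reconcile_proj() and the numbers are hypothetical.

```r
# Projection-form reconciliation; works with a singular W
reconcile_proj <- function(y_hat, C, W) {
  y_hat - drop(W %*% t(C) %*% solve(C %*% W %*% t(C), C %*% y_hat))
}

# Total = A + B, nodes ordered Total, A, B
C <- matrix(c(1, -1, -1), nrow = 1)
y_hat <- c(Total = 10, A = 3, B = 5)
W <- diag(c(0, 1, 1))                 # zero variance: Total treated as immutable

rec <- reconcile_proj(y_hat, C, W)    # Total stays 10; A and B become 4 and 6
```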

danigiro commented 1 year ago


Yes, I think so too