sensitivity analysis on the observed variables `yt`

I have another question. Suppose I want to run a sensitivity analysis on the observed variables yt to understand which of its components most affects the estimation of the factors, given fixed T and Z. The aim is to evaluate a kind of feature importance on yt. My idea is to keep T and Z fixed, estimate the new smoothed factors at, evaluate the new residuals and compare them with the original ones. If I change only the i-th variable in yt, I can evaluate the increase Di in the residuals compared to the original ones. Then I can rank the importance of the variables in yt by sorting Di from highest to lowest (the more the residuals worsen, the more important/sensitive the variable is).

If this procedure makes sense, should I re-estimate the entire state space equations or can I somehow use the fitted fkf to "predict" with the new yt?

Hope it's clear and thanks in advance

Originally posted by @AleBitetto in https://github.com/waternumbers/FKF/issues/9#issuecomment-1010236707

I'm not entirely clear what you are trying to achieve, mainly because you refer to "smoothed factors" at, whereas in the output of fkf the at are the one-step-ahead forecasts of the mean of the state variable.

The sensitivity of the state estimates (whether filtered, forecast or smoothed) depends on all the components of the model, not just T and Z. Even if T, Z, c and d are well defined, you need to be clear about how you are treating the covariance terms H and G, particularly if they are estimated by fitting the model to the data (which you are then altering).

If you are interested in the forecast performance then comparing the residuals makes sense, but perhaps choose a metric for ranking that allows for changes in the forecast variance (Pt).
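
One metric of this kind, given here only as a sketch, scales each one-step-ahead residual by its forecast variance. It assumes the usual innovation form of the filter; the symbols $v_t$, $F_t$ and $S$ are notation introduced for this illustration, chosen to follow the names already used in this thread:

```latex
v_t = y_t - c_t - Z_t a_t, \qquad
F_t = Z_t P_t Z_t^\top + G_t G_t^\top, \qquad
S = \frac{1}{n} \sum_{t=1}^{n} v_t^\top F_t^{-1} v_t
```

With every parameter held fixed and no change in the pattern of missing values, $F_t$ does not depend on the data, so a change in $S$ between the original and the perturbed run comes from the residuals alone.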

If you are interested in changes to the filtered or smoothed state estimates then I would look to see if changes can be identified between the original estimates and those using the perturbed data.

Either way the cleanest way to implement it is to call fkf (and fks if required) with the perturbed data. There are some simplifications that could be made in certain situations, but implementing them in R may well be slower.
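
To illustrate the perturb-and-refilter idea, here is a minimal, self-contained sketch in plain Python rather than R. It does not use FKF at all: it is a toy model with a single factor, diagonal observation noise and simulated data. The parameter values (`T`, `hh`, `z`, `g`), the size of the shift and all function names are assumptions made up for this example. Every parameter, including the variances, is held fixed while one series at a time is shifted, and `D[i]` is taken to be the increase in reconstruction MSE of the original data when the filtered states come from the perturbed data, which is only one possible reading of the procedure.

```python
import random


def kalman_filter(y, T, hh, z, g, a0=0.0, P0=1.0):
    """Scalar-state Kalman filter; returns the filtered states a_{t|t}.

    y is a list of time steps, each a list of d observations.
    Series are processed one at a time, which is valid here because
    the observation noise is assumed diagonal (variances g[i]).
    """
    a, P = a0, P0  # one-step-ahead state mean and variance
    filtered = []
    for y_t in y:
        for y_ti, z_i, g_i in zip(y_t, z, g):
            v = y_ti - z_i * a       # innovation for this series
            F = z_i * P * z_i + g_i  # innovation variance
            K = P * z_i / F          # Kalman gain
            a += K * v
            P -= K * z_i * P
        filtered.append(a)
        # Prediction step for the next time point.
        a = T * a
        P = T * P * T + hh
    return filtered


def reconstruction_mse(y, states, z):
    """MSE of y_hat[t][i] = z[i] * states[t] against y (c_t taken as 0)."""
    sq = [(y_ti - z_i * a) ** 2
          for y_t, a in zip(y, states)
          for y_ti, z_i in zip(y_t, z)]
    return sum(sq) / len(sq)


def sensitivity(y, T, hh, z, g, shift=1.0):
    """D[i]: increase in reconstruction MSE when series i is shifted."""
    base = reconstruction_mse(y, kalman_filter(y, T, hh, z, g), z)
    D = []
    for i in range(len(z)):
        # Perturb only series i; all model parameters stay fixed.
        y_pert = [[v + shift if j == i else v for j, v in enumerate(y_t)]
                  for y_t in y]
        states = kalman_filter(y_pert, T, hh, z, g)
        D.append(reconstruction_mse(y, states, z) - base)
    return D


# Simulated data from the same toy model (illustrative values only).
rng = random.Random(0)
T, hh = 0.9, 0.1
z = [1.0, 0.5, 0.1]  # loadings: series 0 is tied most strongly to the factor
g = [0.1, 0.1, 0.1]  # observation noise variances
state, y = 0.0, []
for _ in range(200):
    y.append([z_i * state + rng.gauss(0.0, g_i ** 0.5)
              for z_i, g_i in zip(z, g)])
    state = T * state + rng.gauss(0.0, hh ** 0.5)

D = sensitivity(y, T, hh, z, g)
ranking = sorted(range(len(D)), key=lambda i: D[i], reverse=True)
print("D:", D)
print("ranking (most to least sensitive):", ranking)
```

With FKF the same loop would call fkf (and fks if smoothed states are wanted) on each perturbed yt, as described above. The ranking depends on the perturbation chosen (here a constant shift), so that choice should be stated alongside any results.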

You're right, the smoothed factors are the output of fks. Let me give a bit of context. My approach is to fit several Dynamic Factor Models on data from different countries (for each country I have the same set of variables over the same years, i.e. a panel structure) in order to estimate the factors' transition matrix (the equivalent of T) and the observations' loading matrix (the equivalent of Z). Then I assemble a cross-country transition matrix T and loading matrix Z (with another procedure), stack the factors and the observations, and apply the Kalman filter to include the cross-country dynamics in each single country's factors. In the end I have factors for each country that also incorporate the cross-country dynamics.

1) I want to estimate yt's reconstruction error to select the optimal number m of factors, and therefore I needed to understand how the residuals were constructed. If I smooth the factors with fks, can I still use them in the equation yt_hat = ct + Zt * ahatt?

2) Moving to the sensitivity analysis, the aim is to understand which observed variable contributes the most to the estimation of the factors. So I want to perturb the original observations and see how much worse the fitted observations become, keeping Z and T fixed on the assumption that they carry the "true" dynamics of the data, learned first by the DFM and then from the cross-country adjustment.

3) I'm aware that there are better metrics for choosing the optimal dimension m or estimating feature importance, such as the log-likelihood, the variance P, etc., but I'm comparing the DFM-Kalman model with other models and the reconstruction MSE is the only metric they have in common.
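
Written out, the reconstruction MSE referred to in these points would be something like the following. This is a sketch that assumes the smoothed states from fks are plugged into the equation in point 1; $n$ (number of time points) and $d$ (number of observed variables) are notation introduced here:

```latex
\hat{y}_t = c_t + Z_t \hat{a}_t, \qquad
\mathrm{MSE}(m) = \frac{1}{n d} \sum_{t=1}^{n} \lVert y_t - \hat{y}_t \rVert^2
```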

So question 1 is probably the most relevant.

Thanks again for the help
