mattjj / pyhsmm-autoregressive

autoregressive plugin

Question on documentation #8

Closed · mg10011 closed this issue 8 years ago

mg10011 commented 8 years ago

Matt,

Another great library. I was able to successfully run the library, but I'm not sure I understand what is going on.

You set the As to be as follows:

In [25]: As
Out[25]: 
[array([[-0.99, -0.  ,  1.98,  0.  ],
        [-0.  , -0.99,  0.  ,  1.98]]),
 array([[-0.8660254,  0.5      ,  1.8660254, -0.5      ],
        [-0.5      , -0.8660254,  0.5      ,  1.8660254]]),
 array([[-0.8660254, -0.5      ,  1.8660254,  0.5      ],
        [ 0.5      , -0.8660254, -0.5      ,  1.8660254]])]

Since it is a list of size 3, I'm assuming that each 'A' represents the autoregressive coefficients for the state that corresponds to its list index. However, after running the inference, each state has a differently shaped parameter for its observation distribution. In other words:

In [21]: model.obs_distns[0].A
Out[21]: 
array([[ 0.01111094, -0.01066601, -0.87942313, -0.48116723,  1.86847206,
         0.49066796,  0.36817974],
       [ 0.02705619,  0.02772599,  0.43533996, -0.89806059, -0.46130999,
         1.87026995,  0.25940765]])

which has shape (2, 7). What exactly is the 'A' member of the obs_distns[0] object?

Thanks in advance.

mattjj commented 8 years ago

Thanks! Ah yes, a bit rough around the edges, this one.

The shape of each A is determined by three parameters: the dimension of each vector observation, data_dim; the number of lags in the autoregression, num_lags; and a boolean, affine, indicating whether the autoregression includes an intercept term. The A matrix has shape (data_dim, data_dim * num_lags + affine), where adding a boolean means adding 1 or 0 depending on whether it's True or False, respectively. So in the demo code, we have data_dim = 2, num_lags = 3, and affine = True, hence each A is of shape (2, 7).
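As a quick illustration, here is that shape rule in code (a minimal sketch; the variable names just mirror the parameters described above, with the values from the demo):

```python
# Shape rule: A has shape (data_dim, data_dim * num_lags + affine),
# where the boolean affine contributes 1 if True and 0 if False.
data_dim, num_lags, affine = 2, 3, True   # values from the demo code
A_shape = (data_dim, data_dim * num_lags + int(affine))
assert A_shape == (2, 7)
```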

The reason the A member has that shape is that it's slightly misnamed when the regression is affine; it actually has the vector b glued on as the last column. That is, a two-lag affine autoregression can be written as

$$y_t = A \begin{bmatrix} y_{t-2} \\ y_{t-1} \end{bmatrix} + b + \Sigma^{\frac{1}{2}} v_t, \qquad v_t \overset{\text{iid}}{\sim} \mathcal{N}(0, I),$$

where the parameters are $(A, b, \Sigma)$ with

$$A \in \mathbb{R}^{p \times 2p}, \qquad b \in \mathbb{R}^p, \qquad \Sigma \in S^p_{++},$$

where by $S^p_{++}$ I mean the set of p x p positive definite matrices and by $\Sigma^{\frac{1}{2}}$ I mean any matrix square root. For this affine autoregression, the code actually glues b onto the end of A, so that the member obs_distn.A really stores the augmented matrix

$$\begin{bmatrix} A & b \end{bmatrix} \in \mathbb{R}^{p \times (2p+1)}.$$
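A hedged sketch of splitting that stored matrix back into its (A, b) pieces (using a random stand-in for a learned matrix, since the layout is all that matters here):

```python
import numpy as np

# Stand-in for obs_distns[k].A in an affine model: the regression
# matrix with b appended as the last column.
data_dim, num_lags = 2, 3
Ab = np.random.randn(data_dim, data_dim * num_lags + 1)

A, b = Ab[:, :-1], Ab[:, -1]   # strip off the glued-on intercept column
assert A.shape == (data_dim, data_dim * num_lags) and b.shape == (data_dim,)
```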

(In the case of a switching autoregression, like you'd use this library for, there's a triple of such autoregression parameters for each of the K states, {(A^{(k)}, b^{(k)}, \Sigma^{(k)})}_{k=1}^K, but I left that out of the notation.)

mattjj commented 8 years ago

Also: you're right about your interpretation of the list indexing of model.obs_distns. That is, model.obs_distns[k] is an object that models state k's autoregression, and so in particular model.obs_distns[0].A refers to the A and b parameters of state 0's autoregression, as above.
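For example (a hypothetical snippet, assuming a fitted model object as in the demo):

```python
# Inspect each state's autoregression parameters; dist.A stores the
# stacked [A | b] described above, shape (2, 7) under the demo settings.
for k, dist in enumerate(model.obs_distns):
    print(k, dist.A.shape)
```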

mg10011 commented 8 years ago

Very interesting.

After about 300 iterations, the inferred 'A' is quite different from the true 'A'.

For example, 'As[0]' is equal to

In [13]: As[0]
Out[13]: 
array([[-0.99, -0.  ,  1.98,  0.  ],
       [-0.  , -0.99,  0.  ,  1.98]])

But the inferred A's for three of the states are as follows:

In [22]: model.obs_distns[0].A
Out[22]: 
array([[ 0.0389  , -0.029988, -0.9268  , -0.4335  ,  1.8885  ,  0.4625  ,  0.2900  ],
       [ 0.0011  ,  0.0265  ,  0.4814  , -0.9179  , -0.48149 ,  1.8901  ,  0.5475  ]])

and

In [24]: model.obs_distns[2].A
Out[24]: 
array([[-0.01807356, -0.04896132, -0.9511103 ,  0.091712  ,  1.958486  ,
        -0.04255226, -0.17139359],
       [ 0.0588948 , -0.02964983, -0.11231683, -0.9250224 ,  0.05299565,
         1.94456436,  0.13938005]])

and

In [25]: model.obs_distns[6].A
Out[25]: 
array([[ 0.03657193, -0.04362399, -0.95168398,  0.55522543,  1.91329106,
        -0.5125169 , -0.17301202],
       [ 0.00370551, -0.0207272 , -0.51123704, -0.83231483,  0.50665192,
         1.85228362, -0.0859402 ]])

So, none of the inferred distributions seems to match the true 'A' parameters.

mattjj commented 8 years ago

If the segmentation is right then these methods are asymptotically consistent. (Sometimes these models can be unidentifiable but I don't think that's the case here.)

You can play with the amount of data, or even better, compare the A's in terms of their frequency content rather than, say, Frobenius norm (or eyeballing the entries). If nlags = 1, "frequency content" basically means looking at the eigenvalues of A (as complex numbers). If nlags > 1, it means looking at the eigenvalues of the corresponding companion matrix. You can also compute single-input single-output transfer functions using eval_siso_transfer_function.
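Here's a rough sketch of that comparison (assuming the oldest-lag-first block layout suggested by the demo's As; strip off the b column first if the fit was affine):

```python
import numpy as np

def companion_eigvals(A, p):
    """Eigenvalues of the companion matrix of a VAR whose lag blocks are
    stacked in A as [A_L | ... | A_1] (oldest lag first), A of shape (p, p*L)."""
    L = A.shape[1] // p
    # Reorder blocks to newest-first, [A_1 | A_2 | ... | A_L].
    top = np.hstack([A[:, (L - 1 - j) * p:(L - j) * p] for j in range(L)])
    C = np.zeros((p * L, p * L))
    C[:p, :] = top                     # first block row: the lag matrices
    C[p:, :-p] = np.eye(p * (L - 1))   # identity blocks shift the lags down
    return np.linalg.eigvals(C)

# e.g. the demo's first true A: the eigenvalues come out near the unit
# circle, i.e. a slowly decaying oscillation.
A0 = np.array([[-0.99, -0.  ,  1.98,  0.  ],
               [-0.  , -0.99,  0.  ,  1.98]])
print(companion_eigvals(A0, p=2))
```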

However, since estimates will be consistent in any topology (assuming the segmentation is right), I suggest playing with the amount of data!

mg10011 commented 8 years ago

I see the problem. Look at your 'demo.py' file.

When you create the 'true model', you only have a 2-lag autoregression with no intercept (i.e. b = 0). However, in the inference you are trying to fit a 3-lag autoregression, since you have set nlags = 3. If you set nlags = 2, then your inferred model will recover the 'true model', with 'b' actually coming out as [0, 0].
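A quick way to see the mismatch from the true parameters alone (values copied from above):

```python
import numpy as np

# The true As have shape (2, 4): with data_dim = 2 that's 4 / 2 = 2 lags,
# and no extra column for b, i.e. no affine term.
A_true = np.array([[-0.99, -0.  ,  1.98,  0.  ],
                   [-0.  , -0.99,  0.  ,  1.98]])
data_dim = A_true.shape[0]
num_lags_true = A_true.shape[1] // data_dim
print(num_lags_true)   # -> 2, not the nlags = 3 used for inference
```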

mattjj commented 8 years ago

Ah, I must have fiddled with things at some point. Good catch! Though it probably still finds a good model even with the wrong number of lags. (There are also ARD, or 'automatic relevance determination', priors to select the number of lags and whether or not to use an affine component automatically, in the sense of Bayesian lasso.)