translationalneuromodeling / tapas

TAPAS - Translational Algorithms for Psychiatry-Advancing Science
https://translationalneuromodeling.github.io/tapas/
GNU General Public License v3.0

HGF: omega, theta, and related questions #78

Closed: eboonst closed this issue 2 years ago

eboonst commented 4 years ago

Dear Dr Mathys,

First of all, thank you for your excellent papers and detailed documentation. In our lab, we intend to run a variation on the study by Vossel et al. (2013), and to this end we are first trying to replicate the original findings. Getting the model to run is not a problem, but I find it difficult to ascertain whether what I am doing is correct. In particular, I am having trouble translating the priors specified in the 2013 paper into MATLAB code for the perceptual model.

  1. My apologies if this is obvious, but the theta (ϑ) parameter confuses me. I don't understand where to specify the prior for this parameter; I cannot find any mention of it in either the scripts or the manual. Is it simply ω3? If so, why is it not called ω3 in the papers?

  2. I understand from Vossel et al. (2013: 1441) and Mathys et al. (2014: 9) that ϑ is estimated in logit-transformed space. If ϑ is indeed ω3, does this mean that the prior of 0.1 specified for ϑ in Vossel et al. (2013: 1441) corresponds to ω3 = -2.1972 (i.e., log(0.1/(1 - 0.1)))? Am I correct to assume that the specified variance (100) is not transformed?

  3. Relatedly, how is the upper bound for ϑ (1 in the original paper) specified?

  4. What makes me suspect that I am getting it wrong is that when I run the perceptual model with binary inputs [0, 1] corresponding to the same trajectory of probabilities as in Vossel et al. (2013, figure 1), I end up with a very stable x3 trajectory (see the sketch after these points for how I generate this plot):

[image: trajectories]

The x2 predictions, meanwhile, do track the changing probabilities (as in Vossel et al. (2013), figure 8):

[image: track_probs]

My question is: given how volatile the probabilities are over time, shouldn't there be more variability in the x3 prediction? Or is this behavior expected, simply because the trajectory is equally volatile over time?

  5. What is the relation between c.mu_0mu and c.mu_0sa on the one hand, and c.logsa_0mu and c.logsa_0sa on the other? While the default values seem fixed (variance = 0, as in the Vossel et al. (2013) paper), would it not make sense to have the starting value/prior reflect the fact that participants completed a practice block with high (88%) cue validity?

  6. What makes me even more suspicious that I am getting it wrong is that when I fit the model to my data together with the precision response model, my participants do not vary in their fitted ω3 values. When I correlate ω2 with ω3 in an attempt to replicate figure 6a of Vossel et al. (2013), it looks like this:

[image: replicate_fig6a]

Compared to the original:

[image: original figure 6a from Vossel et al. (2013)]

I suppose it could be that my participants simply did not judge the environment of my task to be volatile, but this seems unlikely to me, given that most of them do track the changing probability, judging by their x2 predictions. Some participants do not seem to track the changing probability at all:

[image: bad_ppn]

Others track it to some extent:

[image: ok_ppn]

And for many participants, the x2 predictions look as (I believe) they should:

[image: good_ppn]
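For reference, this is roughly how I generate the trajectory plots above. It is a minimal sketch: tapas_simModel and tapas_hgf_binary_plotTraj are standard toolbox functions, u stands for my [0, 1] input sequence, and I am assuming the native-space parameter ordering [mu_0, sa_0, rho, ka, om] for tapas_hgf_binary:

% Simulate belief trajectories for the input sequence u (sketch).
% The response model and its parameter (tapas_unitsq_sgm, zeta = 5)
% only affect the simulated responses, not the belief trajectories.
sim = tapas_simModel(u, ...
    'tapas_hgf_binary', ...
    [NaN 0 1, NaN 0.1 1, NaN 0 0, 1 1, NaN -6 -2.1972], ...
    'tapas_unitsq_sgm', 5);
tapas_hgf_binary_plotTraj(sim);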

I am sorry for all these questions and this wall of text. Any insight you may be able to offer would be greatly appreciated.

All the best, Evert Boonstra

Just in case, I include all priors used below:

Perceptual model:

% Sufficient statistics of Gaussian parameter priors

% Initial mus and sigmas
% Format: row vectors of length n_levels
% For all but the first two levels, this is usually best
% kept fixed to 1 (determines origin on x_i-scale). The 
% first level is NaN because it is determined by the second,
% and the second implies neutrality between outcomes when it
% is centered at 0.
c.mu_0mu = [NaN, 0, 1]; 
c.mu_0sa = [NaN, 0, 0]; 

c.logsa_0mu = [NaN,     log(0.1),   log(1)];  
c.logsa_0sa = [NaN,     0,          0];      

% Rhos
% Format: row vector of length n_levels.
% Undefined (therefore NaN) at the first level.
% Fix this to zero to turn off drift.
c.rhomu = [NaN, 0, 0];
c.rhosa = [NaN, 0, 0];

% Kappas
% Format: row vector of length n_levels-1.
% Fixing log(kappa1) to log(1) leads to the original HGF model.
% Higher log(kappas) should be fixed (preferably to log(1)) if the
% observation model does not use mu_i+1 (kappa then determines the
% scaling of x_i+1).
c.logkamu = [log(1), log(1)];
c.logkasa = [     0,      0];

% Omegas
% Format: row vector of length n_levels.
% Undefined (therefore NaN) at the first level.
c.ommu = [NaN,  -6,  -2.1972];  
c.omsa = [NaN, 100, 100];  

Response model (precision):

% Sufficient statistics of Gaussian parameter priors
%
% Zeta_1_valid
c.logze1vmu = log(0.0052);
c.logze1vsa = 0.1;

% Zeta_1_invalid
c.logze1imu = log(0.0052);
c.logze1isa = 0.1;

% Zeta_2
c.logze2mu = log(0.0006);
c.logze2sa = 0.001;

% Zeta_3
c.logze3mu = log(0.001);
c.logze3sa = 1000;
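For completeness, this is roughly how I pass these structs to the fitting routine. A minimal sketch: tapas_fitModel, tapas_quasinewton_optim_config, and tapas_fit_plotCorr are standard toolbox functions, while the two my_*_config names are placeholders for my edited copies of the config files that return the c structs above:

% y: column vector of reaction times; u: column vector of binary
% cue validities (0/1); both loaded per participant (not shown).
est = tapas_fitModel(y, u, ...
    'my_hgf_binary_config', ...        % perceptual-model priors above
    'my_rt_precision_config', ...      % response-model priors above
    'tapas_quasinewton_optim_config');
tapas_fit_plotCorr(est);               % inspect posterior correlations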

EDIT 29/01/2020: When I change the mu prior from:

c.mu_0mu = [NaN, 0, 1]; 
c.mu_0sa = [NaN, 0, 0]; 

c.logsa_0mu = [NaN,     log(0.1),   log(1)];  
c.logsa_0sa = [NaN,     0,          0];    

to

c.mu_0mu = [NaN, 1, 1]; 
c.mu_0sa = [NaN, 0, 0]; 

c.logsa_0mu = [NaN,     log(0.1),   log(1)];  
c.logsa_0sa = [NaN,     0,          0];    

there does seem to be more variability in ω3:

[image: omega_plot]

But this variability is still nowhere near that in the original Vossel et al. (2013) paper.

EDIT 06/02/2020: I had written nu (ν) where I meant theta (ϑ). My mistake; I have corrected it in the post above. Unfortunately, my questions remain.

chmathys commented 4 years ago

Dear Evert,

Thanks for your message. I’ll respond as soon as I can, but in ten days I’ll be moving country and institution, so this is an extraordinarily busy time for me. I hope you understand that it may therefore be a while before I get around to answering you.

Best wishes, Chris

eboonst commented 4 years ago

Dear Chris,

Thank you for your quick response. I completely understand. Any insight from you would be amazing, whenever that may be. I wish you a good move!

Best, Evert

chmathys commented 4 years ago

Dear Evert,

On top of my move, the Covid crisis happened, so I’m only now coming round to answering your questions. Please accept my apologies for the delay. Taking your points in order:

  1. I’m not quite sure which parameter we’re talking about. Might this be theta? In any case, theta = exp(omega3). At some point in the history of the HGF Toolbox, we switched from estimating theta in logit space to estimating omega3 in native space. This has the advantage of being more intuitive, in that omega3 corresponds to omega2, simply one level higher. However, for your purposes it may be better to do it the original way, with an upper bound on theta and estimation in logit space. This may give you more scope for phasic changes in volatility. See point 4 below and the sketch after these points.

  2. Correct, variance is always in estimation space. But notice that theta = exp(omega3).

  3. The upper bound for theta was 0.2, so that the prior mean of 0.1 lay midway between the lower and upper bounds.

  4. In order to get the same setup as we had in Vossel et al. (2013), you need to reintroduce theta instead of omega3. If you contact me directly, I’ll send you a legacy version of the toolbox with this setup.

  5. c.mu_0mu and c.mu_0sa are the prior means and prior variances, respectively, of the means of the initial beliefs; c.logsa_0mu and c.logsa_0sa are the prior means and prior variances, respectively, of the (log-)variances of the initial beliefs. You are right that the training block might have an influence on these. You could try estimating them, or fixing them to values that seem appropriate in light of the training block (see the sketch after these points for one way to encode this).

  6. You should try this with our setup from the paper as described above.
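To make points 1-3 and 5 concrete, here is a short sketch. I am assuming the toolbox helper tapas_logit(p, a) = log(p/(a - p)), and the last line is only one illustrative way of encoding the practice block, not a recommendation:

% Points 1-3: theta and omega3 parametrize the same quantity.
theta = 0.1;                % prior mean of theta in Vossel et al. (2013)
om3   = log(theta);         % corresponding omega3: log(0.1) = -2.3026

% In the legacy (logit-space) setup with upper bound 0.2, the
% estimated quantity was instead
logit_theta = tapas_logit(theta, 0.2);  % log(0.1/(0.2 - 0.1)) = 0

% Point 5: an 88% valid practice block could be encoded by starting
% the second-level belief at the log-odds of 0.88 rather than at 0:
c.mu_0mu = [NaN, tapas_logit(0.88, 1), 1];  % logit(0.88) = 1.9924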

Best wishes, Chris