lnccbrown / HSSM

Development of HSSM package

When regression specified for both drift (v) and threshold (a), model seemingly fits threshold for every observation #389

Closed clithero closed 2 months ago

clithero commented 2 months ago

Thank you for this wonderful new package.

Using HSSM version 0.2.0, I am trying to run some regressions with both drift rate and threshold. However, it seems that whenever a regression is specified for both v and a, the output is problematic. It looks to be fitting a threshold for every trial in the dataset. This has happened with my own data and I have now reproduced it with the included Cavanagh data. It was happening initially with more complex models, which I thought was the problem, but I have reproduced it with simpler models like the one below.

Any pointers for what I am missing or why this is happening and how to avoid it would be much appreciated!

**CODE TO REPRODUCE**

```python
import hssm
import pandas as pd
import arviz as az

hssm.set_floatX("float32")

from jax.config import config

config.update("jax_enable_x64", False)

cav_data = hssm.load_data('cavanagh_theta')
```

**MODEL**

```python
model_spec = hssm.HSSM(
    model="ddm",
    data=cav_data,
    include=[
        {
            "name": "v",
            "formula": "v ~ 1 + stim",
            "link": "identity",
        },
        {
            "name": "a",
            "prior": {
                "Intercept": {"name": "Uniform", "lower": 0.01, "upper": 5.0},
            },
            "formula": "a ~ 1 + stim",
            "link": "identity",
        },
    ],
)

model_spec_infer = model_spec.sample()
model_summary = pd.DataFrame(az.summary(model_spec_infer))
model_summary
```

**OUTPUT** Here is what `model_summary` looks like:

```
              mean     sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat
v_Intercept  0.232  0.033   0.172    0.294      0.000    0.000    4620.0    3126.0    1.0
v_stim[WL]   0.410  0.039   0.340    0.487      0.001    0.000    4333.0    3216.0    1.0
v_stim[WW]  -0.066  0.045  -0.148    0.022      0.001    0.001    4855.0    3139.0    1.0
a_stim[WL]  -0.041  0.019  -0.077   -0.006      0.000    0.000    3823.0    3472.0    1.0
a_stim[WW]  -0.106  0.021  -0.143   -0.067      0.000    0.000    3755.0    3185.0    1.0
...            ...    ...     ...      ...        ...      ...       ...       ...    ...
a[3983]      1.114  0.016   1.085    1.142      0.000    0.000    3843.0    3038.0    1.0
a[3984]      1.073  0.012   1.050    1.096      0.000    0.000    4251.0    3295.0    1.0
a[3985]      1.073  0.012   1.050    1.096      0.000    0.000    4251.0    3295.0    1.0
a[3986]      1.114  0.016   1.085    1.142      0.000    0.000    3843.0    3038.0    1.0
a[3987]      1.008  0.015   0.982    1.037      0.000    0.000    4259.0    3336.0    1.0

[3996 rows x 9 columns]
```

frankmj commented 2 months ago

Hi John,

This is just an artifact of Bambi: whenever a regression model is specified such that a parameter may depend on the specific trial (here, the condition), Bambi also reports the fitted parameter value for each trial as computed from the regression output. (You will see in your output that the `a` value is the same for all trials of the same condition; this would not be the case if you had a continuous regressor on `a`.) It is not actually a problem: the model is not fitting a separate parameter per trial, but rather per condition, and it is merely reporting the deterministic value that applies to each trial.
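To make this concrete, here is a small illustrative sketch (the coefficient values are hypothetical, loosely based on the summary above) of how trial-wise `a` values are just deterministic transforms of the per-condition coefficients:

```python
import numpy as np

# Hypothetical per-condition regression: a = Intercept + a_stim[condition],
# with "LL" as the reference level (coefficient 0).
a_intercept = 1.1
a_stim = {"LL": 0.0, "WL": -0.041, "WW": -0.106}

conditions = np.array(["WL", "WW", "WL", "LL"])
a_trial = a_intercept + np.array([a_stim[c] for c in conditions])

# Trials 0 and 2 share condition "WL", so they share the same
# deterministic value of a -- nothing is fit per trial.
print(a_trial)
```

Only the intercept and the two condition coefficients are actually sampled; the 3,988 `a[i]` rows are derived from them.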

These values are stored in the trace object, so arviz displays them by default. You can simply pass an argument to arviz to exclude those outputs, e.g. `var_names=['~a']`.
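As a sketch of that filtering (using a toy `InferenceData` standing in for the HSSM trace, since the variable layout here is only illustrative): the leading `~` in ArviZ's `var_names` excludes the exactly-named variable while keeping everything else.

```python
import numpy as np
import arviz as az

# Toy posterior mimicking the structure of the HSSM trace: a few
# regression coefficients plus a large trial-wise deterministic "a".
rng = np.random.default_rng(0)
idata = az.from_dict(posterior={
    "v_Intercept": rng.normal(size=(2, 100)),
    "a_Intercept": rng.normal(size=(2, 100)),
    "a": rng.normal(size=(2, 100, 50)),  # one value per trial
})

# "~a" drops the variable named exactly "a" from the summary,
# while keeping v_Intercept and a_Intercept.
summary = az.summary(idata, var_names=["~a"])
print(summary.index.tolist())
```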

But we are also working on a fix so that these are not shown by default (see issue #380, https://github.com/lnccbrown/HSSM/issues/380).



digicosmos86 commented 2 months ago

Hi John,

I second what Michael mentioned. There is also a convenience function in hssm that filters out these outputs by default: instead of calling `az.summary()`, you can call `model_spec.summary()`, which is a thin wrapper around `az.summary()` that drops the unnecessary output by default. We are also working on preventing these values from being included in the `InferenceData` object in the first place, to save space.

Thanks! Paul

AlexanderFengler commented 2 months ago

Hi @clithero, just as an addendum to the comments of @frankmj and @digicosmos86: we are also working on improvements to the defaults here, so that the collection of trial-wise parameters can generally be turned on and off at will. PyMC recently came out with a little piece of functionality that can be used to that effect.

Best, Alex