lnccbrown / HSSM

Development of HSSM package

Specifying priors for categorical variables in regression does not work #387

Open eort opened 3 months ago

eort commented 3 months ago

Describe the bug

When running a (hierarchical) regression model, trying to specify priors for categorical variables produces the following error: TypeError: Wrong number of dimensions: expected 1, got 0 with shape (). (full stack trace below). I have only tried this with the angle model.

HSSM version

0.2

To Reproduce

import hssm
from pandas import read_csv

data = read_csv('cavanagh_theta_nn.csv')
model = hssm.HSSM(
    data=data,
    model='angle',
    loglik='angle.onnx',
    loglik_kind='approx_differentiable',
    include=[
        {
            "name": "v",
            "formula": "v ~ (1 | participant_id) + conf",
            "prior": {
                "Intercept": {"name": "Uniform", "lower": -1, "upper": 1, "initval": 0},
                "conf": {"name": "Normal", "mu": 0, "sigma": 1, "initval": 0}
            }
        }
    ],
    hierarchical=True,
    link_settings='log_logit'
)

Full stack trace: Traceback (most recent call last):

  File "/gpfs/project/ort/lorasick/ddm/code/hssm_reg.py", line 50, in <module>
    model = hssm.HSSM(data=data,
            ^^^^^^^^^^^^^^^^^^^^
  File "/gpfs/project/projects/bpsydm/tools/pyEnvs/hssm/lib/python3.11/site-packages/hssm/hssm.py", line 334, in __init__
    self.set_alias(self._aliases)
  File "/gpfs/project/projects/bpsydm/tools/pyEnvs/hssm/lib/python3.11/site-packages/hssm/hssm.py", line 594, in set_alias
    self.model.build()
  File "/gpfs/project/projects/bpsydm/tools/pyEnvs/hssm/lib/python3.11/site-packages/bambi/models.py", line 350, in build
    self.backend.build(self)
  File "/gpfs/project/projects/bpsydm/tools/pyEnvs/hssm/lib/python3.11/site-packages/bambi/backend/pymc.py", line 70, in build
    self.components[name].build(self, spec)
  File "/gpfs/project/projects/bpsydm/tools/pyEnvs/hssm/lib/python3.11/site-packages/bambi/backend/model_components.py", line 60, in build
    self.build_common_terms(pymc_backend, bmb_model)
  File "/gpfs/project/projects/bpsydm/tools/pyEnvs/hssm/lib/python3.11/site-packages/bambi/backend/model_components.py", line 98, in build_common_terms
    coef, data = common_term.build(bmb_model)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/gpfs/project/projects/bpsydm/tools/pyEnvs/hssm/lib/python3.11/site-packages/bambi/backend/terms.py", line 58, in build
    coef = distribution(label, dims=dims, **args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/gpfs/project/projects/bpsydm/tools/pyEnvs/hssm/lib/python3.11/site-packages/pymc/distributions/distribution.py", line 316, in __new__
    rv_out = model.register_rv(
             ^^^^^^^^^^^^^^^^^^
  File "/gpfs/project/projects/bpsydm/tools/pyEnvs/hssm/lib/python3.11/site-packages/pymc/model/core.py", line 1294, in register_rv
    self.set_initval(rv_var, initval)
  File "/gpfs/project/projects/bpsydm/tools/pyEnvs/hssm/lib/python3.11/site-packages/pymc/model/core.py", line 1110, in set_initval
    initval = rv_var.type.filter(initval)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/gpfs/project/projects/bpsydm/tools/pyEnvs/hssm/lib/python3.11/site-packages/pytensor/tensor/type.py", line 241, in filter
    raise TypeError(
TypeError: Wrong number of dimensions: expected 1, got 0 with shape ().
digicosmos86 commented 2 months ago

Hi @eort,

What version of cavanagh_theta_nn.csv are you using, so that I can reproduce this error? Please note that the version that comes from HDDM is not compatible with HSSM, due to the coding of the response column and the naming of participant_id. You can use hssm.load_data("cavanagh_theta") to load a version that's compatible with HSSM.

Without actually running the code, it seems that the error is with setting initial values. Does the error still happen when initvals are removed from your model specification?
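
For example, something like this (a sketch of your include block with just the initvals dropped and everything else unchanged):

include=[
    {
        "name": "v",
        "formula": "v ~ (1 | participant_id) + conf",
        "prior": {
            "Intercept": {"name": "Uniform", "lower": -1, "upper": 1},
            "conf": {"name": "Normal", "mu": 0, "sigma": 1}
        }
    }
]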

Thanks! Paul

eort commented 2 months ago

Hi @digicosmos86,

What version of cavanagh_theta_nn.csv are you using

The one in hssm/datasets. The problem originally occurred with my own data, so I tried to reproduce it with the standard dataset so you can have a look yourself.

Does the error still happen when initvals are removed from your model specification?

No, it doesn't! After I leave the initvals out of the model specification, the model compiles fine.

digicosmos86 commented 2 months ago

Hi @eort,

It seems that after removing the parentheses around 1|participant_id, the model compiled fine for me. Perhaps parentheses have a special meaning in bambi. Moving conf before (1|participant_id) also helped. So for me, the formula looks like this: v ~ conf + 1|participant_id.

@tomicapretto Could you share some insights?

Thanks! Paul

eort commented 2 months ago

Hi @digicosmos86,

Thanks for looking into this!

It seems that after removing the parenthesis around 1|participant_id, the model compiled fine for me

I'm not 100% sure, but when I toggle the parentheses on and off, I get different models. Without parentheses, the random effects are not included in the model: when I look into the inference data, there are no v_1|participant_id nodes, and the printed model itself also shows the difference.

See here with parentheses (output shortened):

In [1]: model
Out[1]: 
(Hierarchical Sequential Sampling Model
 Model: angle
 v:
     Formula: v ~ coh + (1|participant_id)
     Priors:
         v_Intercept ~ Uniform(lower: -1.0, upper: 1.0, initval: 0.0)
         v_coh ~ Normal(mu: 0.0, sigma: 1.0, initval: 0.0)
         v_1|participant_id ~ Normal(mu: 0.0, sigma: Weibull(alpha: 1.5, beta: 0.3))
     Link: Generalized logit link function with bounds (-3.0, 3.0)
     Explicit bounds: (-3.0, 3.0)

And here without:

Hierarchical Sequential Sampling Model
Model: angle

v:
    Formula: v ~ coh + 1|participant_id
    Priors:
        v_Intercept ~ Uniform(lower: -1.0, upper: 1.0, initval: 0.0)
        v_coh ~ Normal(mu: 0.0, sigma: 1.0, initval: 0.0)
    Link: Generalized logit link function with bounds (-3.0, 3.0)
    Explicit bounds: (-3.0, 3.0)
digicosmos86 commented 2 months ago

Hi @eort,

Thanks for catching that! I think for now the best way to specify the formulas is parameter ~ fixed effects + (random effects). The parentheses do change how the models are built, and whether you have spaces around the | notation also affects how the formula is parsed. We recommend that you keep the parentheses around the random effects, and we have updated our documentation to highlight that.
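
For example (a sketch contrasting the two formula strings from this thread; the variable names are just for illustration):

# Recommended: fixed effects first, group-specific term wrapped in parentheses
formula_with_parens = "v ~ conf + (1|participant_id)"

# Without the parentheses, the group-specific term was dropped from the model above
formula_without_parens = "v ~ conf + 1|participant_id"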

Thanks for bringing this to our attention! Paul

eort commented 2 months ago

Sounds good. But to be clear, as you said in your first post, one should also not specify the initval, right? At least that is the one thing that consistently throws the error I reported initially.

digicosmos86 commented 2 months ago

Sounds good. But to be clear, as you said in your first post, one should also not specify the initval, right? At least that is the one thing that consistently throws the error I reported initially.

It is unclear to me right now how Bambi deals with initval, but you can definitely still specify initial values if that helps with sampling. I had the most luck with this specification: v ~ C(conf) + (1|participant_id). C() (upper-case C) is an operator that explicitly marks conf as a categorical variable. With that formula I kept a prior with an initval, and the model also compiled fine. I will need to check with the Bambi team to figure out why initval causes an error when C() is left out.
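
Concretely, the include block I tried looks roughly like this (a sketch; I'm assuming the prior for the categorical term is keyed by the term string exactly as it appears in the formula):

include=[
    {
        "name": "v",
        "formula": "v ~ C(conf) + (1|participant_id)",
        "prior": {
            "Intercept": {"name": "Uniform", "lower": -1, "upper": 1, "initval": 0},
            # prior keyed by the term as written in the formula
            "C(conf)": {"name": "Normal", "mu": 0, "sigma": 1, "initval": 0}
        }
    }
]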

tomicapretto commented 2 months ago

@eort @digicosmos86 sorry for the delay. I think the problem is related to Bambi not correctly handling the shapes of the prior parameters and the initval. I'll run a quick example now.

Update: I have to run now, but regarding the parentheses discussion, you have to include the parentheses for group-specific terms.

digicosmos86 commented 2 months ago

Thank you @tomicapretto for looking into this!

eort commented 2 months ago

but you can definitely still specify initial values if that helps with sampling.

Perhaps this is due to the way I specify these initvals, but I can't specify them without running into the error above. The issue might be that I don't just specify a categorical regressor, but also the order in which its levels should be considered (I have a clear condition that should be treated as the baseline). So on my own dataset I do this:

    "include":[
                    {
                        "name": "a",
                        "formula":"a ~ C(drug, levels=lvl) + (1 | participant_id)",
                        "prior": {
                            "Intercept": {"name": "Normal", "mu": 0.5, "sigma": 0.5, "initval":0.5},
                            "C(drug, levels=lvl)": {"name": "Normal", "mu": 0, "sigma": 1, "initval":0.5}
                        }
                ]

For the cavanagh dataset that should be something like:

{
    "name": "v",
    "formula": "v ~ (1 | participant_id) + C(conf, levels=lvl)",
    "prior": {
        "Intercept": {"name": "Uniform", "lower": -1, "upper": 1, "initval": 0},
        "conf": {"name": "Normal", "mu": 0, "sigma": 1, "initval": 0}
    }
}

where lvl should then be provided to the model via extra_namespace={"lvl": <levels of conf>}.
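
For concreteness, the full call would look roughly like this (a sketch: the level ordering is a placeholder, I'm assuming extra_namespace is passed through to the model as described, and I'm not sure whether the prior key needs to be "conf" or the full C(conf, levels=lvl) term string):

import hssm

data = hssm.load_data("cavanagh_theta")
lvl = ["low", "high"]  # placeholder; replace with the actual levels of conf in the data

model = hssm.HSSM(
    data=data,
    model="angle",
    include=[
        {
            "name": "v",
            "formula": "v ~ (1 | participant_id) + C(conf, levels=lvl)",
            "prior": {
                "Intercept": {"name": "Uniform", "lower": -1, "upper": 1, "initval": 0},
                "conf": {"name": "Normal", "mu": 0, "sigma": 1, "initval": 0}
            }
        }
    ],
    hierarchical=True,
    extra_namespace={"lvl": lvl}
)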