lnccbrown / HSSM

Development of HSSM package

Different parameters when entering single participant data vs multiple participant data in HSSM #324

Closed moxtoby closed 10 months ago

moxtoby commented 10 months ago

Hi there,

I am just transitioning to HSSM from HDDM. I have rt and accuracy data from a stop signal task with 3 participants. I have recoded the response column as 1 and -1, and converted rt to seconds. When I have the 3 participants' data in a single file, with an extra subj_idx column identifying participants 1, 2 and 3, the parameters recovered for each participant (a, v, t) are different from those I get when I run the participants separately in separate files. The parameters recovered from the grouped data file don't make sense (negative t and a values). The parameters recovered from the separate participant files are similar to those recovered from HDDM, but the two sets of estimates do not align closely. I have included my HSSM code below. What am I doing wrong? Many thanks for your help.
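The recoding described above might look like the following minimal sketch. The raw column names (`correct`, `rt_ms`), the 1/0 accuracy coding, and the millisecond units are assumptions made for illustration; only the target format (response as 1 or -1, rt in seconds, plus subj_idx) comes from the description.

```python
# Hypothetical raw trials: accuracy coded 1/0 and rt in milliseconds (assumed).
raw_trials = [
    {"subj_idx": 1, "correct": 1, "rt_ms": 512.0},
    {"subj_idx": 1, "correct": 0, "rt_ms": 430.0},
    {"subj_idx": 2, "correct": 1, "rt_ms": 655.0},
]


def prepare_trial(trial):
    """Recode one trial: response in {1, -1} and rt in seconds."""
    return {
        "subj_idx": trial["subj_idx"],
        "response": 1 if trial["correct"] == 1 else -1,
        "rt": trial["rt_ms"] / 1000.0,  # milliseconds -> seconds
    }


prepared = [prepare_trial(trial) for trial in raw_trials]
```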

The HSSM is version 0.1.5.

The HSSM code to run the 3 participants together in a combined data file is the following:

    JLM_SST_Model = hssm.HSSM(
        data=df_t,
        model="ddm",
        hierarchical=True,
        include=[
            {
                "name": "v",
                "prior": {"name": "Uniform", "lower": -8.0, "upper": 8.0},
                "formula": "v ~ 0 + (1|subj_idx)",
            },
            {
                "name": "a",
                "prior": {"name": "Uniform", "lower": 0.2, "upper": 5.0},
                "formula": "a ~ 0 + (1|subj_idx)",
            },
            {
                "name": "t",
                "prior": {"name": "Uniform", "lower": 0.01, "upper": 1.0},
                "formula": "t ~ 0 + (1|subj_idx)",
            },
        ],
    )
    JLM_SST_Model.sample()

I used 0 + (1|subj_idx) in order to estimate parameters separately for each participant. I hope I am using the correct syntax? I don't know if the "hierarchical = True" option makes a difference.

The HSSM code to run the 3 participants separately is the following:

    L_SST_Model = hssm.HSSM(
        data=df_t,
        model="ddm",
        hierarchical=False,
        include=[
            {"name": "v", "prior": {"name": "Uniform", "lower": -8.0, "upper": 8.0}},
            {"name": "a", "prior": {"name": "Uniform", "lower": 0.2, "upper": 5.0}},
            {"name": "t", "prior": {"name": "Uniform", "lower": 0.01, "upper": 1.0}},
        ],
    )
    L_SST_Model.sample()
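As an aside, the separate-files workflow can be reproduced from a single combined file by splitting the rows on subj_idx before fitting. The sketch below only does the splitting, in plain Python, so that it runs without HSSM installed; the helper name `split_by_participant` and the example rows are hypothetical, and this is not necessarily how the issue was eventually resolved.

```python
from collections import defaultdict


def split_by_participant(trials, key="subj_idx"):
    """Group trial rows by participant id, preserving trial order."""
    groups = defaultdict(list)
    for trial in trials:
        groups[trial[key]].append(trial)
    return dict(groups)


# Hypothetical combined data: one row per trial, all participants together.
trials = [
    {"subj_idx": 1, "response": 1, "rt": 0.512},
    {"subj_idx": 2, "response": -1, "rt": 0.655},
    {"subj_idx": 1, "response": -1, "rt": 0.430},
    {"subj_idx": 3, "response": 1, "rt": 0.701},
]

per_participant = split_by_participant(trials)

# Each group could then be passed, as its own DataFrame, to the
# non-hierarchical hssm.HSSM(...) call quoted above (not executed here).
for subj, rows in per_participant.items():
    print(f"participant {subj}: {len(rows)} trials")
```

With a pandas DataFrame, the same split is available through `df.groupby("subj_idx")`.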

The RHat values are less than 1.1 when running the participant files separately. However, they are very large (greater than 1.1) when running the combined data file for all 3 participants. Should I be doing something different to ensure convergence?
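For readers unfamiliar with the 1.1 threshold, here is a minimal sketch of the classic Gelman-Rubin statistic, which compares between-chain and within-chain variance. It is a simplified illustration and an assumption about what is being checked: current sampling toolkits typically report a more robust rank-normalized split variant, so the numbers will not match a real summary table exactly. The function name and example chains are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def classic_rhat(chains):
    """Classic (non-split) Gelman-Rubin R-hat for equal-length chains."""
    n = len(chains[0])  # draws per chain
    within = mean([variance(chain) for chain in chains])  # W
    between = n * variance([mean(chain) for chain in chains])  # B
    pooled = (n - 1) / n * within + between / n  # pooled variance estimate
    return sqrt(pooled / within)


# Two chains exploring the same region: R-hat stays below the threshold.
well_mixed = [[0.1, 0.3, 0.2, 0.4], [0.2, 0.4, 0.1, 0.3]]

# Two chains stuck in different regions: R-hat is far above 1.1.
stuck = [[0.0, 1.0, 0.0, 1.0], [10.0, 11.0, 10.0, 11.0]]

print(classic_rhat(well_mixed), classic_rhat(stuck))
```

A large value means the chains disagree about where the posterior mass is, which is consistent with a model that is poorly identified or mis-specified rather than one that simply needs more samples.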

For context, in HDDM, the v, a and t values for the 3 participants (run together, 20000 samples, converged) are the following:

v: 5.79, 2.53, 7.57
a: 4.63, 1.46, 9.96
t: 0.08, 0.242, 0.020

In HSSM, when running separately, the v, a and t values are as follows (RHat < 1.1):

v: 5.32, 3.13, 7.37
a: 2.17, 0.751, 3.61
t: 0.116, 0.219, 0.043

FYI the HDDM code for running the 3 participants together is the following:

    SSTModel = hddm.HDDM(mydata)
    SSTModel.find_starting_values()
    SSTModel.sample(20000, burn=2000)

With the above HDDM code, I get each participant's a, v, and t values. I am trying to get the equivalent in HSSM, but I don't think my HSSM code is correct.

Thanks in advance for your help.

Finally, when I run the HSSM code with the combined data file, I get lots of t[xx] and a[xx] rows. I remember reading in another thread that these rows can be ignored? Is this right?

Thank you.

Michelle

AlexanderFengler commented 10 months ago

Hi Michelle,

Sorry for the late response on this one, I was out for a bit.

Let me first mention that the a parameters you should expect from HSSM are roughly 1/2 of the a parameters HDDM produces. This is because all models in HSSM consistently treat the bounds as [-a, a], whereas in HDDM the boundaries were treated as [0, a], a legacy of a particular algorithm for likelihood computation.
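Applied to the numbers quoted earlier in the thread, the halving works out as in this small sketch. The values are copied from the original post; the variable names are illustrative only.

```python
# Boundary separation estimates quoted in the original post.
hddm_a = [4.63, 1.46, 9.96]           # HDDM, bounds [0, a]
hssm_a_separate = [2.17, 0.751, 3.61]  # HSSM, bounds [-a, a], separate fits

# Put the HDDM estimates on the HSSM scale by halving them.
hddm_a_rescaled = [a / 2 for a in hddm_a]

for subj, (rescaled, fitted) in enumerate(
    zip(hddm_a_rescaled, hssm_a_separate), start=1
):
    print(f"participant {subj}: HDDM a/2 = {rescaled:.2f}, HSSM a = {fitted}")
```

After rescaling, participants 1 and 2 come out close to the HSSM estimates (roughly 2.3 vs 2.17 and 0.73 vs 0.751), while participant 3 still differs (4.98 vs 3.61).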

Moreover, if you use the standard HDDM class without further arguments, you are using somewhat informative priors, which will also misalign the results to a degree. (Note: in the next version of HSSM, we will make such priors available as default choices.)

Concerning the t[xx] and a[xx] rows: these are the trial-wise values of the respective t and a parameters. You will likely want to ignore them, although they can sometimes be useful.
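One way to hide such rows when reading a summary is to filter on the row names. The sketch below is plain Python and assumes the trial-wise rows are named as a bare parameter name followed by a numeric index in brackets, as in the t[xx] and a[xx] pattern described above; the example names, including the group-level one, are hypothetical.

```python
import re

# A bare parameter name followed by a numeric index, e.g. "a[12]".
TRIALWISE = re.compile(r"^[A-Za-z]+\[\d+\]$")


def drop_trialwise(names):
    """Keep only row names that do not look like trial-wise entries."""
    return [name for name in names if not TRIALWISE.match(name)]


# Hypothetical summary row names.
rows = ["v", "a", "t", "v_1|subj_idx[1]", "a[0]", "a[1]", "t[0]", "t[1]"]

print(drop_trialwise(rows))
```

The pattern is deliberately narrow so that indexed group-level rows, whose names contain more than a bare parameter name, are kept.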

As I mentioned in my response to your other issue, you can try the loglik_kind = "blackbox" argument here as well to use the old HDDM likelihood computations. With this setting, any remaining discrepancy between results should derive from the difference in priors.

Best, Alex

moxtoby commented 10 months ago

Thank you Alex. I will try your suggestion of adding loglik_kind = "blackbox" and report back after the weekend. Many thanks again.

Best wishes, Michelle

moxtoby commented 10 months ago

Hi Alex,

Thank you for the info on the a parameter difference between HSSM and HDDM, and also on what t[xx] and a[xx] are for. I have tried adding the loglik_kind = "blackbox" option for both the Stroop and the SST Go trials datasets. The results are mixed. I have put the results in the attached spreadsheet (the same one I attached to my other query).

However, I am still unsure how to enter multiple participants' data at once in HSSM. What do I need to do in terms of syntax to be able to get each subject's a, v, t and z parameters? My code is in the original query above.

Many thanks again. Best wishes, Michelle

Attachment: Compare HDDM HSSM.xlsx

moxtoby commented 10 months ago

Hi Alex, I have just worked out how to successfully input multiple participants into HSSM and generate parameters that are equivalent to those of HDDM. Thank you for your help so far. This issue can be closed.

Many thanks Michelle