BUG: AttributeError: module 'pymc' has no attribute 'diagnostics'

moxtoby commented 9 months ago

Describe the issue:

I have been running Gelman-Rubin convergence tests after fitting HDDM models, and they worked fine up until 13th October 2023. In the last week I tried to run the same script again in Jupyter Notebook (from Anaconda), and after the HDDM sampling completes, the Gelman-Rubin test fails with "AttributeError: module 'pymc' has no attribute 'diagnostics'". I have not changed the code or the Python environment. I did recently update the Anaconda software, but I am not sure whether that would have had an effect. I have tried rebuilding the Python environment multiple times and compared the package lists/versions, but I can't see that anything has changed. I am using older versions of Python and pymc because they are compatible with HDDM 0.8.0. I have attached the conda list to show the packages and versions in the Python environment.

hddm conda list.docx

Reproducible code example:

from patsy import dmatrix 
import numpy as np         
from pandas import Series 
import matplotlib.pyplot as plt
import pymc as pm
#import arviz as az
import hddm

mydata = hddm.load_csv('JLM_stroopdata.csv')

models = []
for i in range(3):
    StroopModel1=hddm.HDDMRegressor(mydata,"v~C(cond,Treatment('Con'))",group_only_regressors=False,p_outlier=.05)
    StroopModel1.find_starting_values()
    StroopModel1.sample(2000, burn=200)
    models.append(StroopModel1)
from kabuki.analyze import gelman_rubin
gelman_rubin(models)

Error message:

Adding these covariates:
['v_Intercept', "v_C(cond, Treatment('Con'))[T.Inc]"]

C:\Users\m_oxt\anaconda3_newnew\envs\hddm2\lib\site-packages\scipy\optimize\optimize.py:2116: RuntimeWarning: invalid value encountered in double_scalars
  tmp2 = (x - v) * (fx - fw)

 [-----------------100%-----------------] 2000 of 2000 complete in 114.3 secAdding these covariates:
['v_Intercept', "v_C(cond, Treatment('Con'))[T.Inc]"]

C:\Users\m_oxt\anaconda3_newnew\envs\hddm2\lib\site-packages\scipy\optimize\optimize.py:2116: RuntimeWarning: invalid value encountered in double_scalars
  tmp2 = (x - v) * (fx - fw)

 [-----------------100%-----------------] 2000 of 2000 complete in 115.4 secAdding these covariates:
['v_Intercept', "v_C(cond, Treatment('Con'))[T.Inc]"]

C:\Users\m_oxt\anaconda3_newnew\envs\hddm2\lib\site-packages\scipy\optimize\optimize.py:2116: RuntimeWarning: invalid value encountered in double_scalars
  tmp2 = (x - v) * (fx - fw)

 [-----------------100%-----------------] 2000 of 2000 complete in 116.7 sec

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-4-ca3fd1f2572c> in <module>
      6     models.append(StroopModel1)
      7 from kabuki.analyze import gelman_rubin
----> 8 gelman_rubin(models)

~\anaconda3_newnew\envs\hddm2\lib\site-packages\kabuki\analyze.py in gelman_rubin(models)
    154             samples[i, :] = model.nodes_db.loc[name, "node"].trace()
    155 
--> 156         R_hat_dict[name] = pm.diagnostics.gelman_rubin(samples)
    157 
    158     return R_hat_dict

AttributeError: module 'pymc' has no attribute 'diagnostics'

PyMC version information:

I have attached the conda list below. hddm conda list.docx
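
Since the package list looks unchanged, one quick check is to ask Python itself which copy of pymc it imports and whether that copy exposes `diagnostics`. The following is a minimal sketch using only the standard library; `describe_module` is a hypothetical helper name, not part of pymc or kabuki.

```python
import importlib


def describe_module(name, attribute):
    """Report which copy of a module Python imports and whether it has `attribute`.

    Hypothetical helper for illustration; not part of pymc or kabuki.
    """
    module = importlib.import_module(name)
    return {
        "version": getattr(module, "__version__", "unknown"),
        "file": getattr(module, "__file__", "unknown"),
        "has_attribute": hasattr(module, attribute),
    }


# In the HDDM environment the call of interest would be:
#     print(describe_module("pymc", "diagnostics"))
# A "file" path outside the environment's site-packages, or a version other
# than the one requested at install time, would suggest that a different copy
# of pymc is being imported than the one shown in the conda list.
```

Running this in the same Jupyter kernel that raises the error matters, because the notebook kernel and the Anaconda prompt do not necessarily use the same environment.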

Context for the issue:

I am using HDDM to estimate parameters for three tasks (Stroop, Go/NoGo and Stop Signal tasks), and I need the Gelman-Rubin convergence test to check that the parameter estimation has converged. As the Gelman-Rubin test was working up until mid-October, I am relying on it to complete my experiment analysis. I do not know of any other way to run the Gelman-Rubin test without using the pymc package. Any ideas why this error occurs now (and not before), and what I can do to work around it? Thank you.
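
As a possible workaround, the Gelman-Rubin statistic can be computed with NumPy alone, without `pm.diagnostics`. The sketch below uses the textbook form of R-hat (within-chain variance W, between-chain variance B); PyMC 2's own function may include small correction terms, so the values need not match it exactly. `gelman_rubin_for_models` mirrors the `model.nodes_db.loc[name, "node"].trace()` access shown in the traceback; the use of `get_stochastics()` to obtain the node names is an assumption about kabuki's model API.

```python
import numpy as np


def gelman_rubin(samples):
    """Potential scale reduction factor (R-hat) for one parameter.

    samples: array of shape (num_chains, num_samples), one row per chain.
    """
    samples = np.asarray(samples, dtype=float)
    if samples.ndim != 2 or samples.shape[0] < 2 or samples.shape[1] < 2:
        raise ValueError("need at least 2 chains with at least 2 samples each")
    num_samples = samples.shape[1]

    # W: mean of the within-chain sample variances.
    within = samples.var(axis=1, ddof=1).mean()
    # B / n: sample variance of the chain means.
    between_over_n = samples.mean(axis=1).var(ddof=1)
    # Pooled estimate of the posterior variance.
    pooled = (num_samples - 1) / num_samples * within + between_over_n
    return float(np.sqrt(pooled / within))


def gelman_rubin_for_models(models):
    """R-hat for every stochastic node across a list of sampled models.

    ASSUMPTION: each model provides get_stochastics() (for the node names)
    and nodes_db, accessed the same way as in the kabuki traceback.
    """
    names = models[0].get_stochastics().index
    return {
        name: gelman_rubin(
            np.array([m.nodes_db.loc[name, "node"].trace() for m in models])
        )
        for name in names
    }
```

With the `models` list from the reproducible example, `gelman_rubin_for_models(models)` would return a dictionary of R-hat values keyed by node name, in the same shape as the dictionary kabuki builds in the traceback.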

moxtoby commented 9 months ago

Additional information on how I installed the packages from the Anaconda CMD.exe Prompt (create HDDM environment):

conda create -n hddm python=3.6.10
conda activate hddm
conda install pandas patsy
conda install pymc==2.3.7
conda install statsmodels
pip install kabuki
conda install cython  # have also tried pip install cython
pip install HDDM==0.8.0
pip install pandas==0.20.0

Finally, I have tried both gelman_rubin(models) and hddm.analyze.gelman_rubin(models). Both fail with the same error: AttributeError: module 'pymc' has no attribute 'diagnostics'

Thank you.

twiecki commented 9 months ago

I transferred this to the pymc2 repo, but I don't think you'll have much luck. Can you upgrade to HSSM?

moxtoby commented 9 months ago

Hello Thomas,

Thank you for your message. I understand that pymc (2.3.7) is an old package/version and so is unlikely to get support. I have been trying out HSSM (0.1.5) for the past couple of weeks, but I haven't transitioned to it because I have been unable to find the correct HSSM syntax to estimate parameters in the Stroop, Go/NoGo and Stop Signal Go tasks.

I have submitted my issue with the Stop Signal Go tasks in the thread https://github.com/lnccbrown/HSSM/issues/324

As for the Stroop task, I cannot seem to get similar values of v and a as those from HDDM. My HDDM code to run the Stroop task is as follows:

m = hddm.HDDMRegressor(mydata, "v~C(cond,Treatment('Con'))", group_only_regressors=False, p_outlier=.05)
m.sample(10000, burn=1000)

This gives me the 2 drift rates (one each for the congruent and incongruent conditions) and the boundary separation for each participant. I input data from all participants together, using the subj_idx column to identify the participants.

In HSSM, the code I used to try to get the output for one participant is as follows:

L_stroop_model = hssm.HSSM(
    data=df_t,
    model="ddm",
    hierarchical=False,
    include=[
        {
            "name": "v",
            "prior": {"name": "Uniform", "lower": -8.0, "upper": 8.0},
            "formula": "v~ 0+(1|cond)",
        },
        {
            "name": "a",
            "prior": {"name": "Uniform", "lower": 0.2, "upper": 5.0},
        },
        {
            "name": "t",
            "prior": {"name": "Uniform", "lower": 0.01, "upper": 1.0},
        },
    ],
)
L_stroop_model.sample()

However, the output doesn't always converge (RHat > 1.1), and I sometimes have to run multiple iterations to make sure it converges. The estimated parameters are not close enough to those from HDDM to give me confidence.

To try to run multiple participants' data together (combining participants' data and adding the subj_idx column to identify the participants), I used the following code:

JLM_stroop_model = hssm.HSSM(
    data=df,
    model="ddm",
    hierarchical=True,
    include=[
        {
            "name": "v",
            "prior": {"name": "Uniform", "lower": -8.0, "upper": 8.0},
            "formula": "v~ 0+(1|cond) + (1|subj_idx)",
        },
        {
            "name": "a",
            "prior": {"name": "Uniform", "lower": 0.2, "upper": 5.0},
            "formula": "a~ 0+(1|subj_idx)",
        },
        {
            "name": "t",
            "prior": {"name": "Uniform", "lower": 0.01, "upper": 1.0},
            "formula": "t~ 0+(1|subj_idx)",
        },
    ],
)
JLM_stroop_model.sample()

The output from the above HSSM code indicates non-convergence (RHat much above 1.1) and the parameters are not at all similar to those from HDDM.

As for Go/NoGo, I am not sure if HSSM will handle the -999 rt value for successful withholding of a nogo trial. I also am not sure what the HSSM syntax would be for the equivalent of HDDM syntax:

m = hddm.HDDMStimCoding(mydata, depends_on={'v': 'condition'}, include='z', stim_col='condition', split_param='z', p_outlier=0.05)
m.sample(10000, burn=1000)

Can you help me find the correct equivalent HSSM syntax for the 3 tasks so I can transition from HDDM to HSSM in my study? Or should I open a new query/bug on the HSSM GitHub page for my Stroop and Go/NoGo questions above?

Many thanks for your time. Kind regards, Michelle

twiecki commented 9 months ago

Yes, best to move this to HSSM.
