lnccbrown / HSSM

Development of HSSM package

Add to the tutorial how to save and load models #356

Closed igrahek closed 4 months ago

igrahek commented 4 months ago

Sorry if I missed this in the tutorial, but I can't find anything on how to save and load models. I was looking into ArviZ, and this flow seems to work, but I'm not sure it's the best way. It works for getting posteriors, plotting the traces, and summarizing the model, but it seems to break for the posterior predictives.

import arviz as az
import hssm

cav_data = hssm.load_data("cavanagh_theta")

model = hssm.HSSM(
    data=cav_data,
    hierarchical=True,
    prior_settings="safe",
    model="ddm",
    loglik_kind="approx_differentiable")

# sample() returns the traces as an arviz.InferenceData object
modelObject = model.sample(chains=1,
    cores=1,
    draws=50,
    tune=50)

# Save
modelObject.to_netcdf('model')

# Load
modelObject = az.InferenceData.from_netcdf('model')

# Summarize
az.summary(modelObject, var_names=['~a','~t', '~z'])

# Plot posteriors
az.plot_trace(modelObject)

# Plot PPCs
az.plot_ppc(modelObject)
Error: `data` argument must have the group "posterior_predictive" for ppcplot

It seems to me that when saved and loaded in this way, we have the InferenceData object. Is there a better way to do all of this so that I'm using HSSM functions? For example this code breaks:

modelObject.summary()

AttributeError: 'InferenceData' object has no attribute 'summary'

For getting the HSSM plot_posterior_predictive function to work, do I need to supply the model, the InferenceData, and the raw data? Or is there a way to save modelObject so that all of that is within it?
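Both errors above follow from the reloaded object being a plain arviz.InferenceData rather than an HSSM model. A minimal sketch of how to check this, assuming only ArviZ and NumPy are installed; the synthetic posterior for "v" and the temporary file path are illustrative stand-ins for real traces, not HSSM output:

```python
import os
import tempfile

import arviz as az
import numpy as np

# Synthetic stand-in for sampled traces: 2 chains, 50 draws of a parameter "v".
# (Illustrative only; real traces would come from model.sample().)
rng = np.random.default_rng(0)
idata = az.from_dict(posterior={"v": rng.normal(size=(2, 50))})

# Round-trip through NetCDF, as in the snippet above.
path = os.path.join(tempfile.mkdtemp(), "model.nc")
idata.to_netcdf(path)
loaded = az.from_netcdf(path)

# The reloaded object is an InferenceData, so summarize it with az.summary(...)
# rather than calling a .summary() method on it.
summary = az.summary(loaded, var_names=["v"])

# az.plot_ppc needs a "posterior_predictive" group; check for it before plotting.
has_ppc = "posterior_predictive" in loaded.groups()
print(loaded.groups(), has_ppc)
```

Here only a posterior group was saved, so the file cannot contain posterior predictive draws; they would have to be generated and stored in the InferenceData before saving, or regenerated from a model after loading.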

kiante-fernandez commented 4 months ago

Hey! I spoke with Alex about this, so I will share what I came up with from our discussion. The package does not seem to have a single function that does what you are asking just yet. To do what you want, you have to save the InferenceData, load it back, and reattach it to a newly built model.

Here is a quick example of something like that.

import arviz as az
import hssm

def reattach(filename, data):
    # Load the InferenceData object
    inferd = az.from_netcdf(filename)
    # Reattach to the model
    m = hssm.HSSM(data=data)
    m._inference_obj = inferd
    return m

# Load data
cav_data = hssm.load_data("cavanagh_theta")
# Initialize model
model = hssm.HSSM(data=cav_data)
# Save the model state
model.to_netcdf('modelObj.nc')
# Reload the model
model = reattach("modelObj.nc", cav_data)

model.sample_posterior_predictive(data=cav_data)

Note the reinstatement is somewhat odd here, but nonetheless, some version of 'reattach' would be useful.

igrahek commented 4 months ago

Hey @kiante-fernandez great to see you here :) Thanks for the quick response! I was actually trying to do something similar, but I'm not able to save the model using to_netcdf. This function works for me when saving the inference data, but not the HSSM model object. Does that work on your end?

For me:

# Initialize model
model = hssm.HSSM(data=cav_data)
# Save the model state
model.to_netcdf('modelObj.nc')

Throws: AttributeError: 'HSSM' object has no attribute 'to_netcdf'

However,

# Initialize model
model = hssm.HSSM(data=cav_data)
# Sample
modelObject = model.sample()
# Save the inference data
modelObject.to_netcdf('modelObj.nc')

works fine

AlexanderFengler commented 4 months ago

I think the .to_netcdf() part should refer to the traces, not the model object itself here.
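A minimal sketch of that correction, assuming the model keeps its traces in the private `_inference_obj` attribute used earlier in this thread. The helper names `save_traces` and `reattach_traces` are hypothetical and not part of HSSM:

```python
import arviz as az


def save_traces(idata, filename):
    """Save sampled traces (an arviz.InferenceData), not the model object."""
    idata.to_netcdf(filename)
    return filename


def reattach_traces(model, filename):
    """Load saved traces and attach them to an already-built model.

    Assumption: the model stores its traces in the private attribute
    `_inference_obj`, as in the reattach() example in this thread.
    """
    model._inference_obj = az.from_netcdf(filename)
    return model


# Intended usage with HSSM (commented out because sampling is slow):
# model = hssm.HSSM(data=cav_data)
# save_traces(model.sample(), "modelObj.nc")
# model = reattach_traces(hssm.HSSM(data=cav_data), "modelObj.nc")
```

Because this relies on a private attribute, it may stop working if HSSM changes how traces are stored.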

gpagnier commented 4 months ago

Hey, just confirming that saving/loading models does work using the following:

# Initializing example model m1
m1 = hssm.HSSM(
    data=simData,
    model="angle",
    include=[
        {
            "name": "v",
            "formula": "v ~ 1 + trialBenefits*cond",
        },
    ],
)

# Actually sampling
m1Sampled = m1.sample(
    chains=2,
    cores=1,
    draws=1000,
    tune=1000,
    idata_kwargs=dict(log_likelihood=True),
)

# Saving the inference object
m1Sampled.to_netcdf("m1SampledSaved.netcdf4")

# Loading the inference object
loadedModel1 = az.from_netcdf("m1SampledSaved.netcdf4")

# Attaching the inference object to the original model
m1._inference_obj = loadedModel1