zashwood / glm-hmm

Code to reproduce figures for "Mice alternate between discrete strategies during perceptual decision-making" from Ashwood, Roy, Stone, IBL, Churchland, Pouget and Pillow (2021)

Problems with README and code #1

Open · JingyangKe opened this issue 2 years ago

JingyangKe commented 2 years ago

Hello, I am trying to reproduce the figures and have encountered some problems:

  1. The README says "The lapse model can be fit at any time", but I find that glm-hmm/2_fit_models/fit_global_glmhmm/2_apply_post_processing.py requires lapse_model_params_one_param.npz and lapse_model_params_two_param.npz. Does this mean we have to follow this order: fit_glm -> fit_lapse_model -> fit_global_glmhmm -> fit_individual_glmhmm?
  2. I notice that in the cluster array for fit_global_glmhmm, there are 20 iterations for each fold and each K. I want to double-check whether I am supposed to run all 20 iterations, as they take a long time to run.
  3. The README says that to reproduce the figures for the other two datasets, we should "replace the IBL URL in the preprocessing pipeline scripts" and rerun the model-fitting steps. However, after downloading the other two datasets, I find that their structure is quite different from the IBL dataset's and does not seem to be compatible with the current preprocessing script. Could you release the preprocessing scripts for the other two datasets?

I would really appreciate it if you could help me with these problems!

zashwood commented 2 years ago

Hi Jingyang, thanks for your interest in using the code, and for your message. To respond to each of your points:

  1. That's a good point about the lapse model fits being called in the post-processing script. While the GLM has to be fit before the global GLM-HMM (its weights are used to initialize the global GLM-HMM), the lapse model fits are only used for model comparison purposes - hence my comment about being able to run the lapse model code at any time. But I agree that, given the current version of the post-processing code, the lapse model needs to be run before 2_apply_post_processing.py (either before or after the global GLM-HMM fit itself) so that the script doesn't throw an error; see the pipeline sketch at the end of this comment. I just updated the README, so hopefully this is clearer now.
  2. Yes - for the global fit, we use 20 initializations for each fold-K combination. While I think it would be acceptable to run the fits for fewer folds, I do think it's important to use multiple initializations, so as to avoid getting stuck in a local optimum of the likelihood during fitting; see the initialization sketch at the end of this comment. I agree that this is computationally expensive, and we typically ran our code on a cluster, launching a separate job for each initialization-K-fold combination.
  3. I'll see what I can do re: releasing the preprocessing code for the other two datasets. Given that the analyses we apply to the other two datasets are so similar to those for the IBL dataset, I didn't think it would be of sufficient interest to release the whole pipeline for these datasets too, and I thought having these additional scripts might clutter the repo. But I'll do my best to clean up the preprocessing scripts, and add them to the 1_preprocess_data directory.
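
For anyone else following along, here is a minimal driver sketch of the ordering discussed in point 1. The 2_fit_models subdirectory names are taken from this thread; the per-directory entry-point name is a hypothetical placeholder, so substitute whichever scripts you actually run in each directory.

```python
# Minimal sketch of a safe ordering for the fitting pipeline (point 1 above).
# Directory names come from this thread; ENTRY_POINT is a hypothetical
# placeholder for the real script(s) inside each directory.
import subprocess
import sys

ENTRY_POINT = "run_stage.py"  # hypothetical; replace with the actual scripts

STAGES = [
    "2_fit_models/fit_glm",                # GLM first: its weights initialize the global GLM-HMM
    "2_fit_models/fit_lapse_model",        # lapse model: only used for model comparison, but its
                                           # .npz outputs must exist before post-processing runs
    "2_fit_models/fit_global_glmhmm",      # global GLM-HMM (including 2_apply_post_processing.py)
    "2_fit_models/fit_individual_glmhmm",  # per-animal GLM-HMM, initialized from the global fit
]

for stage in STAGES:
    # Run each stage to completion before starting the next one.
    subprocess.run([sys.executable, f"{stage}/{ENTRY_POINT}"], check=True)
```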
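And here is a minimal sketch of the multiple-initialization strategy from point 2, assuming the ssm package that this repo's GLM-HMM code builds on; the dimensions and the synthetic data below are purely illustrative. On a cluster you would typically run each iteration of this loop as its own job and compare the resulting log-likelihoods afterwards.

```python
# Sketch of 20 random initializations for one (K, fold) combination, keeping
# the fit with the best training log-likelihood. Assumes the ssm package;
# `choices` is a list of (T, 1) int arrays, `inputs` a list of (T, M) arrays.
import numpy as np
import ssm

K, D, M, C = 3, 1, 4, 2   # states, obs dim, GLM inputs, choice categories (illustrative)

# Illustrative synthetic data: one "session" of 500 trials.
T = 500
inputs = [np.random.randn(T, M)]
choices = [np.random.randint(C, size=(T, 1))]

best_ll, best_model = -np.inf, None
for init in range(20):
    np.random.seed(init)  # one seed per initialization, so each job is reproducible
    glmhmm = ssm.HMM(K, D, M,
                     observations="input_driven_obs",
                     observation_kwargs=dict(C=C),
                     transitions="standard")
    lls = glmhmm.fit(choices, inputs=inputs, method="em",
                     num_iters=200, tolerance=1e-4)
    if lls[-1] > best_ll:
        best_ll, best_model = lls[-1], glmhmm
```
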
JingyangKe commented 2 years ago

Thanks a lot for your clarification!