saudiwin / idealstan

idealstan offers item-response theory (IRT) ideal-point estimation for binary, ordinal, counts and continuous responses with time-varying and missing-data inference. Latent space model also included. Full and approximate Bayesian sampling with 'Stan' (www.mc-stan.org).
https://cran.r-project.org/web/packages/idealstan/index.html
GNU General Public License v2.0

fixing modes with variational inference #29

Open michalovadek opened 1 month ago

michalovadek commented 1 month ago

coming back to this package after a few years away and pleased to see that Bayesian inference in R remains as janky as I remember it (jkjk). I have a simple question: does fixtype = "vb_full" work in the current development version of the package? (using develop due to the Stan array definition deprecation)

I have the following setup:

# make data
data_ideal <- idealstan::id_make(score_data = cases_pos_data,
                                 person_id = "actor",
                                 item_id = "case",
                                 model_id = "1",
                                 outcome_disc = "position")

# estimate model
estimate_ideal <- idealstan::id_estimate(
  idealdata = data_ideal,
  model_type = 1,
  vary_ideal_pts = "none",
  fixtype = "vb_full",
  nchains = 8,
  ncores = 6
)

and I'm getting the following error message when running the id_estimate call:

Model executable is up to date!
[1] "(First Step): Estimating model with variational inference to identify modes to constrain."
Error: Missing input data for the following data variables: num_restrict_high, num_restrict_low.
In addition: Warning message:
In max(Y_cont) : no non-missing arguments to max; returning -Inf

but specifying num_restrict_ in the function call is not possible. Using the latest version of CmdStan (2.35.0) through cmdstanr. Any pointers would be appreciated

saudiwin commented 1 month ago

Hi Michal -

The package has changed quite a bit since you last used it, mostly for the better, I think. I don't really support variational inference anymore, although I recently added Pathfinder and Laplace approximation, which both work better. I just got it set up to first run Pathfinder/Laplace to find initialization, and then run a Stan model. However, I don't currently have the old "auto-ID" implemented where you can run it without any info on which persons/items to constrain. So you need at least 1 person or item to constrain.

If you want, you can attach a sample dataset and I can write some code to use it with the new version. I am still some months away from releasing a fully documented new version.

michalovadek commented 1 month ago

thanks, I turned to manual constraining which seems to work fine, although I am not completely clear on the impact the manual choice of the constrained person has on other estimates (if any)

I do think a good ideal point IRT model is core polsci methods infrastructure, so I would be happy to contribute to the development (are you still planning on adding support for 2 dimensions?). I'm also hoping for a bit more stability going forward with the clarification of the situation around cmdstanr and CRAN

saudiwin commented 1 month ago

Hi Michal -

yes I am certainly open to any help on finishing the package (haha) or adding features. A big part of the reason it hasn't been finished is that it's been very difficult resolving all the issues with identification. This is still an open area in statistical inference and I have become a bit of a zealot about it :). But I am finally satisfied as I now use some approximate inference algorithms (Pathfinder/Laplace) to resolve remaining ID issues. This should also permit 2-D estimation or even higher dimensions, so if you want to work on adding that, feel free.

I've also come around to the idea that an IRT model should have informative constraints for it to be useful. You don't need to be 100% confident about what to pin where, and you can use a single pin (i.e. one person/item to be positive) in some cases if you're unsure. But letting the model select both which items/persons to constrain and the latent variable itself creates some redundancy that should be avoided, I think. You can always try PCA if you just want to explore the variance.
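To make the single-pin idea concrete, here is a minimal base-R sketch of the reflection problem that a sign pin resolves. It does not use idealstan: the toy two-parameter model, the simulated data and all variable names are assumptions made for illustration only. In this toy setting, flipping the signs of every ideal point and every discrimination leaves the likelihood unchanged, so a pin on one person only chooses between two mirror-image solutions.

```r
# Toy illustration only: a simple 2PL-style ideal-point model in base R.
# Parameterization and names are assumptions, not idealstan internals.
set.seed(42)
n_person <- 20
n_item <- 15
theta <- rnorm(n_person)  # ideal points
beta <- rnorm(n_item)     # item discriminations
alpha <- rnorm(n_item)    # item intercepts

# Simulate binary responses with Pr(y = 1) = plogis(theta_i * beta_j - alpha_j)
eta <- outer(theta, beta) - matrix(alpha, n_person, n_item, byrow = TRUE)
y <- matrix(rbinom(n_person * n_item, 1, plogis(eta)), n_person, n_item)

# Bernoulli log-likelihood of the response matrix
loglik <- function(theta, beta, alpha, y) {
  eta <- outer(theta, beta) - matrix(alpha, nrow(y), ncol(y), byrow = TRUE)
  sum(dbinom(y, size = 1, prob = plogis(eta), log = TRUE))
}

ll_original <- loglik(theta, beta, alpha, y)
ll_mirrored <- loglik(-theta, -beta, alpha, y)

# The data cannot distinguish the two modes
all.equal(ll_original, ll_mirrored)

# A sign pin on one (hypothetical) person selects a mode:
# flip everything if that person would otherwise be negative.
pin <- 1
sign_fix <- if (theta[pin] < 0) -1 else 1
theta_id <- sign_fix * theta
beta_id <- sign_fix * beta
```

In this sketch the choice of pinned person affects only which mirror image is reported. The distances between persons are the same in both. Pins that fix values rather than signs also set the scale, which this toy example does not cover.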

There's also a lot of need for documenting (a lot of the vignettes etc. are out of date) and I want to add in further support for covariates and ideal point marginal effects. The package is more or less feature complete but it will take time to have it all in a nice CRAN-worthy box. Anything you want to work on is fine with me, just coordinate it so I know what you're doing. To be safe we could have you work on a separate branch.

michalovadek commented 1 month ago

do you mean that the develop branch is up to date in terms of all your work thus far? I could then work off there on a fork. Ideally would avoid duplication though

Most imminently I would probably seek to "clean up" the package: discard unsupported features, implement more input checks, fail more gracefully and informatively, look into some of the errors I have been encountering while trying to use it, update the documentation accordingly, etc.
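As a rough sketch of what failing "gracefully and informatively" could look like for the constraint problem reported at the top of this thread, the check below rejects a call that supplies no pins before any model is run. The helper check_pins() and its arguments restrict_high and restrict_low are hypothetical names chosen for this example. They are not part of idealstan.

```r
# Hypothetical helper, not part of idealstan: validate manual constraints
# up front so the user sees a clear message rather than a Stan data error.
check_pins <- function(person_ids, restrict_high = NULL, restrict_low = NULL) {
  pins <- c(restrict_high, restrict_low)
  if (length(pins) == 0) {
    stop("No constraints supplied: name at least one person to pin ",
         "high or low.", call. = FALSE)
  }
  unknown <- setdiff(pins, person_ids)
  if (length(unknown) > 0) {
    stop("Pinned IDs not found in the data: ",
         paste(unknown, collapse = ", "), call. = FALSE)
  }
  both <- intersect(restrict_high, restrict_low)
  if (length(both) > 0) {
    stop("IDs pinned both high and low: ",
         paste(both, collapse = ", "), call. = FALSE)
  }
  invisible(TRUE)
}

# Usage with made-up actor IDs
actors <- c("A", "B", "C")
check_pins(actors, restrict_high = "A")

# Capture the message produced when no pin is given
msg <- tryCatch(check_pins(actors), error = function(e) conditionMessage(e))
```

Running a check like this at the start of the estimation call is one option. Where exactly it would fit in the package is left open here.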

like adding a second dimension, marginal effects would be an excellent addition

saudiwin commented 3 weeks ago

Hi Michal -

yes for sure you should work off of the develop branch. the package definitely needs more checks as you note. probably best to start there & with documentation for now. the package is not ready atm for a 2nd dimension as I'm finishing up new inference algorithms, including pathfinder/laplace/better inits. For marginal effects, I have already implemented a lot of that and am intending to do that next.

just try to commit frequently / add documentation to the commits so I can follow along. i'll add you to the repo.

thanks!