marcdotson / conjoint-ensembles

Using clever randomization and ensembling strategies to accommodate multiple data pathologies in conjoint studies.
MIT License

Evaluate the ensemble of hierarchical MNL models #38

Closed marcdotson closed 3 years ago

marcdotson commented 5 years ago
dgmiller commented 5 years ago

@marcdotson Done. See the dev/aws branch. The base model is named base_hmnl.stan. Initial results show that the ensemble of hierarchical models is effectively identical to the standard mnl_vanilla.stan model on all simulated data sets. The performance difference between the two models is as follows:

| Data set | Difference | Verdict |
| --- | --- | --- |
| 01 | +.00 | same |
| 02 | -.02 | worse |
| 03 | +.01 | better |
| 04 | +.01 | better |

I think what's going on is that the ensemble of HMNL models is just too conservative. An ensemble benefits from (over)confident, diverse base models rather than safe, "accurate" ones. We could try randomizing over something other than the feature levels, but I seriously think our next step is trying neural networks, such as hierarchical Bayesian neural networks. To me, everything we have been doing points in this direction.
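
A tiny numeric illustration of that intuition (the probabilities below are made up): averaging near-identical, conservative base models just reproduces any single one of them, while averaging diverse, confident models yields something genuinely different from every base model.

```python
import numpy as np

# Hypothetical choice probabilities for one 3-alternative task,
# from three "conservative" base models vs. three diverse ones.
conservative = np.array([
    [0.40, 0.35, 0.25],
    [0.41, 0.34, 0.25],
    [0.39, 0.36, 0.25],
])
diverse = np.array([
    [0.90, 0.05, 0.05],
    [0.10, 0.80, 0.10],
    [0.15, 0.05, 0.80],
])

avg_conservative = conservative.mean(axis=0)  # ~ any single base model
avg_diverse = diverse.mean(axis=0)            # differs from every base model

print(avg_conservative)
print(avg_diverse)
```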

marcdotson commented 5 years ago

@dgmiller be careful not to jump to the next thing too quickly. We have preliminary evidence that we should run an ensemble of HMNLs. We can't say we've given it a fair shake until we've actually induced "clever" randomization -- which we haven't, not beyond something akin to ANA.

dgmiller commented 5 years ago

@marcdotson

After some preliminary experimentation, I concede your point. Just for fun, I built a vanilla feedforward neural net, played around with various architectures, and ran it on different data sets. I just wanted to see if it could brute-force predict Y given X. The good news is that, in general, it didn't outperform either the standard HMNL model or the ensemble. (It only outperformed on a few runs with certain data sets, and by a small amount.) Still, thinking about the ensemble as a neural net might help point us toward effective "clever" randomization strategies. I can explain when we meet next.

marcdotson commented 5 years ago

@dgmiller is porting the ensemble code into R.

marcdotson commented 5 years ago

Next steps in evaluating the ensemble of hierarchical MNL models. Note that conjoint.py is the first attempt at randomization, running the ensemble, and saving model output.

  1. @jeff-dotson to simulate data with one or more of our first two data pathologies: ANA and screening.
  2. @jeff-dotson to implement clever randomization strategies.
  3. @marcdotson to confirm estimation for the conjoint ensemble.
  4. @marcdotson to start with loo as the fit statistic for model comparison, considering other forms of predictive fit as needed.

marcdotson commented 4 years ago

@jeff-dotson pull the latest changes and you're good to work in the randomization section of 03_conjoint-ensemble.R. Remember that so far we've only tried leaving out a single variable. Whatever you do, just make sure the result is a modified X to feed into the ensembles section next.

marcdotson commented 4 years ago

@RogerOverNOut when specifying the dimensions of the log likelihood saved for the screening (and eventually ANA) model, create an array with dimensions: number of post-warm-up iterations × number of chains × number of observations.

Since our custom MCMC is probably a single chain, that'll be a post-warm-up iterations × observations matrix.
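
In numpy terms (with made-up sizes), the two shapes look like:

```python
import numpy as np

# Hypothetical dimensions for the saved log-likelihood array.
n_iterations = 1000   # post-warm-up iterations
n_chains = 4
n_observations = 500

# Multi-chain case: iterations x chains x observations.
log_lik = np.empty((n_iterations, n_chains, n_observations))

# Single-chain custom MCMC: iterations x observations matrix.
log_lik_single = np.empty((n_iterations, n_observations))

print(log_lik.shape)         # (1000, 4, 500)
print(log_lik_single.shape)  # (1000, 500)
```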

RogerOverNOut commented 4 years ago

@marcdotson Thanks, Marc. Will do.

RogerOverNOut commented 4 years ago

@jeff-dotson I'm working on the code for the competing models. Screening is easy since we have the code and it's just a matter of writing out the data transitions. If you have ANA code to share, let me know and I'll work it in; otherwise, I can work on it early next week. I have significant chunks of time Monday and Tuesday.

marcdotson commented 4 years ago

@jeff-dotson @RogerOverNOut if you have questions on how to get your randomization and alternative model changes onto GitHub, let me know.

marcdotson commented 4 years ago

@dgmiller where is the code used to create the simulated data sets with the various pathologies?

dgmiller commented 4 years ago

@marcdotson It's in the python code folder, in utils.py.

The function is generate_simulated_data(), which lets you specify which pathologies from the pathology() function are present in the data.
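
For anyone skimming the thread, here's a minimal sketch of what inducing one such pathology can look like when simulating part-worths. This is not the project's utils.py; the shapes, probabilities, and approach are illustrative assumptions, showing ANA (attribute non-attendance) as zeroing out betas for ignored attributes.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical respondent-level part-worths: respondents x attributes.
n_respondents, n_attributes = 100, 8
beta = rng.normal(0, 1, (n_respondents, n_attributes))

# ANA sketch: each respondent ignores each attribute with probability 0.3,
# so non-attended attributes contribute nothing to utility.
attend = rng.random((n_respondents, n_attributes)) > 0.3
beta_ana = beta * attend

print(beta_ana.shape)  # (100, 8)
```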

marcdotson commented 4 years ago

Updated next steps in evaluating the ensemble of hierarchical MNL models (with conjoint.py as a reference).

  1. @marcdotson simulating data without any pathologies and with one or more pathologies present, beginning with ANA and screening in 01_simulate_data.R (with generate_simulated_data() from utils.py as a reference).
  2. @marcdotson cleaning real data to match how we've simulated data in 02_clean-data.R.
  3. @jeff-dotson and @marcdotson implementing clever randomization strategies in 03_conjoint-ensemble.R.
  4. @marcdotson confirming estimation for the conjoint ensemble in 03_conjoint-ensemble.R.
  5. @RogerOverNOut confirming the compatibility of loo for model comparison with the competing models in 04_competing-models.R.
  6. @RogerOverNOut starting with loo as the fit statistic for model comparison in 05_model-comparison.R.

marcdotson commented 4 years ago

Updated next steps:

  1. @marcdotson and @jeff-dotson simulating data without any pathologies and with one or more pathologies present, beginning with ANA and screening in 01_simulate_data.R (@RogerOverNOut has some code to share with pathologies present).
  2. @jeff-dotson prepping real data and inducing randomization in 02_clean-data.R.
  3. @marcdotson implementing randomization through parameter constraints (for single pathologies and pathologies jointly) and running the ensemble in 03_conjoint-ensemble.R.
  4. @RogerOverNOut using loo as the meta-learner (i.e., Bayesian stacking) in 04_meta-learner.R.
  5. @RogerOverNOut using loo as the fit statistic for model comparison after running competing models in 05_competing-models.R and comparing predictive fit in 06_model-comparison.R.
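
On item 4: in R, the stacking weights come from the loo package (loo_model_weights()). Purely as an illustration of what stacking optimizes, here is a self-contained numpy sketch (toy data, EM-style update rather than the package's optimizer) that finds the weights maximizing the summed log score of the mixture of LOO predictive densities.

```python
import numpy as np

def stacking_weights(lpd, n_iter=2000):
    """Bayesian stacking weights from a K x N matrix of pointwise
    LOO log predictive densities (one row per candidate model).

    Sketch only: uses an EM-style fixed-point update for mixture
    weights, maximizing sum_n log(sum_k w_k p_k(y_n))."""
    K, N = lpd.shape
    # Rescale per observation for numerical stability; this multiplies
    # each inner sum by a constant and doesn't change the argmax.
    p = np.exp(lpd - lpd.max(axis=0))  # K x N densities
    w = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        r = w[:, None] * p             # unnormalized responsibilities
        r /= r.sum(axis=0)
        w = r.mean(axis=1)             # updated simplex weights
    return w

# Toy example: model 0 fits almost every observation better than
# model 1, so nearly all the stacking weight lands on it.
rng = np.random.default_rng(0)
lpd = np.vstack([rng.normal(-1.0, 0.1, 200),
                 rng.normal(-2.0, 0.1, 200)])
print(stacking_weights(lpd))  # weight concentrates on model 0
```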
marcdotson commented 3 years ago

After a lot of work, we are close to an initial evaluation. Some final to-dos for this far-too-large meta-issue:

marcdotson commented 3 years ago

@jeff-dotson @RogerOverNOut a swing and a miss for simulated data with ANA present, at least for predictive fit:

| Model | LOO | Hit Rate | Hit Prob |
| --- | --- | --- | --- |
| HMNL | -2732 | 0.566 | 0.446 |
| Ensemble | -13.4 | 0.567 | 0.402 |
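
For reference, the two predictive-fit metrics above can be computed from posterior predictive choice probabilities roughly as follows (toy numbers, not our simulated data): hit rate is the share of tasks where the highest-probability alternative matches the observed choice, and hit probability is the mean predicted probability of the observed choice.

```python
import numpy as np

# Hypothetical posterior predictive choice probabilities for
# 5 tasks with 3 alternatives each, plus the observed choices.
probs = np.array([
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.2, 0.7],
    [0.4, 0.4, 0.2],
    [0.3, 0.3, 0.4],
])
choices = np.array([0, 1, 2, 2, 0])

# Share of tasks where the modal prediction matches the choice.
hit_rate = np.mean(probs.argmax(axis=1) == choices)
# Mean predicted probability assigned to the observed choice.
hit_prob = np.mean(probs[np.arange(len(choices)), choices])

print(hit_rate, hit_prob)  # 0.6 0.46
```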

We're still running the ANA-specific model on the simulated data, but we're obviously missing something. Before we dive in and start trying things, I think it would be wise to identify everything we could modify and discuss what we should attempt first. Here are the potential changes I see:

Personally and from experience, I think the final three things should be tried first -- essentially everything but simulation changes. This branch is becoming a bit of a monster, so after we have the output from the ANA-specific model, I'd like a minute to finish cleaning up the code so iterating will be a bit easier, merge this branch, and then create separate branches for each of these attempts (i.e., ensemble-tuning and predictive-fit branches).

Thoughts?

marcdotson commented 3 years ago

Closed out the initial evaluation with PR #46.

marcdotson commented 3 years ago

Split tasks across three new branches and issues.