marcdotson / conjoint-ensembles

Using clever randomization and ensembling strategies to accommodate multiple data pathologies in conjoint studies.
MIT License

Running the ensemble for two or more pathologies jointly #64

Closed marcdotson closed 3 years ago

marcdotson commented 3 years ago

All previous work has been merged together with quality-of-life improvements added, as detailed in PR #61.

The joint-pathologies branch is, as the name suggests, for getting the conjoint ensemble running on the pathologies jointly (ANA and screening to start with).

marcdotson commented 3 years ago

@jeff-dotson FWIW, I've tested your heterogeneous and homogeneous pathologies for 400-, 1000-, and 2000-member ensembles for ANA only. @RogerOverNOut interestingly, it looks like the weights, now appended to ensemble_fit (new model output loaded to the shared Drive folder), continue to heavily weight the final ensemble member, even though I'm now randomly drawing 400 or 1000 ways to induce the pathology on the betas from the 2000 total in the simulated data. In other words, if there was any "signal" in the final ensemble member previously, that is no longer the case.

Here are the homogeneous, 400-member ensemble results:

| Model | LOO | Hit Rate | Hit Prob |
| --- | --- | --- | --- |
| HMNL | -2861 | 0.460 | 0.397 |
| Ensemble | -2922 | 0.458 | 0.381 |

Here are the homogeneous, 1000-member ensemble results:

| Model | LOO | Hit Rate | Hit Prob |
| --- | --- | --- | --- |
| HMNL | -2861 | 0.460 | 0.397 |
| Ensemble | -2913 | 0.458 | 0.383 |

Here are the homogeneous, 2000-member ensemble results:

| Model | LOO | Hit Rate | Hit Prob |
| --- | --- | --- | --- |
| HMNL | -2861 | 0.460 | 0.397 |
| Ensemble | -2928 | 0.461 | 0.380 |

Here are the heterogeneous, 400-member ensemble results:

| Model | LOO | Hit Rate | Hit Prob |
| --- | --- | --- | --- |
| HMNL | -2985 | 0.479 | 0.379 |
| Ensemble | -3028 | 0.474 | 0.365 |

Here are the heterogeneous, 2000-member ensemble results:

| Model | LOO | Hit Rate | Hit Prob |
| --- | --- | --- | --- |
| HMNL | -2985 | 0.479 | 0.379 |
| Ensemble | -3031 | 0.472 | 0.365 |

Beyond a possible issue with the ensemble weights, I don't know if this says much about heterogeneous vs. homogeneous pathologies or even the ensemble size, but I thought I'd place it here as further evidence for what we know already: This ensemble does just as well as HMNL for ANA but will really shine when we get to joint pathologies and then real data -- where the data is genuinely pathological with respect to the HMNL. I should have something to share by Friday.
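For readers following the tables, here is a minimal Python sketch of how a weighted ensemble could be scored on hold-out tasks. It assumes the usual definitions (hit rate is the share of tasks where the highest-probability alternative is the one chosen, hit probability is the mean predicted probability of the chosen alternative); the thread does not spell these out. The function names and the nested-list layout are hypothetical and are not taken from the repository, which is written in R.

```python
def combine_members(member_probs, weights):
    """Weighted average of per-member choice probabilities.

    member_probs[m][t][a] is member m's predicted probability that
    alternative a is chosen in hold-out task t (hypothetical layout).
    Weights are normalized here, so they need not sum to one.
    """
    total = sum(weights)
    n_tasks = len(member_probs[0])
    n_alts = len(member_probs[0][0])
    combined = [[0.0] * n_alts for _ in range(n_tasks)]
    for probs, w in zip(member_probs, weights):
        for t in range(n_tasks):
            for a in range(n_alts):
                combined[t][a] += (w / total) * probs[t][a]
    return combined


def hit_rate(probs, choices):
    """Share of tasks where the most probable alternative was chosen."""
    hits = 0
    for p, chosen in zip(probs, choices):
        predicted = max(range(len(p)), key=p.__getitem__)
        hits += int(predicted == chosen)
    return hits / len(choices)


def hit_prob(probs, choices):
    """Mean predicted probability of the alternative actually chosen."""
    return sum(p[chosen] for p, chosen in zip(probs, choices)) / len(choices)
```

Under these definitions the weights only matter through the combined probabilities, so two weighting schemes can differ in LOO while giving nearly identical hit rates.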

marcdotson commented 3 years ago

To illustrate the problem with the current ensemble weights, here is a quick plot of the weights by ensemble member for the above models.

For the homogeneous, 400-member ensemble:

image

For the homogeneous, 1000-member ensemble:

image

For the homogeneous, 2000-member ensemble:

image

For the heterogeneous, 400-member ensemble:

image

For the heterogeneous, 1000-member ensemble:

image

For the heterogeneous, 2000-member ensemble:

image

marcdotson commented 3 years ago

Ideas to address the ensemble weights problem:

marcdotson commented 3 years ago

@jeff-dotson @RogerOverNOut, as promised, using the ANA-only ensembles, here are the results with equal weights and with dropping the last member and renormalizing, compared with the LOO weights that overweight the last member:

Here are the homogeneous, 400-member ensemble results:

| Model | LOO | Hit Rate | Hit Prob |
| --- | --- | --- | --- |
| HMNL | -2861 | 0.460 | 0.397 |
| Ensemble | -2922 | 0.458 | 0.381 |
| Ensemble (Equal Weights) | -2931 | 0.458 | 0.380 |
| Ensemble (Renormalized) | -2903 | 0.458 | 0.379 |

Here are the homogeneous, 1000-member ensemble results:

| Model | LOO | Hit Rate | Hit Prob |
| --- | --- | --- | --- |
| HMNL | -2861 | 0.460 | 0.397 |
| Ensemble | -2913 | 0.458 | 0.383 |
| Ensemble (Equal Weights) | -2931 | 0.457 | 0.380 |
| Ensemble (Renormalized) | -2670 | 0.458 | 0.347 |

Here are the homogeneous, 2000-member ensemble results:

| Model | LOO | Hit Rate | Hit Prob |
| --- | --- | --- | --- |
| HMNL | -2861 | 0.460 | 0.397 |
| Ensemble | -2928 | 0.461 | 0.380 |
| Ensemble (Equal Weights) | -2930 | 0.454 | 0.380 |
| Ensemble (Renormalized) | -2899 | 0.462 | 0.376 |

Here are the heterogeneous, 400-member ensemble results:

| Model | LOO | Hit Rate | Hit Prob |
| --- | --- | --- | --- |
| HMNL | -2985 | 0.479 | 0.379 |
| Ensemble | -3028 | 0.474 | 0.365 |
| Ensemble (Equal Weights) | -3032 | 0.475 | 0.364 |
| Ensemble (Renormalized) | -2976 | 0.475 | 0.360 |

Here are the heterogeneous, 1000-member ensemble results:

| Model | LOO | Hit Rate | Hit Prob |
| --- | --- | --- | --- |
| HMNL | -2985 | 0.479 | 0.379 |
| Ensemble | -3016 | 0.458 | 0.369 |
| Ensemble (Equal Weights) | -3033 | 0.472 | 0.364 |
| Ensemble (Renormalized) | -2621 | 0.476 | 0.319 |

Here are the heterogeneous, 2000-member ensemble results:

| Model | LOO | Hit Rate | Hit Prob |
| --- | --- | --- | --- |
| HMNL | -2985 | 0.479 | 0.379 |
| Ensemble | -3031 | 0.472 | 0.365 |
| Ensemble (Equal Weights) | -3033 | 0.472 | 0.364 |
| Ensemble (Renormalized) | -3016 | 0.472 | 0.363 |

So it doesn't do much, does it? I mean, the LOO fit changes most. We should still investigate, but I'm going to move on to getting the joint ensemble working knowing we can use equal weights or renormalize and essentially get the same results.
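The two alternative weightings compared above can be sketched as follows. This is an illustration in Python, not the repository's R code; the function names are hypothetical, and setting the dropped member's weight to zero (rather than removing it from the list) is an assumption made so the weight vector keeps its length.

```python
def equal_weights(n_members):
    """Give every ensemble member the same weight."""
    if n_members <= 0:
        raise ValueError("need at least one ensemble member")
    return [1.0 / n_members] * n_members


def drop_last_and_renormalize(weights):
    """Zero out the final member's weight and rescale the rest to sum to one.

    Assumption: the last entry of `weights` belongs to the member that the
    LOO-based weighting over-weights.
    """
    kept = list(weights[:-1])
    total = sum(kept)
    if total <= 0:
        raise ValueError("remaining weights sum to zero; cannot renormalize")
    return [w / total for w in kept] + [0.0]


# Made-up weights where the final member dominates.
loo_weights = [0.05, 0.05, 0.10, 0.80]
print(drop_last_and_renormalize(loo_weights))
print(equal_weights(len(loo_weights)))
```

Note that when the final member holds nearly all of the weight, renormalizing divides by a very small total, so the result is sensitive to small differences among the remaining weights.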

marcdotson commented 3 years ago

@jeff-dotson @RogerOverNOut this is a temporary stop-gap, but here are some results for ANA and screening jointly with equal weights, where the members that hit the ELBO error have been dropped:

Here are the heterogeneous, 200-member (actually 175-member after dropping) ensemble results:

| Model | LOO | Hit Rate | Hit Prob |
| --- | --- | --- | --- |
| HMNL | -2658 | 0.417 | 0.354 |
| Ensemble (Equal Weights) | -828430 | 0.406 | 0.351 |

Here are the heterogeneous, 400-member (actually 400-member, none dropped) ensemble results:

| Model | LOO | Hit Rate | Hit Prob |
| --- | --- | --- | --- |
| HMNL | -2658 | 0.417 | 0.354 |
| Ensemble (Equal Weights) | -807416 | 0.404 | 0.352 |

Here are the heterogeneous, 1000-member (actually 625-member after dropping) ensemble results:

| Model | LOO | Hit Rate | Hit Prob |
| --- | --- | --- | --- |
| HMNL | -2658 | 0.417 | 0.354 |
| Ensemble (Equal Weights) | -811617 | 0.399 | 0.349 |

So, uh, not great.
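The stop-gap described above (skip any member whose fit fails, then weight the survivors equally) might look roughly like this. It is a Python sketch under stated assumptions: `fit_fn` is a hypothetical stand-in for whatever fits one ensemble member, a failed fit is assumed to surface as a `RuntimeError`, and the demo's failure pattern is invented purely to reproduce the 200-to-175 count.

```python
def fit_members(members, fit_fn):
    """Fit each member, dropping the ones whose fit raises an error.

    Returns the successful fits, the indices of the failed members, and
    equal weights over the survivors.
    """
    fits, failed = [], []
    for i, member in enumerate(members):
        try:
            fits.append(fit_fn(member))
        except RuntimeError:
            # e.g. a variational fit that aborts with the ELBO error
            failed.append(i)
    weights = [1.0 / len(fits)] * len(fits) if fits else []
    return fits, failed, weights


def fake_fit(member):
    """Hypothetical fit that fails for every eighth member."""
    if member % 8 == 0:
        raise RuntimeError("dropped evaluations reached maximum")
    return {"member": member}


fits, failed, weights = fit_members(range(200), fake_fit)
print(len(fits), len(failed))
```

Recording the failed indices, rather than silently skipping them, makes it possible to check later whether the failures cluster on particular randomizations.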

marcdotson commented 3 years ago

With the screening pathology, I consistently get `Error in sampler$call_sampler(c(args, dotlist)): stan::variational::advi::calc_ELBO: OR stan::variational::normal_meanfield::calc_grad: The number of dropped evaluations has reached its maximum amount (100). Your model may be either severely ill-conditioned or misspecified.` This behavior isn't present for the ANA pathology.

A few things to point out from Automatic Variational Inference in Stan:

I'm going to try and put together a joint ensemble with ANA and respondent quality for the time being and then I'll return to screening as needed.

marcdotson commented 3 years ago

@jeff-dotson @RogerOverNOut putting aside screening for a moment, here are the results for simulated data with both ANA and respondent quality estimated with a heterogeneous, 1000-member ensemble:

| Model | LOO | Hit Rate | Hit Prob |
| --- | --- | --- | --- |
| HMNL | -2958 | 0.504 | 0.386 |
| Ensemble | -3022 | 0.496 | 0.369 |
| Ensemble (Equal Weights) | -3027 | 0.497 | 0.368 |

The weighting indicates variety in the ensemble members (none of that final-model up-weighting):

image

But clearly we're missing something -- perhaps more variation in the clever randomization?

I'm still working to get this working on a real dataset. The same problems we saw when screening is present are there for real data, so I'm just letting it run with actual sampling instead of VB for now.

marcdotson commented 3 years ago

Inducing more variation in the clever randomization by:

marcdotson commented 3 years ago

After making sure ANA applies at the attribute level and fixing the number of attribute levels being hard-coded in clever_randomization(), along with inducing some more variation by randomizing the number of attributes to which ANA applies, here are the results for simulated data with both ANA and respondent quality estimated with a heterogeneous, 1000-member ensemble:

| Model | LOO | Hit Rate | Hit Prob |
| --- | --- | --- | --- |
| HMNL | -2953 | 0.488 | 0.379 |
| Ensemble | -3065 | 0.503 | 0.358 |
| Ensemble (Equal Weights) | -3073 | 0.50 | 0.356 |
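As an illustration of the changes described above (ANA applied to whole attributes, level counts passed in rather than hard-coded, and a random number of non-attended attributes), here is a minimal Python sketch. It is not the repository's `clever_randomization()`, which is R code not shown in this thread; `draw_ana_mask`, its arguments, and the choice to ignore between one and all-but-one attributes are assumptions.

```python
import random


def draw_ana_mask(levels_per_attribute, rng):
    """Draw a 0/1 attendance mask over level parameters, one attribute at a time.

    levels_per_attribute: number of estimated level parameters per attribute.
    rng: a random.Random instance, so draws are reproducible.
    A 0 means the parameter belongs to a non-attended attribute.
    """
    n_attr = len(levels_per_attribute)
    if n_attr < 2:
        raise ValueError("need at least two attributes")
    # Assumption: ignore at least one attribute but never all of them.
    n_ignored = rng.randint(1, n_attr - 1)
    ignored = set(rng.sample(range(n_attr), n_ignored))
    mask = []
    for attr, n_levels in enumerate(levels_per_attribute):
        # Every level of an attribute shares that attribute's attendance flag.
        mask.extend([0 if attr in ignored else 1] * n_levels)
    return mask


rng = random.Random(42)
print(draw_ana_mask([3, 2, 4], rng))
```

Each ensemble member would get its own draw, so both the number and the identity of the ignored attributes vary across members.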

The final-model up-weighting is back:

image

The model that was running this on a real dataset crashed. However, some of my issues may be compiler-specific.

marcdotson commented 3 years ago

Okay, I found another mistake in the code. We have never seen results that actually include respondent quality as a pathology. Running the above again and starting on a detailed code review and documentation.

Sorry, not a great code maintainer yet.

marcdotson commented 3 years ago

Latest results for simulated data with both ANA and respondent quality estimated with a heterogeneous, 1000-member ensemble (in this instance using sequential, full posterior sampling, each model with a single chain and thinned draws):

| Model | LOO | Hit Rate | Hit Prob |
| --- | --- | --- | --- |
| HMNL | -2953 | 0.488 | 0.379 |
| Ensemble | -2234 | 0.333 | 0.344 |
| Ensemble (Equal Weights) | -2378 | 0.329 | 0.342 |

LOO appears to be doing its part:

image

marcdotson commented 3 years ago

Finally, results for real data where we account for both ANA and respondent quality. Again, it's a 1000-member ensemble using multiple weeks' worth of sequential, full posterior sampling with single chains and thinned draws:

| Model | LOO | Hit Rate | Hit Prob |
| --- | --- | --- | --- |
| HMNL | -2756 | 0.403 | 0.348 |
| Ensemble (LOO Weights) | -871 | 0.259 | 0.244 |
| Ensemble (Equal Weights) | -1263 | 0.245 | 0.247 |

All right, @jeff-dotson @RogerOverNOut that didn't take as long as I'd feared. The results are consistent with what we've seen -- and with real data that improvement in LOO with the ensemble weights is huge. Again, we aren't seeing it translated into predictive fit improvement, which again necessitates looking at other meta learners.

Oh, and, ruh roh:

image

I've updated the ensemble fit object in the shared folder, FWIW.

marcdotson commented 3 years ago

Closing out this issue with PR #67.