marcdotson / modeling-heterogeneity

Exploring covariates and models for preference heterogeneity.
MIT License

Validating models using car ownership and recontact survey data #11

Status: Open. marcdotson opened this issue 3 years ago.

marcdotson commented 3 years ago

Note that the validation data is car ownership data, not car purchase data.

marcdotson commented 3 years ago

A recommended reference from @adam-n-smith.

marcdotson commented 3 years ago

Let's get clear on the sources of data:

@cwjohnson1 and @z-wix, let's do some exploratory data analysis of the ownership data and the recontact survey. Use the model-validation branch and the 02_exploratory-data-analysis.R script.

cwjohnson1 commented 3 years ago

I just created two pull requests for some of the code I've been working on. Let me know your thoughts and questions, as well as the plan for proceeding. Thanks!

cwjohnson1 commented 3 years ago

I know we mentioned wanting to compare how participants said they would purchase with how they actually purchased. I can do some analyses on that next. Any other ideas?

marcdotson commented 3 years ago

@cwjohnson1, there's no need to create a separate branch; please just work in model-validation. I've merged your changes back into this branch. I'm digging into the changes now and will provide updates in our weekly meeting.

marcdotson commented 3 years ago

Here's a sketch of how to use the ownership and recontact survey data as a validation task.

  1. Combine the recontact survey data and ownership data to produce a validation choice composed of car brand and year.
  2. Complete a validation task by appending an outside option so we can get a hit rate based on predicting the choice or the outside option.
  3. For the subset of initial respondents who have this validation task, use their betas (or draw their betas, if they are a hold-out respondent) for brand and year to compute predictive fit.
  4. This will result in two sets of validation predictive fit metrics: predictive fit for "in-sample" respondents and predictive fit for hold-out respondents.
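Steps 2 through 4 can be sketched roughly as follows. This is a minimal, illustrative Python sketch (the project's own scripts are in R) under several assumptions not stated in the thread: utility is linear-additive in the brand and year part-worths, the appended outside option's utility is normalized to 0, a multinomial logit gives the hit probability, and hold-out betas are drawn from independent normal distributions (a simplification; a hierarchical model's upper level would usually carry covariances). The names `task_utilities`, `draw_betas`, `predictive_fit`, the dictionary layout, and all numbers are hypothetical toy values, not results from the data.

```python
import math
import random

OUTSIDE = "outside"  # hypothetical label for the appended outside option


def task_utilities(betas, alternatives):
    """Linear-additive utility of each (brand, year) alternative.

    Assumption: the outside option's utility is normalized to 0."""
    utils = {alt: betas["brand"][alt[0]] + betas["year"][alt[1]] for alt in alternatives}
    utils[OUTSIDE] = 0.0
    return utils


def logit_probs(utils):
    """Multinomial logit choice probabilities (max-shifted for numerical stability)."""
    top = max(utils.values())
    expu = {k: math.exp(u - top) for k, u in utils.items()}
    total = sum(expu.values())
    return {k: v / total for k, v in expu.items()}


def draw_betas(means, sds, rng):
    """Draw brand/year part-worths for a hold-out respondent.

    Hypothetical simplification: independent normals per attribute level."""
    return {
        attr: {lvl: rng.gauss(means[attr][lvl], sds[attr][lvl]) for lvl in means[attr]}
        for attr in means
    }


def predictive_fit(tasks):
    """Return (hit rate, mean probability of the observed choice) over tasks."""
    hits, hit_prob = 0, 0.0
    for t in tasks:
        utils = task_utilities(t["betas"], t["alternatives"])
        predicted = max(utils, key=utils.get)  # highest-utility option is the prediction
        hits += predicted == t["choice"]
        hit_prob += logit_probs(utils)[t["choice"]]
    return hits / len(tasks), hit_prob / len(tasks)


# Toy in-sample respondents: each "chose" the car they own over the outside option.
in_sample = [
    {"betas": {"brand": {"Toyota": 1.2, "Ford": -0.3}, "year": {2018: 0.4, 2020: 0.1}},
     "alternatives": [("Toyota", 2018)], "choice": ("Toyota", 2018)},
    {"betas": {"brand": {"Toyota": -0.8, "Ford": -0.5}, "year": {2018: 0.2, 2020: 0.1}},
     "alternatives": [("Ford", 2020)], "choice": ("Ford", 2020)},
]

# Toy hold-out respondent: average fit over many draws from hypothetical
# population-level means and standard deviations.
pop_means = {"brand": {"Toyota": 0.5, "Ford": 0.0}, "year": {2018: 0.2, 2020: 0.3}}
pop_sds = {"brand": {"Toyota": 1.0, "Ford": 1.0}, "year": {2018: 0.5, 2020: 0.5}}
rng = random.Random(11)
hold_out = [
    {"betas": draw_betas(pop_means, pop_sds, rng),
     "alternatives": [("Ford", 2020)], "choice": ("Ford", 2020)}
    for _ in range(500)
]

print("in-sample:", predictive_fit(in_sample))
print("hold-out:", predictive_fit(hold_out))
```

Keeping in-sample and hold-out tasks in separate lists yields the two sets of metrics described in step 4. In the toy in-sample data, the first respondent's owned car beats the outside option (a hit), while the second respondent's does not (a miss), so the hit rate is 0.5.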

There is a lot to figure out in this plan: matching respondents to their in-sample and hold-out data, recoding open-ended responses and correcting spelling mistakes, and conditioning only on the brand and year attributes.
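For the open-end recoding, one option is fuzzy string matching against the brand levels. The Python sketch below is only illustrative (the project's scripts are in R) and uses the standard library's `difflib.get_close_matches` to absorb typos and a regular expression to pull out a four-digit model year. The brand list, the 0.75 similarity cutoff, and the function names `recode_brand` and `recode_year` are hypothetical choices, not taken from the project.

```python
import difflib
import re

# Hypothetical canonical brand list; in practice it would come from the
# brand levels used in the conjoint design.
BRANDS = ["Toyota", "Honda", "Ford", "Chevrolet", "Subaru", "Hyundai"]


def recode_brand(raw, brands=BRANDS, cutoff=0.75):
    """Map a free-text brand response to a canonical brand, tolerating typos.

    Returns None when nothing is similar enough, flagging the response
    for manual review (e.g., nicknames such as "Chevy")."""
    lookup = {b.lower(): b for b in brands}
    cleaned = raw.strip().lower()
    match = difflib.get_close_matches(cleaned, list(lookup), n=1, cutoff=cutoff)
    return lookup[match[0]] if match else None


def recode_year(raw):
    """Extract a four-digit model year (1900-2099) from free text, if present."""
    found = re.search(r"\b(19|20)\d{2}\b", raw)
    return int(found.group()) if found else None


for response in ["toyta", "  Honda ", "FORD", "Chevy"]:
    print(repr(response), "->", recode_brand(response))
print(recode_year("I drive a 2018 Toyota"))
```

A stricter cutoff sends more responses to manual review; a looser one risks mapping a misspelling to the wrong brand, so the unmatched (`None`) cases are worth inspecting by hand.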

cwjohnson1 commented 3 years ago

I just realized that I committed the plots but never pushed them. Sorry about that. You should be able to find them in Sawtooth-2021.Rmd now.

marcdotson commented 3 years ago

@cwjohnson1 please don't create new branches. You can add this all to model-validation.

marcdotson commented 3 years ago

Notes on computing predictive fit using the validation task:

cwjohnson1 commented 3 years ago

I just uploaded two new plots to Sawtooth-2021.Rmd and am working on some more. I know Zach was working with the recontact data, but since he's working on another project now, I can also create some visualizations for those data if you'd like.

marcdotson commented 3 years ago

Please do, @cwjohnson1.

marcdotson commented 3 years ago

Questions about constructing a validation task from recontact/ownership data:

marcdotson commented 3 years ago

Short-term options:

Long-term options:

cwjohnson1 commented 3 years ago

I just added the code for the ownership data visualizations, as we discussed, to the presentation folder on the model-validation branch. The code for the recontact visualizations is in 02_exploratory-data-analysis.R. Would you like me to add that code there as well to make it easier to find?

marcdotson commented 3 years ago

No, that's fine. Thanks!