Create inference notebook

umangchaudhry commented 3 years ago

Load in the df and make predictions, which will output a score for each particle

markus-eberl commented 3 years ago

Cutoff point – At this stage, we still have to decide where the cutoff point is. We will need to score each particle and then cross-validate some of them under a microscope. As soon as we have defined the cutoff point, it would be great to indicate for each sample how many microdebitage particles it contains.
Model comparison – It would also be useful to compare the scores of different models. Do they overlap for the same particles? For which particles do they differ?
Comparison of relevant variables – Which variables contribute most to the score in each model? Most models emphasize transparency, but they seem to differ in the less important variables. I also wonder about related variables (e.g., feret length vs. fiber length) that should be of similar importance but differ in some models. Finally, some variables are secondary (e.g., length-width ratio) because they are calculated from primary variables (e.g., length and width). Should they be treated differently or even be excluded from the models?
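
A minimal sketch of the model comparison and the per-sample count, assuming each model's scores are available as a mapping from particle ID to score. The model names, particle IDs, sample IDs, scores, and the 0.5 cutoff are all hypothetical placeholders; the real cutoff still has to be decided.

```python
# Hypothetical scores: model name -> {particle ID -> score}.
scores = {
    "model_a": {"p1": 0.92, "p2": 0.48, "p3": 0.10, "p4": 0.55},
    "model_b": {"p1": 0.88, "p2": 0.61, "p3": 0.05, "p4": 0.52},
}
# Hypothetical mapping from particle ID to the sample it came from.
sample_of = {"p1": "s1", "p2": "s1", "p3": "s2", "p4": "s2"}

CUTOFF = 0.5  # placeholder, not the agreed cutoff point


def classify(model_scores, cutoff):
    """Label each particle True (microdebitage) if its score reaches the cutoff."""
    return {pid: score >= cutoff for pid, score in model_scores.items()}


def disagreements(scores, cutoff):
    """Return the particles that the models label differently at this cutoff."""
    labels = {name: classify(ms, cutoff) for name, ms in scores.items()}
    # Only compare particles that every model has scored.
    shared = sorted(set.intersection(*(set(l) for l in labels.values())))
    return [pid for pid in shared if len({labels[m][pid] for m in labels}) > 1]


def count_per_sample(model_scores, sample_of, cutoff):
    """Count particles at or above the cutoff in each sample, for one model."""
    counts = {sample: 0 for sample in sample_of.values()}
    for pid, score in model_scores.items():
        if score >= cutoff:
            counts[sample_of[pid]] += 1
    return counts


print(disagreements(scores, CUTOFF))
print(count_per_sample(scores["model_a"], sample_of, CUTOFF))
```

The particles returned by `disagreements` would be natural candidates for checking under the microscope.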

csbell-vu commented 3 years ago

Load models and relevant prediction data from Box (this information will probably be stored as a .RData object)
We may need to run assertions on the prediction data from Box (if the data violates assertions, the predictions may not be accurate)
Predict using all loaded models
Identify particles which require review
- Start with identifying mismatched predictions between models via a hard threshold (e.g., 0.5)
- Start with identifying regions of questionable prediction (e.g., scores between 0.49 and 0.51)
- TBD: Identify mismatched predictions based on thresholds determined separately for each model
Output the percentage of microdebitage estimated in the sample
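
The steps above could be sketched roughly as follows. This is only an illustration in plain Python, not the notebook itself: the function names, the `transparency` column, the example scores, and the 0.49-0.51 band are assumptions, and loading the models and data from Box is left out.

```python
THRESHOLD = 0.5                        # hard threshold from the list above
REVIEW_LOW, REVIEW_HIGH = 0.49, 0.51   # example band of questionable predictions


def check_particles(rows, required):
    """Raise ValueError if a row lacks a required variable or holds a non-numeric value."""
    for i, row in enumerate(rows):
        for col in required:
            value = row.get(col)
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if not is_number or value != value:  # value != value is True only for NaN
                raise ValueError(f"row {i}: invalid value for {col!r}: {value!r}")


def needs_review(model_scores):
    """Flag a particle if the models disagree or any score falls in the review band."""
    labels = {score >= THRESHOLD for score in model_scores}
    mismatch = len(labels) > 1
    borderline = any(REVIEW_LOW <= score <= REVIEW_HIGH for score in model_scores)
    return mismatch or borderline


def percent_microdebitage(scores):
    """Percentage of particles in a sample scored at or above the threshold."""
    if not scores:
        raise ValueError("no particles to score")
    return 100.0 * sum(score >= THRESHOLD for score in scores) / len(scores)


# Hypothetical scores from two models for four particles of one sample.
particle_scores = {
    "p1": [0.92, 0.88],
    "p2": [0.48, 0.61],   # models disagree at the hard threshold
    "p3": [0.10, 0.05],
    "p4": [0.55, 0.505],  # second score falls inside the review band
}

check_particles([{"transparency": 0.7}], required=["transparency"])
flagged = [pid for pid, s in particle_scores.items() if needs_review(s)]
first_model = [s[0] for s in particle_scores.values()]
print(flagged, percent_microdebitage(first_model))
```

Once model-specific thresholds are determined, `THRESHOLD` would become a per-model value rather than a single constant.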
