vanderbilt-data-science / ancient-artifacts

Dynamic image analysis to identify ancient artifacts in soil samples. We will work on microdebitage (the debris of ancient stone knapping) first and later expand to other materials (e.g., mortars).
MIT License

Create inference notebook #59

Closed umangchaudhry closed 3 years ago

umangchaudhry commented 3 years ago

Load the data frame and make predictions; each prediction outputs a score.

markus-eberl commented 3 years ago
  1. Cutoff-point – At this stage, we still have to decide where the cutoff point is. We will need to score each particle and then cross-validate some of them under a microscope. As soon as we have defined the cutoff point, it would be great to indicate for each sample how many microdebitage particles it contains.

  2. Model comparison – It would also be useful to compare the scores of different models. Do they overlap for the same particles? For which particles do they differ?

  3. Comparison of relevant variables – Which variables contribute most to the score in each model? Most models emphasize transparency, but they seem to differ in the less important variables. I also wonder about related variables (e.g., feret length vs. fiber length) that should be of similar importance but differ in some models. Finally, some variables are secondary (e.g., length-width ratio) because they are calculated from primary variables (e.g., length and width). Should they be treated differently or even be excluded from the models?
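
The first two points above (counting particles above a cutoff per sample, and finding particles on which models differ) could be sketched roughly as follows. This is only an illustration: the model names, sample IDs, scores, and the 0.5 cutoff are all made-up placeholders, since the real cutoff has not been decided yet.

```python
from collections import Counter

# Hypothetical scores: one row per particle, one score (0..1) per model.
scores = [
    {"sample": "S1", "particle": 1, "model_a": 0.92, "model_b": 0.88},
    {"sample": "S1", "particle": 2, "model_a": 0.10, "model_b": 0.65},
    {"sample": "S2", "particle": 3, "model_a": 0.51, "model_b": 0.40},
    {"sample": "S2", "particle": 4, "model_a": 0.05, "model_b": 0.02},
]

CUTOFF = 0.5  # placeholder only; the actual cutoff is still to be defined


def disagreements(rows, models, cutoff):
    """Return the particles that the models place on different sides of the cutoff."""
    differing = []
    for row in rows:
        labels = {row[m] >= cutoff for m in models}
        if len(labels) > 1:  # both True and False are present
            differing.append(row["particle"])
    return differing


def count_per_sample(rows, model, cutoff):
    """Count, for each sample, the particles that one model scores at or above the cutoff."""
    counts = Counter()
    for row in rows:
        counts[row["sample"]] += row[model] >= cutoff  # True counts as 1
    return dict(counts)


print(disagreements(scores, ["model_a", "model_b"], CUTOFF))
print(count_per_sample(scores, "model_a", CUTOFF))
```

Particles returned by `disagreements` would be natural candidates for cross-validation under the microscope.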

csbell-vu commented 3 years ago
  1. Load models from Box (these will probably be stored as an .RData object)
  2. Load relevant prediction data from Box
  3. We may need to run assertions on the prediction data from Box (if the data violates assertions, the predictions may not be accurate)
  4. Predict using all loaded models
  5. Identify particles which require review
    • Start with identifying mismatched predictions via hard threshold (e.g., 0.5)
    • Start with identifying regions of questionable prediction (e.g., 0.49-0.51)
    • TBD: Identify mismatched predictions based on model-specific thresholds
  6. Output the percentage of microdebitage estimated in the sample
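
A minimal sketch of steps 3 to 6, assuming the models and data are already loaded (loading from Box and reading .RData objects is not shown). The two "models" here are hypothetical stand-in functions, and the variable names and values are invented for illustration; only the 0.5 threshold and the 0.49-0.51 band come from the list above.

```python
THRESHOLD = 0.5       # hard threshold from step 5
BAND = (0.49, 0.51)   # region of questionable prediction from step 5


def check_rows(rows, required):
    """Step 3: fail early if any particle lacks a required variable."""
    for i, row in enumerate(rows):
        missing = [name for name in required if row.get(name) is None]
        assert not missing, f"row {i} is missing {missing}"


def predict_all(rows, models):
    """Step 4: score every particle with every loaded model."""
    return {row["id"]: {name: fn(row) for name, fn in models.items()} for row in rows}


def needs_review(scores, threshold=THRESHOLD, band=BAND):
    """Step 5: flag a particle if models disagree or any score is borderline."""
    scores = list(scores)
    labels = {s >= threshold for s in scores}
    borderline = any(band[0] <= s <= band[1] for s in scores)
    return len(labels) > 1 or borderline


def percent_microdebitage(predictions, model, threshold=THRESHOLD):
    """Step 6: percentage of particles one model scores at or above the threshold."""
    hits = sum(s[model] >= threshold for s in predictions.values())
    return 100 * hits / len(predictions)


# Hypothetical data and stand-in models, for illustration only.
particles = [
    {"id": 1, "transparency": 0.9, "thickness": 0.2},
    {"id": 2, "transparency": 0.5, "thickness": 0.8},
    {"id": 3, "transparency": 0.2, "thickness": 0.9},
    {"id": 4, "transparency": 0.7, "thickness": 0.1},
]
models = {
    "model_a": lambda p: p["transparency"],
    "model_b": lambda p: 1.0 - p["thickness"],
}

check_rows(particles, ["transparency", "thickness"])
predictions = predict_all(particles, models)
review = [pid for pid, s in predictions.items() if needs_review(s.values())]
print(review)
print({name: percent_microdebitage(predictions, name) for name in models})
```

Reporting the percentage per model, rather than combining models into one number, keeps the output neutral until a model-specific threshold or a preferred model has been chosen.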